Update networking.rst #12938
Updated Intel to Cornelis Networks for True Scale and Omni-Path. Signed-off-by: LisaSchupp <[email protected]>
Thank you!
Hmmm....just to be careful. When the split occurred, Intel also maintained an "OmniPath" device because that is what they continued to call their library. Caused some friction at the time. Has that been resolved - i.e., does Intel concur with this change? Do we need to differentiate any legacy devices that are still out there - i.e., do we need to add clarifying language so an Intel device user that employs "OmniPath" software doesn't get confused by this doc?
@rhc54 Good question; I wasn't aware of the history. I converted this PR to Draft to ensure we don't merge it before we figure this out. @lisaschupper-cornelisnetworks Can you answer?
Things got more than a little confusing after the divestiture. Intel continued to use the PSM name/terminology for their software library, which caused a lot of confusion since the marketplace had largely equated OmniPath devices with PSM software. In fact, Intel often referred to both device and software as "OmniPath". So as Intel continued to market devices that used PSM and PSM-2, there was a lot of brand confusion over what constituted OmniPath. Not sure how all that eventually got resolved. Anyway, I'm no legal expert and it has been a few years since all this went down. My point was only to raise awareness so that the changes don't create more confusion in the user community. A quick web search indicates that Intel continues to market PSM-based devices - not sure how you folks are trying to differentiate your "PSM" from theirs in the market. Do we need to add verbiage here?
Omni-Path is owned and trademarked by Cornelis Networks. Intel does not have any product named Omni-Path or OmniPath.
Thanks - appreciate the clarification. However, I probably didn't express myself clearly enough. There has been a fair amount of confusion in the community regarding "PSM". As far as I can see, the problem hasn't really been resolved - we have two incompatible library families, both named "PSM". I think your clarifications here are fine, and I grok that you are correcting the named affiliations, but I'm wondering if this really goes far enough. I guess what I'm trying to lead to probably merits its own PR. The problem is that we have had users who provide the specified configure flag and either point at the Intel version of the library, or configure finds the Intel version in a default location. I'm not sure our configure logic is smart enough to distinguish between the two library families - at least, the users report problems at runtime when this confusion occurs. I'm not sure of the current direction of Intel's PSM-based product line (rumblings of a "Slingshot-like" product used to circulate), but the fact that the confusion has been reported a couple of times indicates that we might see this problem a bit more (appears that some orgs utilize the Intel Ethernet-based product in some capacity in their systems). It would be nice if the docs could at least provide some insight into the "whose PSM?" problem so people are aware of the potential confusion - at least until the configure logic can be hardened to avoid mistakenly allowing the wrong "PSM" to be used. Might avoid some user hair pulling - and OMPI spending time trying to figure out their problem. If that isn't possible (and I don't know the proper wording for it), then perhaps someone could follow up soon with a review and an update (if needed) to the configure logic so we properly ignore the Intel "PSM" variant when configuring for Cornelis devices? If Intel wants/needs to specifically add support for their "PSM" product, then they can add their own configure option and/or logic for that purpose.
Sidenote: as a non-PSM user myself, I do remember complaining to Intel when they made PSM3 in libfabric, indicating that exactly this kind of confusion would occur with customers. Are you saying that there are multiple different libraries out there named "PSM" that are incompatible with each other? I'm not talking about PSM1 vs. PSM2 vs. PSM3 -- but Intel "PSM" vs. Cornelis "PSM", and they're different, with different APIs, headers, and/or functionality? Such that Open MPI's
Let me clarify my question: Open MPI has
EDIT: Open MPI only has
I note that
I believe PSM was superseded by PSM2 when the True Scale product reached end of life. Omni-Path PSM2 is still used, so Open MPI is correct. As for PSM3, Intel would need to provide that information. It may not be part of Open MPI at all. Yes, configure --help should be updated as well. Thank you. (We have more to do finding and replacing Intel with Cornelis Networks in the documentation.)
@lisaschupper-cornelisnetworks I see a fair number of mentions of "PSM" and "PSM2" in the docs. Do you want to update this PR to be more comprehensive and always state PSM2? I also see at least one reference to Intel True Scale and Intel Omni-Path. If the hood is already up here, it would probably be good to fix all the docs to make them correct. Not just a few minor changes here and there (and leave a bunch of stuff as old / stale / outdated).
Yep, let's fix this kind of thing, too!
That's horrible as well.
10000% agree. @lisaschupper-cornelisnetworks Can all of these issues be addressed?
I haven't tested it myself, and it has been a few years since I was involved in all this - so my info may not be accurate any more. What I can say is that Intel and Cornelis "PSM" were initially one-and-the-same at the time of the split. This covered both PSM1 and PSM2 libraries (since both families had been released). There was considerable debate at the time over who retained control over those - I gather from this thread that Cornelis may have eventually assumed control, but that is unclear. The follow-on Intel devices initially used PSM2, but now have moved on to PSM3 and (most recently) PSM4. I don't know the internals, but would imagine they may be quite different. I don't know if Cornelis has made incompatible changes to PSM2 such that Intel devices can no longer use it. The documentation shows both At the least, we probably need to document all this so people realize the potential for confusion, especially if they install an Intel Ethernet device (which I assume would be solely used for things like data storage support) and a Cornelis fabric device. Should probably also provide enough info so they understand that they need to take some care as to which PSM2 library they are installing, and would be nice if the configure logic could differentiate them. Might mean we need to add configure options that make it clear "I want the Cornelis PSM library (whatever version it is)" vs the Intel one.
This first commit was a training exercise for me as I am new to GitHub PRs.
And let's not forget - there are still quite a few Intel OmniPath devices (i.e., sold prior to the split) out there. May be getting long in the tooth, but still serviceable. Do we need to provide some language here so those people know their device is still supported (as they may not know anything about Cornelis)?
Fully understand - and welcome! Didn't mean to muddy the waters so much - as I noted, much of this conversation may be more appropriate for a follow-on PR or two.
Cornelis is responsible for these older devices. That was part of the divestiture agreement.
@lisaschupper-cornelisnetworks Haha! I echo what Ralph said -- welcome! And sorry to make you jump right into the deep end of complicated discussions for what should have been a simple issue to fix! 😂
No worries. I like this kind of interaction. How else will I learn?
How about this:
Does that sound reasonable?
We revamped the scope of this PR after the initial approval
I do not have the authority to make these decisions, so I am bringing in another person to review the conversation and proposal. Thanks for your guidance @jsquyres. @BrendanCunningham - Please review this PR and the proposed changes and let me know how we should proceed. Thanks!
Hi Brendan! Been a long time!
Sounds reasonable to me, but I defer to Brendan's input.
Ping @BrendanCunningham