WeeklyTelcon_20210119
- Dialup Info: (Do not post to public mailing list or public wiki)
- Akshay Venkatesh (NVIDIA)
- Aurelien Bouteiller (UTK)
- Austen Lauria (IBM)
- Brendan Cunningham (Cornelis Networks)
- Christoph Niethammer (HLRS)
- David Bernholdt (ORNL)
- Edgar Gabriel (UH)
- Geoffrey Paulsen (IBM)
- George Bosilca (UTK)
- Hessam Mirsadeghi (UCX/nVidia)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Joseph Schuchart
- Josh Hursey (IBM)
- Joshua Ladd (nVidia/Mellanox)
- Matthew Dosanjh (Sandia)
- Michael Heinz (Cornelis Networks)
- Thomas Naughton III (ORNL)
- Raghu Raja (AWS)
- Ralph Castain (Intel)
- Todd Kordenbrock (Sandia)
- William Zhang (AWS)
- Artem Polyakov (nVidia/Mellanox)
- Brian Barrett (AWS)
- Brandon Yates (Intel)
- Charles Shereda (LLNL)
- Erik Zeiske
- Geoffroy Vallee (ARM)
- Harumi Kuno (HPE)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Tomislav Janjusic
- Xin Zhao (nVidia/Mellanox)
- The link has changed for 2021. Please see the email from Jeff Squyres to [email protected] on 12/15/2020 for the new link.
- Flux fix in master + UCX PML. Commit merged.
- SLURM_WHOLE issue, want to stay in sync with OMPI v4.1.x.
- Revert "v4.0.x: Update Slurm launch support"
- Want consensus with the v4.1 branch.
- Running into lots of problems trying to srun a job with SLURM (20.11.{0,1,2})
- with or without this patch, still seeing some other issues.
- Not just confined to OMPI.
- Ralph has been advising users to either downgrade or upgrade.
- Ralph is suggesting to revert this and advise users to not use those versions of SLURM
- SLURM has posted a 20.11.3 tarball that reverts the offending changes.
- Reverting in OMPI v4.0 and v4.1; Ralph is reverting in PRRTE.
- For the v4.0 release, would like to take this ROMIO one-off fix instead of a full ROMIO update:
- https://github.com/open-mpi/ompi/pull/8370 - Fixes HDF5 on LUSTRE
- Proposing to take this one-off for v4.0.6, as a whole new ROMIO is a big change.
- Can ask Rob Latham what his advice is on taking this in the middle of a release train.
- How disruptive would this be?
- case by case basis
- Geoff will email him, and ask.
- ROMIO Author.
- Waiting on v4.0.6rc2 until we get an answer.
- Discussed https://github.com/open-mpi/ompi/issues/8321
- Howard is trying to reproduce, but another user is having difficulty reproducing.
- Could affect UCX in VMs; possible silent error.
- Added blocker label.
- Present in v4.0.x and master, though it might be down in UCX.
- Flux fix in master + UCX PML.
- Will need a version of https://github.com/open-mpi/ompi/pull/8380
- No Flux on master currently.
- Issue 8367 - Packager Hassan and Josh will take it to the UCX community.
- Issue 8334 - a performance regression with AVX512 on Skylake. Still digging into it.
- Next-gen processors don't hit this issue.
- This is an issue in LAMMPS.
- Simple MCA parameter on v4.1 to remove this code path (see the example after this list).
- George had a PR to not use these by default until we can do something better.
- Raghu tested; AVX512 seems to make it slower.
- Papers show that anything beyond AVX2 throttles down cores and has this effect.
- Conservative approach is to disable the AVX enhancement by default.
- PR 8176 disables this optimization by default.
- Would like to merge to master (at least temporarily), then PR to v4.1.
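- For reference, a hedged sketch of turning off the AVX-accelerated reduction path at runtime; this assumes the code lives in the `op/avx` component (verify with `ompi_info`) and that the standard MCA exclusion syntax applies, and the application name is a placeholder:

  ```shell
  # Exclude the avx component of the "op" framework for one run.
  mpirun --mca op ^avx -np 4 ./lammps_run

  # List the op components and their parameters in this build.
  ompi_info --param op all --level 9
  ```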
- Issue 8379 - the UCT BTL appears to be used by default, not UCX.
- UCT One-sided issue
- Everyone on the call thought the UCT BTL was disabled by default (workaround sketch below).
- Looks like a bug has crept in? Perhaps selection is not choosing UCX correctly due to versioning?
- He's using UCX 1.9, but UCT is supposed to be specifically disallowed on anything > 1.8.
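- As a diagnostic while the selection logic is sorted out, a hedged sketch of checking for and explicitly excluding the UCT BTL; the component names (`btl/uct`, `osc/ucx`) are the usual ones, and the benchmark name is a placeholder:

  ```shell
  # See whether the uct BTL is present in this build and what it reports.
  ompi_info --param btl uct --level 9

  # Exclude the uct BTL and use the UCX one-sided component explicitly.
  mpirun --mca btl ^uct --mca osc ucx -np 2 ./osu_put_latency
  ```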
- Big performance regression in Open-MPI v4.1.0 in VAST
- PR 8123
- Brought in PMIx v3.2.1 as internal PMIx
- from PMIx v3.1.5
- Wasn't brought into master as normal, due to submodules.
- Started bisecting v3.2.x.
- Properly support the direct modex PR on PMIx.
- What is the default for preconnect?
- If we turn preconnect-all to true, then this resolves the performance regression (see the workaround sketch after this list).
- 32 nodes, 40 ppn: 80 seconds of wireup.
- Is direct modex the default?
- Is it the auto-selection if PML is not specified?
- Looks like the default in OMPI v4.1.0 was changed from full modex to direct modex.
- Bringing this up to the PMIx standard shouldn't have changed the default.
- The issue of not knowing the PMLs might have caused each node to do a direct modex with everyone else.
- PML direct modex was fixed in master, but not sure if it was taken back to OMPI v4.1.
- nVidia is proposing we revert the Direct-Modex default change
- Ralph will make a default change in next few hours.
- But will need an OMPI v4.1 fix soon.
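- A hedged workaround sketch based on the discussion above; `mpi_preconnect_mpi` is the existing MCA parameter that forces full wire-up at `MPI_Init`, while the process count and application name are placeholders:

  ```shell
  # Establish all connections during MPI_Init instead of on first use,
  # avoiding per-peer direct-modex lookups at first communication.
  mpirun --mca mpi_preconnect_mpi 1 -np 1280 ./app

  # Equivalent via the environment.
  export OMPI_MCA_mpi_preconnect_mpi=1
  ```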
- Does the community want this ULFM PR 7740 for OMPI v5.0? If so, we need a PRRTE v3.0
- Aurelien will rebase.
- Works with the PRRTE referred to by the ompi master submodule pointer.
- Currently used in a bunch of places.
- Run normal regression tests. Should not see any performance regressions.
- When this works, can provide other tests.
- It is a configure flag. The default is to configure it in, but it is disabled at runtime.
- A number of things need to be set to enable it.
- Aurelien is working to get a single parameter (illustrative sketch below).
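- For context, a hypothetical sketch of what enabling ULFM might look like once the single parameter lands; the flag names below are assumptions and were not settled at the time of this discussion:

  ```shell
  # Build with ULFM compiled in (assumed configure option; the plan is to
  # configure it in by default but leave it disabled at runtime).
  ./configure --with-ft=ulfm && make -j && make install

  # Enable fault tolerance for a single run (assumed runtime switch).
  mpirun --with-ft ulfm -np 8 ./ft_aware_app
  ```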
- Let's get some code reviews done.
- Look at intersections with the core, and ensure that the non-ULFM paths are "clean".
- Also there are downstream effects on PMIx and PRRTE.
- Let's put a deadline on reviews. Let's say in 4 weeks we'll push the merge button.
- Jan 26th we'll merge if no issues
- Modified ABI - removed one callback/member function from some components (BTLs/PMLs) used for FT events.
- This affects the structures of all of these components.
- Pending this discussion.
- Going to version the frameworks that are affected.
- It's not this simple in practice, because usually we just return a pointer to a static object.
- But this isn't possible anymore.
- We don't support multiple versions.
- Do we think we should allow Open-MPI v5.0 to run with MCAs from past versions?
- Maybe good to protect against it?
- Unless we know of someone we need to support like this, we shouldn't bend over backwards for this.
- Josh thinks the Container community is experimenting with this.
- Josh has advised that Open-MPI doesn't guarantee this will work.
- v5.0 is advertised as an ABI break.
- In this case, the framework doesn't exist anymore.
- George will do a check to ensure we're not loading MCAs from an earlier version.
- Still need to coordinate on this. He'd like this done this week.
- PMIx v4.0 working on tools, hopefully done soon.
- PMIx tools go through the Python bindings.
- A new shmem component to replace the old one.
- Still working on it.
- Dave Wooten pushed up some PRRTE patches, and is making some progress there.
- Slow but steady progress.
- Once tool work is more stabilized on PMIx v4.0, will add some tool tests to CI.
- Probably won't start until the first of the year.
- How are the submodule reference updates done on Open-MPI master?
- Probably be switching OMPI master to master PMIx in next few weeks.
- PR 8319 - this failed. Should it be closed and a new one created?
- Josh was still looking to see about adding some cross-checking CI.
- When making a PRTE PR, could add some comment to the PR and it'll trigger Open-MPI CI with that PR.
- v4.0 PMIx and PRRTE master.
- When PRRTE branches a v2.0 branch, we can switch to that then, but that'll ...
- Two different drivers:
- OFI MTL
- HFI support
- Interest in PRRTE in a release, and a few other things that are already in v4.1.x
- HAN and ADAPT as default.
- Amazon is helping with testing and other resources.
- Amazon is also investing in contracting Ralph to help get PRRTE up to speed.
- Other features in PMIx:
- can set GPU affinities, can query GPU info
- What do we want to do about ROMIO in general?
- OMPIO is the default everywhere.
- Gilles is saying the changes we made are integration changes.
- There have been some OMPI specific changes put into ROMIO, meaning upstream maintainers refuse to help us with it.
- We may be able to work with upstream to make a clear API between the two.
- As a 3rd party package, should we move it up to the 3rd-party packaging area, to be clear that we shouldn't make changes to this area?
- Need to look at this treematch thing; it is an upstream package that is now inside of Open-MPI.
- Might want a CI bot to watch a set of files, and flag PRs that violate principles like this.
- PR 8329 - convert README, HACKING, and possibly manpages to reStructuredText.
- Uses https://www.sphinx-doc.org/en/master/ (Python tool, can pip install; see the build sketch after this list).
- Has a build from this PR, so we can see what it looks like.
- Have a look. It's a different approach to have one document that's the whole thing.
- FAQ, README, HACKING.
- Do people even use manpages anymore? Do we need/want them in our tarballs?
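- A minimal sketch of trying out the Sphinx-based docs from that PR locally; the source directory and output path are assumptions:

  ```shell
  # Install Sphinx (the converted docs are written in reStructuredText).
  pip install sphinx

  # Build HTML; "docs" and the output directory are assumed names.
  sphinx-build -b html docs docs/_build/html
  ```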
- How's the state of https://github.com/open-mpi/ompi-tests-public/?
- Putting new tests there
- Very little there so far, but working on adding some more.
- Should have some new Sessions tests
- What's the general state? Any known issues?
- AWS would like to get ...
- Josh Ladd will take it internally to see what they have to say.
- From nVidia/Mellanox: CUDA support is through UCX; SM CUDA isn't tested that much.
- Hessam Mirsadeghi: all CUDA awareness goes through UCX.
- May ask George Bosilca about this.
- Don't want to remove a BTL if someone is interested in it.
- UCX also supports CUDA via TCP.
- PRRTE CLI on v5.0 will have some GPU functionality that Ralph is working on.
- Update 11/17/2020:
- UTK is interested in this BTL, and maybe others.
- Still a gap in the MTL use-case.
- nVidia is not maintaining SMCuda anymore; all CUDA support will be through UCX (see the example after this list).
- What's the state of the shared memory in the BTL?
- This is the really old generation Shared Memory. Older than Vader.
- Was told that after a certain point, there would be no more development in SM CUDA.
- One option might be to
- Another option might be to bring the shared-memory support in SMCuda into Vader (now SM).
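- A hedged sketch of the direction described above (CUDA awareness via UCX rather than the sm_cuda BTL); the PML/component names are standard selection parameters, but the benchmark name is a placeholder:

  ```shell
  # Select the UCX PML explicitly so GPU buffers go through UCX's CUDA support.
  mpirun --mca pml ucx -np 2 ./osu_bw D D

  # Check whether this build reports CUDA support.
  ompi_info --parsable --all | grep mpi_built_with_cuda_support
  ```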
- Discussion on:
- Draft request: make default static - https://github.com/open-mpi/ompi/pull/8132 (see the configure sketch below).
- One con is that many providers hard-link against libraries, which would then make libmpi dependent on those.
- Non-homogeneous clusters (GPUs on some nodes and none on others).
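- A hedged sketch of what the proposal means in configure terms; `--enable-mca-static` and `--enable-mca-dso` are existing options, but the exact default change is what PR 8132 is debating, and the component list is illustrative only:

  ```shell
  # Build components into the MPI library ("static" components) rather than
  # as individually loadable DSOs.
  ./configure --enable-mca-static

  # Keep selected providers as DSOs so libmpi does not hard-link their
  # dependent libraries.
  ./configure --enable-mca-dso=btl-uct,mtl-ofi
  ```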
- New items George and Jeff are leading:
- One for Open-MPI and one for PMIx
- In a month and a half or so. George will send the date to Jeff.