Meeting Minutes 2020 08 10
Attendees:
- Jeff Squyres
- Austen Lauria
- Brian Barrett
- Brice Goglin
- Christoph Niethammer
- Howard Pritchard
- Joseph Schuchart
- Josh Hursey
- Matthew Dosanjh
- Nysal Jan
- Raghu Raja
- Tom Naughton
- Ralph Castain
- Shinji Sumimoto
- Todd Kordenbrock
- Nathan Hjelm
Just gather a list of all the user-noticeable changes here.
Make a list here in this one place so that it's easy to find if/when we go to actually document them.
-
v5.0 breaks backward compatibility
-
ABI / SO version change
-
mpirun command line arguments change
-
MPIR is gone from v5.0; there is a shim library that users can use instead.
- If you need the shim, go get it yourself
-
TotalView and DDT are both working on releases that support this -- they're waiting for Open MPI v5.0, so we don't have TV/DDT version numbers that support this yet.
-
PMI1/PMI2 from Slurm and Cray: gone
-
ORTE is gone
- Most noticeable via MCA params that are gone and/or other mpirun CLI args
-
mpirun: most args are double-dash now (single-dash is largely gone)
- Would be good to make a list of the params that are gone
-
PMIx symbols are now visible to user applications
- Be careful to not link in another PMIx!
-
Multiple different MCA layers -- three different namespaces of MCA params
- PMIx
- PRRTE
- OMPI --> Need to document these somehow
-
Similar issue for the MCA config files
-
Similar issue for configure params
- E.g., how to get to configure CLI options for underlying packages (e.g., expose PRRTE / PMIx configure CLI options through Open MPI's configure)
-
Cross reference to https://github.com/open-mpi/ompi/wiki/Webex-affinity-discussions-2019-09 wiki page for many CLI options that now exist in v5.0 (including listing deprecated options)
-
Josh is working on prte.1.md man page.
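The three MCA parameter namespaces noted above (PMIx, PRRTE, OMPI) can be illustrated with a small sketch. It assumes the environment-variable prefixes `OMPI_MCA_`, `PRTE_MCA_`, and `PMIX_MCA_`; these prefixes are not stated in these minutes and should be checked against the v5.0 documentation. The helper name `mca_env_name` and the parameter name `example_param` are hypothetical placeholders, not real MCA parameters.

```python
# Sketch: map an MCA parameter to the environment variable that would set it
# in each of the three layers. The prefixes are an assumption (not taken from
# the minutes); verify them against the v5.0 documentation.
MCA_ENV_PREFIXES = {
    "ompi": "OMPI_MCA_",
    "prrte": "PRTE_MCA_",
    "pmix": "PMIX_MCA_",
}


def mca_env_name(layer: str, param: str) -> str:
    """Return the environment variable name for `param` in `layer`."""
    try:
        prefix = MCA_ENV_PREFIXES[layer.lower()]
    except KeyError:
        raise ValueError(f"unknown MCA layer: {layer!r}") from None
    return prefix + param


if __name__ == "__main__":
    # "example_param" is a placeholder, not a real MCA parameter.
    for layer in ("ompi", "prrte", "pmix"):
        print(f"{layer}: {mca_env_name(layer, 'example_param')}")
```

Under these assumptions, the same parameter name under a different prefix targets a different layer, which is one reason each namespace would need to be documented separately.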
-
Qthreads / Argobots
- This is a compile-time only decision
- We should describe this specifically somewhere (README?)
- There is only a very small subset of people who can/should use these options.
- ...need some docs from Qthreads/Argobots people here.
- NOTE: UCX PML does not use the synchronization object, so Argobots/Qthreads will effectively be stuck.
- Joseph cites #7702 (issue)
-
ULFM:
- Point off to their documentation
-
openib:
- Use UCX
-
Vader:
- Use sm. The vader name is going away eventually (6.0?)
-
ADAPT and HAN
- More details on how to configure these yourselves if you want to. Advanced users only.
-
Connectivity map
- ob1: will show the BTLs
- cm: might show the MTLs...? Need to check
- Assuming it does not go into Libfabric, etc.
- UCX: shows ucx
- Talk to Josh with suggestions for mpirun CLI options
- Spectrum MPI uses "--prot" (and "--prot-lazy")
-
Tell people that they need to update their auto-completion scripts (bash, etc.) because the format of the ompi_info output [may have] changed in 5.0
-
5.0 general messaging
- We really recommend you use external:
- hwloc
- libevent
- pmix
- prrte (see note below)
- If you use the internal ones, you won't get the headers
- sidenote: we do not (yet?) recommend using external PRRTE (because you won't get the trivial mpirun wrapper for PRRTE; you can "prun ..." yourself, of course)
-
How do we communicate this to users?
- v5.0 release guide
- Analogous to: https://www.open-mpi.org/software/ompi/major-changes.php
- ...but perhaps we can make this a little easier to find!
- Maybe make it "before" vs "after"?
- "Nice to have", but will we actually have time to do this?
- Needs to be google-able
- Point to the EasyBuild videos / slides
- Make error messages "google-able"
- ...maybe FAQ style?
- Maybe something better...? --> Brian points out that making something google-able is pretty darn easy. We just need the content that is linked to from somewhere.
-
Sidenotes:
- Still need ompi_info work to see PMIx/PRRTE/etc. (???)
- Still need PRRTE pass-thru of configure params from OMPI configure (Josh)
-
Raghu pointed out that only recent versions of HDF5 (as of Aug 2020) have dropped their use of MPI-1 functionality
- What are distro/packagers doing?
- Debian ships Open MPI v4.0.2 and does not pass --enable-mpi1-compatibility
- Fedora ...?
- Did they silently enable --enable-mpi1-compatibility so that packages didn't notice?
- Might be worth checking the community on this -- last time we talked/checked this was Oct 2018 -- see https://github.com/open-mpi/ompi/wiki/5.0.x-FeatureList
-
MPI_SIZEOF deprecated: Jeff will handle
-
MPI_COMM_TYPE_HW_UNGUIDED/GUIDED added as possible values for split_type - Section 6.4.2 on page 269
- Guillaume prototyped this as an external library, "hsplit"
- The external library would bring in another copy of hwloc
- We will want to integrate that better in OMPI -- use our scalability stuff for hwloc, yadda yadda yadda
- Embed this?
- Probably best bet: pull it in as a good basis/starting point.
- Integrate it deeply from there.
- Brice will check with Guillaume, but the assumption so far is that we should just integrate it as a starting point and go from there.
-
Callback-driven event interface added to MPI_T - Section 14.3.8
- Nathan has this prototyped in a branch
- There are still a few things that need to be adjusted on the branch, but it's close
- We would want some test coverage for this
- We assume there are no existing tests (Nathan thinks there may be some, somewhere...?)
- Nathan had some test code to develop his branch.
- He might be able to re-purpose those as real tests...?
- This deletes PERUSE support -- need to sync with George on this.
-
New MPI_INFO_CREATE_ENV function - Section 10.2.1 on page 420
- This is what Aurelien did, right?
- We think so.
- It's kind of an addendum to the sessions stuff.
-
MPI Sessions - many places in standard touched, main additions are in Chapter 6 and Chapter 10.
- Howard has this on a branch.
- It's "fully functional" but not fully debugged.
- Last time it was rebased was mid-May.
- Would be beneficial to restructure INIT/FINALIZE first
- Some pieces of this went in to master already (allowed us to delete nice big chunks of code)
- There are other pieces that are still only in the SESSIONS branch
- Is this v5.0 or not?
- ...not clear yet.
- Would want to make sure that there's some MTT coverage of this
- If we haven't yet branched for v5 by October, let's talk then about whether to bring this into master/v5.0.
- Should have some more sessions tests by then.
-
Embiggenment - https://github.com/mpi-forum/mpi-issues/issues/137
- Doesn't sound like anyone is working on this in Open MPI ...?
- Lower priority, but we'll need this someday (to claim MPI-4 conformance)
-
MPI shared memory window / alignment stuff
- Joseph has a PR outstanding for this
-
MPI partitioned communication
- Matthew/Sandia is working on a prototype implementation in Open MPI
- It's Matthew's main focus for next few months
Raghu talked about how AWS is working to support new Libfabric APIs for "hmem" (heterogeneous memory). Will eventually have a PR to talk about.
Do we want to have a public repo for Open MPI tests?
-
ompi-tests is private because of its history
- Mainly because we needed a way to easily share publicly-downloadable test suites all in one place back in the beginning of the project
- It's a different internet now (trivially easy to download tests from anywhere on the internet). But ompi-tests remains.
- What about new tests -- should they be private?
- At least LANL would like DOE-funded self-written tests to be in a public repo, not a private repo.
- This is a fair point.
- HLRS would like to make their test suite public, too (that's currently in ompi-tests).
- Proposal:
- Make a new repo that is public
- Use same permissions that we have on main OMPI repo
- Use same LICENSE file that we have in the main OMPI repo
- Any new tests can go in there
- Old tests that we are 100% sure are of Open MPI community provenance can be moved to the new public repo (e.g., the HLRS test suite)
- There was general agreement that this was a good thing.
- Jeff will create the ompi-tests-public repo.
Agenda is complete. No need to continue this meeting tomorrow.
But we still will have the "regular" Tuesday meeting tomorrow.