Meeting 2024
Meeting logistics:
- Date: April 24 - 26
- Location:
AMD Austin
7171 Southwest Pkwy, Austin, TX 78735
Meeting Room B500 1A.340.Peony
- Some nearby hotels:
The meeting rooms are integrated with MS Teams; there will be a separate link each day for remote participants to attend the meeting. This is a link to a non-public repo with the info (posting links publicly just invites spam; sorry folks).
If you do not have access to the non-public repo, please email Jeff Squyres.
Please put your name down here if you plan to attend.
- Edgar Gabriel (AMD)
- Howard Pritchard (LANL)
- Thomas Naughton (ORNL)
- George Bosilca (Nvidia)
- Joseph Schuchart (UTK)
- Kawthar Shafie Khorassani (AMD)
- Manu Shantharam (AMD)
- Luke Robison (AWS)
- Jun Tang (AWS)
- Wenduo Wang (AWS)
- Tommy Janjusic (Nvidia)
The meeting is tentatively scheduled to start on April 24 around 1pm, and is expected to finish on April 26 around lunch time.
Please add agenda items we need to discuss here.
-
Support for MPI 4.0 compliance (https://github.com/open-mpi/ompi/projects/2)
-
Big count
-
Generate bigcount interfaces for Fortran and C - (https://github.com/open-mpi/ompi/pull/12226)
Jake reviewed this PR
-
Standard ABI generation code and library refactor - (https://github.com/open-mpi/ompi/pull/12003)
Jake reviewed this PR. George pointed out that the 'c' code now looks very similar to the 'c' code for f77 entry points.
-
COLL framework needs work to support MPI Bigcount - (https://github.com/open-mpi/ompi/pull/12478)
Group consensus was to extend mca_coll_base_comm_coll_t with '_c' entry points for the functions that take vector arguments of counts/displacements. There are some issues with the datatype convertor functions for big count. Discussed a partial PR that changes only the signatures of the non-v/w collectives.
Would like to get the big count feature into the major release. Discussed possible ABI break concerns; consensus is that it should not break ABI backward compatibility.
-
Big Count Collective Test Suite: (https://github.com/open-mpi/ompi-tests-public/pull/15)
These tests were merged into the public tests repo quite a while ago.
- Related datatype PR - (https://github.com/open-mpi/ompi/pull/12351)
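To illustrate why the collectives with vector count/displacement arguments are the ones that need '_c' entry points, here is a minimal sketch of the arithmetic. It assumes a 32-bit C int and a contiguous receive layout; the helper names are hypothetical and are not Open MPI code. The point is that displacements can overflow an int even when every individual count still fits.

```python
from itertools import accumulate

# Assumption: counts/displacements in the classic (non-big-count)
# interfaces are 32-bit signed C ints.
INT_MAX = 2**31 - 1


def displacements(counts):
    """Element offsets for a contiguous layout (exclusive prefix sum)."""
    return list(accumulate(counts, initial=0))[:-1]


def needs_big_count(counts):
    """True if any count or displacement does not fit in a 32-bit int."""
    values = list(counts) + displacements(counts)
    return any(v > INT_MAX for v in values)


# Hypothetical example: 4 ranks, each contributing 2**30 elements.
# Every count fits in an int, but the third displacement is 2**31.
counts = [2**30] * 4
print(displacements(counts))
print(needs_big_count(counts))
```

With two ranks instead of four, the same per-rank count stays within int range, which is why the problem tends to show up only at scale.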
- ROMIO refresh
-
-
MPI_T events
- (https://github.com/open-mpi/ompi/pull/8057) How important is this feature? Joseph will ping the Score-P folks to see if they are planning for MPI_T events.
-
-
Support for MPI 4.1 compliance (https://github.com/open-mpi/ompi/projects/4)
- Memory kind info objects
Edgar presented slides summarizing this MPI 4.1 feature, including work items. Discussion of mpi_assert_memory_alloc_kinds: how would we actually use this within Open MPI? Some work is required in PRRTE. Discussed lazy initialization of CUDA, etc. We may not be able to do much optimization based on memory kinds anyway, outside of the pointer check. Some discussion of restrictors and how they may complicate using this kind info internally in Open MPI. Open MPI can also be configured with support for multiple device types. Do we need to support different device types concurrently? The accelerator framework is currently not set up for this; it allows only one component to be active. Discussed multiple devices of a single type. Right now the CUDA and ROCm components do not use APIs that take device IDs, but could, at least for CUDA. Maybe items for a 6.0 release?
-
Support for MPI 4.2(?) ABI (https://github.com/mpi-forum/mpi-issues/issues/751)
- Operative question: when is MPI v4.2 expected to be ratified?
Consensus was that the Forum was probably being optimistic to think v4.2 could be turned around in a year. A release by SC24 does not seem realistic.
-
related PR (https://github.com/open-mpi/ompi/pull/12033)
Jake reviewed this PR. George pointed out that the 'c' code now looks very similar to the 'c' code for f77 entry points. Howard explained the current approach for supporting both an ABI version and a 'native' Open MPI version. Should we just switch to ABI-only, as George suggested? Maybe a native version is more important to the MPICH community? Or maybe we have the same ISV support issues too?
-
Collective Operations
- xhc/shared memory collectives
- GPU collectives
- Collective configuration file
- Memory allocation caching
Could we combine/migrate some of the adapt algorithms to libnbc? Not really; they use different approaches to non-blocking collectives. The coll framework has many components; can we possibly remove some of them? sm?
AWS focusing on optimizing collectives for EFA libfabric provider. Using MTL. Focus on HAN optimization - alltoall/alltoallv. Focus on Tuned/base - allreduce, allgather, reduce. Also working on a selection algorithm. Also considering a decision file based approach. PRs open for many of these.
Quite a few PRs open right now for various collective algorithms: XHC, smdirect, acoll, coll/am. Should we start merging some of these in? Do we need all of these? For example, smdirect and acoll seem very similar in terms of functionality.
Lots of discussion about selection and priority of components.
Discussed whether to merge https://github.com/open-mpi/ompi/pull/11418. George will ping the PR author about level of commitment; if the author or his organization will support it, go ahead with the merge.
Agree to merge in the acoll PR https://github.com/open-mpi/ompi/pull/12484 once it passes CI.
Could use an easy way to report which component/algorithm is being used for a collective op, maybe targeted at the debugging case.
Do we still need smdirect PR - https://github.com/open-mpi/ompi/pull/10470? (decided to leave open for now so it can be salvaged).
Agree to remove coll/sm component.
-
Accelerator support
- shared memory plans for 5.1 and beyond
- one-sided operations
IPC support in accelerators for 5.1. In main, no components outside of the accelerator framework make CUDA calls. We do need IPC support in the accelerator/cuda component.
GMCA parameter support? PMIx may have something similar. Idea would be to change priorities for accelerator related components without having to set multiple MCA parameters.
Joseph working on PR 12356 - https://github.com/open-mpi/ompi/pull/12356 and 12318 - https://github.com/open-mpi/ompi/pull/12318 related to accelerator support.
Discussed implications of the dmabuf method of memory registration. It is currently used within some Libfabric providers and UCX. At this point it does not appear that we need to handle dmabuf registration within Open MPI itself. This might change if network providers require using dmabuf methods for memory registration.
Did not discuss one-sided operations.
-
PRRTe future topic
Jeff reviewed where we are at with this. The idea is to switch over to a slightly modified fork of PRRTE for a 6.0.x branch/release. Ralph pointed out that changing the launcher (even if just switching to a PRRTE fork) would be ill advised for the 5.0.x or even 5.1.x branch, as packagers will be very unhappy; they only want to make packaging changes at major version number changes. So we would only do this if absolutely necessary. Discussion of debugger support: Ralph thinks this should be okay, as these tools use (or should be using) the PMIx tools interface.
-
Review previously-created wiki pages for 5.1.x and 6.0.x in the context of planning for Open MPI vNEXT
- These were made a long time ago; it would probably be good to re-evaluate, see which items are realistic, which will actually happen, etc. Timing / version numbers may change / consolidate, too, if we re-integrate PRRTE for v6.0.x (e.g., is doing a v5.1.x worth it at all?).
- Proposed v6.0.x feature list
Following the PRRTE discussion, it was decided that a 5.1.x release is unlikely, so the group discussed features targeting a 6.0.x release.
-
What to do about SLURM?
- See https://github.com/open-mpi/ompi/issues/12471
- Ralph can attend via dialup to help with this discussion
The problem starts with SLURM release 23.11. SchedMD made changes to the environment variables describing the job ID, etc., that impact the PRRTE RAS system's discovery mechanism. Ralph has been engaging SchedMD about ways to fix this recurring problem. The current plan is that SchedMD will supply a supplementary library that the PRRTE ras and plm components can use for allocation discovery and daemon launch.
-
For OFI group
- Adopt libfabric 2.0 API?
Way too soon to worry about this.
- Adopt dma-buf API
Discussed. At this point this is handled internally in the UCX and OFI providers of interest, certainly for CUDA, ROCm, and ZE devices now. May need to reconsider if the long-term path is to require OFI?/UCX consumers to manage dma-buf registrations.
- mtl/ofi vs. btl/ofi performance differences
Edgar is looking at the CXI provider and noticing BTL/OFI pt2pt inter-node performance issues.
-
Misc
- MPI_Info_set handling https://github.com/open-mpi/ompi/pull/11823
- What is the bar for merging something into main? Just a successful CI pass? What if there are complaints from the rest of the community? What if the solution is known to be partial and incomplete?
- Should we enable better downstream build pipeline security for those downloading from open-mpi.org?
- For v5.0.x, we have md5, sha1, and sha256 checksums in the HTML on the download page.
- Should we have these values in (more easily) machine-readable formats somewhere?
- Should we be cryptographically signing releases somehow? (tarballs do not support signatures)
- What do others do (e.g., GNU projects)?
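As one possible shape for the machine-readable checksum question above, here is a minimal sketch that verifies a downloaded file against a manifest in the `digest  filename` layout written by GNU `sha256sum`. The manifest format and the helper names are assumptions for illustration; nothing here reflects a format that open-mpi.org actually publishes.

```python
import hashlib
from pathlib import Path


def parse_manifest(text):
    """Map file name -> hex digest from 'digest  name' lines.

    Assumption: sha256sum-style lines; blank lines and '#' comments
    are skipped, and a leading '*' (binary-mode marker) is dropped.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(maxsplit=1)
        entries[name.lstrip("*")] = digest.lower()
    return entries


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, manifest_text):
    """True only if the file is listed in the manifest and its digest matches."""
    expected = parse_manifest(manifest_text).get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

A checksum manifest like this only detects corruption or a mismatched download; it does not authenticate the release unless the manifest itself is obtained over a trusted channel or signed separately.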
-
Action items
- Joseph will ping the Score-P folks about interest in MPI_T events. (DONE) Marc Andre says it is on the list for Score-P. TAU is using it, as is an unreleased version of MPI Advisor.
- George will ping author of PR 11418 about level of commitment to determine whether to merge this PR into main.
- Howard will work with Edgar on OFI BTL performance issues.