6.0.x Feature List
Edgar Gabriel edited this page Nov 13, 2024 · 41 revisions
Target date - end CY24.
When should we plan to cut the 6.0.x branch? As late as possible, unless we are blocking 7.0 changes (ABI).
Strike through means feature is complete and committed to Open MPI main branch.
- Extended Accelerator API:
- CUDA support for IPC
- Reduction op (and others) offload support (Joseph)
- Collectives:
- Merge XHC if they can commit to supporting it.
- Merge acoll once it passes CI.
- smdirect won't be merged, salvage for parts.
- Propose JSON format for tuning file.
- Remove coll/sm (tuned is OK fallback, XHC/acoll coming soon).
- Performance testing of Luke's han alltoall PR with UCX.
- Remove:
- GNI BTL
- udreg rcache
- Remove pvfs2 components
- Big Count:
- Collective embiggening Phase 1 (everything except *v and *w collectives)
- Collective embiggening Phase 2 (*v and *w collectives)
- Switch over to forked PRRTe Phase 1
- Documentation Changes
- Remove prte binaries (Univ. Louisville)
- Remove --with-prte configure option from ompi (Univ. Louisville)
- Some MCAs (Univ. Louisville/rhc54)
- Big Count:
- API-level function generation (PR open and ready for review)
- Memory Kind support:
- Add memory-kind option
- Return supported memory kinds
- ROMIO Refresh
- Remove:
- Remove the TKR version of the Fortran "use mpi" module (old NAG compiler complicates things)
- Phase 2 PRRTE
- MCA parameters move into ompi namespace.
- prte_info is gone, move those to ompi_info, perhaps a prte-mca option?
- BTL Self accelerator aware (probably defer to later release)
- If Jacob's ABI work is ready, it might help solidify the standard to have our implementation done.
- Merge ABI work into main, enable it only when requested, and stress in documentation it is experimental.
- Big count support
- API level functions (in progress 1-2 months)(DONE PR OPEN)
- Collective embiggening (discussed at F2F, stage in non-v/w functions first) (DONE)
- Changes to datatype engine/combiner support (could be a challenge)
- ROMIO refresh
- Embiggen man pages and other documentation
- Remove hcol component? (its API doesn't support big count and it's been superseded by UCC)
- PRRTE switch Phase 1
- MPI_T events (probably won't do for 6.0.x).
- extended accelerator API functionality (IPC) and conversion of the last components to use accelerator API (DONE for ROCM and CUDA, not ZE).
- level zero (ze) accelerator component (DONE basic support, IPC not implemented, Howard)
- support for MPI 4.1 memory kinds info object (assume we have PRRTE move, 1 month for basic support)
- reduction op (and others) offload support (Joseph estimates 1-2 months to get in)
- SMSC accelerator (Edgar - not sure yet about this one for 6.0.x)
- Stream-aware datatype engine.
- Datatype engine accelerator awareness (e.g. memcpy2d) (George).
What about smart pointers? Probably could not get this into 6.0.x.
- implement memory allocation kind info. (see above for accelerator features)
- GNI BTL - no longer have access to systems to support this (Howard) (DONE)
- UDREG Rcache - no longer have access to systems that can use this (Howard) (DONE)
- FS/PVFS2 and FBTL/PVFS2 - no longer have access to systems to support this (Edgar) (DONE)
- coll/sm (DONE)
- Remove TKR version of use mpi module. (Howard)
- This was deferred from 4.0.x because in April/May 2018 (and then deferred again from v5.0.x in October 2018), it was discovered that:
- The RHEL 7.x default gcc (4.8.5) still uses the TKR mpi module.
- The NAG compiler still uses the TKR mpi module.
- mca/coll: blocking reduction on accelerator (this is discussed above, Joseph)
- mca/coll: hierarchical MPI_Alltoall(v), MPI_Gatherv, MPI_Scatterv. (various orgs working on this)
- mca/coll: new algorithms (various orgs working on this)
There are quite a few open PRs related to collectives. Can some of these get merged? See notes from 2024 F2F Meeting
- Sessions - add support for UCX PML (Howard, 2-3 weeks)
- Sessions - various small fixes (Howard, 1 month)
- ZE support for IPC (maybe)
- Atomics - can we just rely on C11 and remove some of this code? We are currently using gcc atomics for performance reasons. Joseph would like to have a wrapper for atomic types and direct load/store access.
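
To make the atomics question above more concrete, here is a minimal sketch of what a thin wrapper over C11 atomics could look like. Every name in it (sketch_atomic_int32_t and the sketch_atomic_* functions) is hypothetical and is not Open MPI's existing atomics API. Mapping "direct load/store access" to relaxed-ordering loads and stores is an assumption about what is wanted, not a decision recorded on this page.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper type; the name is illustrative only. */
typedef struct {
    _Atomic int32_t value;
} sketch_atomic_int32_t;

/* "Direct" load: relaxed ordering, assumed here to be the closest
 * C11 equivalent of a plain load. */
static inline int32_t sketch_atomic_load_relaxed(sketch_atomic_int32_t *a)
{
    return atomic_load_explicit(&a->value, memory_order_relaxed);
}

/* "Direct" store: relaxed ordering, same assumption as above. */
static inline void sketch_atomic_store_relaxed(sketch_atomic_int32_t *a, int32_t v)
{
    atomic_store_explicit(&a->value, v, memory_order_relaxed);
}

/* Read-modify-write with acquire/release ordering; returns the old value. */
static inline int32_t sketch_atomic_fetch_add(sketch_atomic_int32_t *a, int32_t delta)
{
    return atomic_fetch_add_explicit(&a->value, delta, memory_order_acq_rel);
}

/* Strong compare-and-swap; on failure *expected is updated to the
 * value actually observed. */
static inline bool sketch_atomic_cas(sketch_atomic_int32_t *a,
                                     int32_t *expected, int32_t desired)
{
    return atomic_compare_exchange_strong_explicit(&a->value, expected, desired,
                                                   memory_order_acq_rel,
                                                   memory_order_acquire);
}
```

Whether such a wrapper can match the performance of the gcc atomics currently in use is exactly the open question in the item above; the sketch only shows the shape of the interface.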