WeeklyTelcon_20230801
Geoffrey Paulsen edited this page Aug 8, 2023
Small meeting today; the 4.x and 5.x release managers were absent. The primary discussion items were raised by Howard and Edgar:
Howard: Currently testing PR 11689.
Edgar: PR 11818, error handler type (https://github.com/open-mpi/ompi/pull/11818)
- Does this PR need a backport?
- Wenduo confirmed the backport was done and added a comment.
Edgar: Issue 11831 (https://github.com/open-mpi/ompi/issues/11831)
- Both main and 5.0 are affected.
- We can probably revert the PR in 5.0, because col-cuda is always compiled in on main but not on 5.0.
- We need a fix that makes cuda_delayed_init properly get out of the way.
Edgar raised a technical question: could we have a new shared-memory (SM) component for OFI?
- The libfabric SM component supports CUDA, ROCm, and Intel devices.
- Motivated by https://github.com/open-mpi/ompi/pull/10959
- Howard thinks it could help Frontier users and might be interested in supplying an intern to assist.
- General agreement that a libfabric SM component might be an efficient way to support the various SM paths.
- This could be a 5.1 feature request; it needs further technical investigation and discussion with corporate management.