Open MPI 5.0.6 with the shm+cxi:lnx linked provider fails to perform Device-to-Device transfers on the LUMI system (AMD GPUs) with the OSU benchmarks. Host-to-Host transfers work as expected, both intra- and inter-node. Device-to-Device transfers fail with:
export FI_LNX_PROV_LINKS=shm+cxi
mpirun --mca opal_common_ofi_provider_include "shm+cxi:lnx" -np 2 -map-by numa ./osu_bibw -m 131072: D D
# OSU MPI-ROCM Bi-Directional Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size Bandwidth (MB/s)
--------------------------------------------------------------------------
Open MPI failed to register your buffer.
This error is fatal, your job will abort
Buffer Type: rocm
Buffer Address: 0x154beaa00000
Buffer Length: 131072
Error: Required key not available (4294967030)
--------------------------------------------------------------------------
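For anyone who wants to reproduce this outside the OSU suite, below is a minimal sketch of a two-rank Device-to-Device exchange that should exercise the same buffer-registration path. This is a hypothetical reproducer, not code from the benchmark or from Open MPI; the build line and the assumption that a plain MPI_Sendrecv on hipMalloc'd buffers trips the same failure are mine.

```c
/*
 * Hypothetical reproducer: exchange 131072-byte ROCm device buffers
 * between two ranks, the same message size reported in the abort.
 * Possible build line (assumption, adjust for your stack):
 *   mpicc -D__HIP_PLATFORM_AMD__ -I${ROCM_PATH}/include \
 *         -L${ROCM_PATH}/lib -lamdhip64 repro.c -o repro
 */
#include <stdio.h>
#include <mpi.h>
#include <hip/hip_runtime.h>

#define NBYTES 131072 /* matches the Buffer Length in the error output */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Allocate both buffers directly in device memory; "Buffer Type:
     * rocm" in the abort message refers to exactly this kind of buffer. */
    char *sbuf = NULL, *rbuf = NULL;
    if (hipMalloc((void **)&sbuf, NBYTES) != hipSuccess ||
        hipMalloc((void **)&rbuf, NBYTES) != hipSuccess) {
        fprintf(stderr, "hipMalloc failed\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    (void)hipMemset(sbuf, rank, NBYTES);

    /* Hand the device pointers straight to MPI; with shm+cxi:lnx this
     * is where "Open MPI failed to register your buffer" is expected. */
    int peer = 1 - rank;
    MPI_Sendrecv(sbuf, NBYTES, MPI_CHAR, peer, 0,
                 rbuf, NBYTES, MPI_CHAR, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    if (rank == 0) printf("device-to-device sendrecv completed\n");

    (void)hipFree(sbuf);
    (void)hipFree(rbuf);
    MPI_Finalize();
    return 0;
}
```

Launched the same way as the benchmark run above (FI_LNX_PROV_LINKS=shm+cxi, --mca opal_common_ofi_provider_include "shm+cxi:lnx", two ranks), it should print the completion line on a working stack and hit the registration abort on the broken one.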
@hppritcha identified the problem as related to #11076. A fix for this issue landed in #12290, but it was never merged to the 5.x branch.
I will need help here from @hppritcha and @amirshehataornl (who developed the linkx provider in libfabric), since I am not that familiar with this code path. If it's as simple as backporting PR #12290, then it shouldn't be a challenge.