xpmem: building with xpmem causes regression #10403

Open
jiaxiyan opened this issue Sep 19, 2024 · 0 comments
Describe the bug
We built libfabric with --enable-xpmem but did not load the xpmem kernel module. Running the Intel MPI Benchmarks (IMB) Alltoall test with Open MPI 5, we observed a latency regression at small message sizes compared with a libfabric build without xpmem support.
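Because the regression appears even though the module is not loaded, it is worth confirming that the module really is absent on every node before comparing builds. A minimal Python sketch, assuming the module registers under the name `xpmem` in /proc/modules and that a loaded module exposes the /dev/xpmem device node (both are assumptions, not stated in this report):

```python
from pathlib import Path


def module_listed(proc_modules_text: str, name: str = "xpmem") -> bool:
    """Return True if `name` is the first field of any line of /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def xpmem_module_loaded(path: str = "/proc/modules") -> bool:
    """Check the running kernel; a missing file (e.g. non-Linux) counts as not loaded."""
    try:
        return module_listed(Path(path).read_text())
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    # ASSUMPTION: the module is named "xpmem" and creates /dev/xpmem when loaded.
    print("xpmem in /proc/modules:", xpmem_module_loaded())
    print("/dev/xpmem exists:", Path("/dev/xpmem").exists())
```

Running it on each host (for example through the same hostfile) should report `False` for both checks in the setup described above.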

To Reproduce

/openmpi5/bin/mpirun --wdir . -n 1024 --hostfile hostfile --map-by ppr:64:node --timeout 1800 -x OMPI_MCA_accelerator=null -x FI_EFA_USE_DEVICE_RDMA=1 -x LD_LIBRARY_PATH=/build/libraries/libfabric/main/install/libfabric/lib -x PATH  /build/workloads/imb/openmpi-v5.0.3-installer/source/mpi-benchmarks-IMB-v2021.7/IMB-MPI1 Alltoall -npmin 1024 -iter 200 -time 20 -mem 1 2>&1 | tee node16-ppn64.txt
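To compare the two builds, only the libfabric library directory exported through LD_LIBRARY_PATH needs to change, assuming both builds are installed side by side. The sketch below rebuilds the command above as an argument list (output redirection through `tee` omitted); `build_cmd` and the xpmem install path are hypothetical names for illustration. With `--map-by ppr:64:node` on 16 nodes, this gives the 1024 ranks requested by `-n 1024`.

```python
import shlex

IMB_MPI1 = ("/build/workloads/imb/openmpi-v5.0.3-installer/source/"
            "mpi-benchmarks-IMB-v2021.7/IMB-MPI1")
LIBFABRIC_DEFAULT = "/build/libraries/libfabric/main/install/libfabric/lib"
# HYPOTHETICAL: install directory of the --enable-xpmem build.
LIBFABRIC_XPMEM = "/path/to/libfabric-xpmem/install/lib"


def build_cmd(libfabric_lib: str) -> list[str]:
    """mpirun argv from the issue; only the exported LD_LIBRARY_PATH varies."""
    return [
        "/openmpi5/bin/mpirun", "--wdir", ".", "-n", "1024",
        "--hostfile", "hostfile", "--map-by", "ppr:64:node",
        "--timeout", "1800",
        "-x", "OMPI_MCA_accelerator=null",
        "-x", "FI_EFA_USE_DEVICE_RDMA=1",
        "-x", f"LD_LIBRARY_PATH={libfabric_lib}",
        "-x", "PATH",
        IMB_MPI1, "Alltoall", "-npmin", "1024",
        "-iter", "200", "-time", "20", "-mem", "1",
    ]


if __name__ == "__main__":
    for lib in (LIBFABRIC_DEFAULT, LIBFABRIC_XPMEM):
        # Print instead of executing; pass the list to subprocess.run() to launch.
        print(shlex.join(build_cmd(lib)))
```

Keeping every other argument identical makes the libfabric build the only variable between the two runs.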

Expected behavior
When the xpmem kernel module is not loaded, performance should be the same whether or not libfabric is built with --enable-xpmem.

Output
Libfabric built without --enable-xpmem:


# Calling sequence was:

# /build/workloads/imb/openmpi-v5.0.3-installer/source/mpi-benchmarks-IMB-v2021.7/IMB-MPI1 Alltoall -npmin 1024 -iter 200 -time 20 -mem 1

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# Alltoall
#-----------------------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 1024
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]      defects
            0          200         0.05         0.08         0.05         0.00
            1          200       106.05       146.90       134.23         0.00
            2          200       111.80       181.11       154.34         0.00
            4          200       124.38       185.32       160.79         0.00
            8          200       129.01       182.50       162.60         0.00
           16          200       232.57       394.08       304.96         0.00
           32          200       558.01       985.00       762.65         0.00
           64          200      7093.18     12193.73      9953.92         0.00
          128          200      8769.56     30265.41     24831.23         0.00
          256          200      2822.34     37262.43     27504.94         0.00
          512           21     12156.41     13949.84     12884.10         0.00
         1024           21     14784.92     15321.06     15067.40         0.00
         2048           21     17682.00     18692.46     18319.53         0.00
         4096           21     14576.67     15542.40     15145.84         0.00
         8192           21     21843.53     23535.42     23007.77         0.00
        16384            3     44885.20     46713.80     45756.77         0.00

Libfabric built with --enable-xpmem:

       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]      defects
            0          200         0.05         0.07         0.05         0.00
            1          200       115.06       148.07       137.58         0.00
            2          200       139.29       176.52       161.56         0.00
            4          200       181.14       215.82       205.05         0.00
            8          200       160.08       198.28       183.84         0.00
           16          200       308.01       431.14       363.63         0.00
           32          200       845.39      1191.64       996.77         0.00
           64          200      8688.64     18961.72     14613.72         0.00
          128          200     14885.64     29020.61     23739.58         0.00
          256          200      6650.37     38164.19     27603.49         0.00
          512           22     11599.52     12864.61     12165.26         0.00
         1024           22     14662.48     15350.84     15017.10         0.00
         2048           22     17599.68     18588.80     18188.19         0.00
         4096           22     14443.24     15390.64     14997.93         0.00
         8192           22     21959.33     23530.77     23043.91         0.00
        16384            3     44656.48     46520.43     45761.54         0.00

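With the xpmem-enabled build, the average latency (t_avg) is higher for every message size from 1 to 64 bytes, ranging from about 2.5% at 1 byte to about 47% at 64 bytes (9953.92 usec vs. 14613.72 usec). From 128 bytes upward the two builds are within about 6% of each other, with no consistent direction.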
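The per-size change in average latency can be recomputed from the t_avg columns of the two tables with a short Python sketch (the dictionaries are copied from the output above; positive values mean the xpmem-enabled build is slower):

```python
# Average latency t_avg[usec] per message size, copied from the two IMB runs above.
T_AVG_WITHOUT_XPMEM = {
    0: 0.05, 1: 134.23, 2: 154.34, 4: 160.79, 8: 162.60, 16: 304.96,
    32: 762.65, 64: 9953.92, 128: 24831.23, 256: 27504.94, 512: 12884.10,
    1024: 15067.40, 2048: 18319.53, 4096: 15145.84, 8192: 23007.77,
    16384: 45756.77,
}
T_AVG_WITH_XPMEM = {
    0: 0.05, 1: 137.58, 2: 161.56, 4: 205.05, 8: 183.84, 16: 363.63,
    32: 996.77, 64: 14613.72, 128: 23739.58, 256: 27603.49, 512: 12165.26,
    1024: 15017.10, 2048: 18188.19, 4096: 14997.93, 8192: 23043.91,
    16384: 45761.54,
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (positive = slower)."""
    return (after - before) / before * 100.0


def compare() -> list[tuple[int, float]]:
    """Return (message size, % change in t_avg) for every size in both tables."""
    return [
        (size, pct_change(T_AVG_WITHOUT_XPMEM[size], T_AVG_WITH_XPMEM[size]))
        for size in sorted(T_AVG_WITHOUT_XPMEM)
    ]


if __name__ == "__main__":
    for size, change in compare():
        print(f"{size:>6} bytes  {change:+7.1f}%")
```

The 64-byte row shows the largest change, about +46.8%.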

Environment:
Amazon Linux 2, 16 hpc7g.16xlarge instances (64 ranks per node, 1024 ranks in total)

