
cuda, rc Bandwidth fluctuates regularly #10164

Closed
yangrudan opened this issue Sep 20, 2024 · 15 comments
@yangrudan

Describe the bug

When I run ucx_perftest between two nodes, the bandwidth fluctuates regularly.

+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|    Stage     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
[thread 0]                33      0.588 440383.181 440383.181     2325.25    2325.25           2           2
[thread 0]                53      0.658 54057.252 294599.812    18942.88    3475.90          18           3
[thread 0]                65  43810.296 183909.575 274164.691     5567.95    3734.98           5           4
[thread 0]                87  48500.309 47570.272 216864.953    21526.05    4721.83          21           5
[thread 0]                97  48503.238 193600.011 214466.505     5289.26    4774.64           5           5
[thread 0]               118  48505.271 47841.231 184812.855    21404.13    5540.74          21           5
[thread 0]               129  48513.199 178729.274 184294.100     5729.34    5556.34           6           5
[thread 0]               151  48513.988 47579.408 164375.403    21521.92    6229.64          21           6
[thread 0]               161  48513.988 192064.810 166095.242     5331.53    6165.14           5           6
[thread 0]               182  48516.302 48136.813 152484.654    21272.70    6715.43          21           7
[thread 0]               193  48519.674 178550.265 153970.259     5735.08    6650.64           6           6
[thread 0]               215  48520.655 47590.321 143084.870    21516.98    7156.59          21           7

Steps to Reproduce

  • ./ucx_perftest -s 1073741824 -t tag_bw -m cuda -p 12345
  • ./ucx_perftest 172.16.4.4 -s 1073741824 -t tag_bw -m cuda -p 12345
  • ucx_info -v
# Library version: 1.18.0
# Library path: /root/yangrudan/ucx/build/../out/lib/libucs.so.0
# API headers version: 1.18.0
# Git branch 'master', revision 6a87bb1
# Configured with: --prefix=/root/yangrudan/ucx/build/../out --enable-compiler-opt=0 --with-cuda=/usr/local/cuda --with-verbs --with-dm --with-rdmacm --enable-mt=yes --with-rc --with-mlx5-dv --with-go=no
  • env variables: UCX_NET_DEVICES=mlx5_cx6_0:1 UCX_TLS=cuda,rc (combined invocation sketched below)
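
Putting the steps above together, the runs were presumably invoked roughly as follows (combining the listed env variables with the two ucx_perftest commands is an assumption; the commands, device name, and IP come from the report):

# server side
UCX_NET_DEVICES=mlx5_cx6_0:1 UCX_TLS=cuda,rc ./ucx_perftest -s 1073741824 -t tag_bw -m cuda -p 12345
# client side (172.16.4.4 is the server address from the report)
UCX_NET_DEVICES=mlx5_cx6_0:1 UCX_TLS=cuda,rc ./ucx_perftest 172.16.4.4 -s 1073741824 -t tag_bw -m cuda -p 12345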

Setup and versions

  • OS version

  • Linux NH-DC-NM129-I06-12U-GPU-246 5.4.0-193-generic Ubuntu SMP Fri Aug 2 19:14:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

  • For RDMA/IB/RoCE related issues:

    • Driver version:
      MLNX_OFED_LINUX-5.8-3.0.7.0:
  • For GPU related issues:

    • GPU type
    • Cuda:
      • NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3
root@NH-DC-NM129-I06-12U-GPU-246:~/yangrudan/ucx/out/bin# lsmod|grep peer
nvidia_peermem         16384  0
ib_core               348160  9 rdma_cm,ib_ipoib,nvidia_peermem,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
nvidia              56143872  354 nvidia_uvm,nvidia_peermem,nvidia_modeset
yangrudan added the Bug label Sep 20, 2024
@yosefe
Contributor

yosefe commented Sep 22, 2024

@yangrudan does it happen with a smaller message size (for example, 4 MB)?
Does it also happen if you remove the "--enable-compiler-opt=0" flag from configure?

@yangrudan
Author

@yangrudan does it happen with a smaller message size (for example, 4 MB)? Does it also happen if you remove the "--enable-compiler-opt=0" flag from configure?

image
It seems that the bandwidth is uniform at 4 MB; what could be the reason for that?

The UCX_PROTO_INFO output is as below:
image

@yosefe
Contributor

yosefe commented Sep 23, 2024

@yangrudan

  1. Is there a similar fluctuation with 1 GB of host memory?
  2. Can you pls try setting "UCX_RNDV_SCHEME=get_zcopy" to see if it helps resolve it?
  3. Can you pls post the output of "nvidia-smi topo -m"?
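
A minimal sketch of suggestions 1 and 2 above, on the client side, assuming the same invocation as in the original report (whether the reporter ran them exactly this way is an assumption):

# suggestion 1: repeat the 1 GB test with host memory instead of CUDA memory
UCX_NET_DEVICES=mlx5_cx6_0:1 UCX_TLS=cuda,rc ./ucx_perftest 172.16.4.4 -s 1073741824 -t tag_bw -m host -p 12345
# suggestion 2: repeat the CUDA test while forcing the get_zcopy rendezvous scheme
UCX_RNDV_SCHEME=get_zcopy UCX_NET_DEVICES=mlx5_cx6_0:1 UCX_TLS=cuda,rc ./ucx_perftest 172.16.4.4 -s 1073741824 -t tag_bw -m cuda -p 12345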

@yangrudan
Author

@yangrudan

  1. Is there a similar fluctuation with 1 GB of host memory?
  2. Can you pls try setting "UCX_RNDV_SCHEME=get_zcopy" to see if it helps resolve it?
  3. Can you pls post the output of "nvidia-smi topo -m"?
    1. Host memory doesn't show the fluctuation;
      image
    2. Setting "UCX_RNDV_SCHEME=get_zcopy" helps, but the bandwidth is lower than expected;
      image
    3. The server's and client's topologies are as follows;
#server=======================================================================
root@NH-DC-NM129-I05-20U-GPU-242:~/yangrudan/github_ucx_build/bin# nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    NIC6    NIC7    NIC8    NIC9    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV12    NV12    NV12    NV12    NV12    NV12    NV12    SYS     SYS     SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS     0-31,64-95      0               N/A
GPU1    NV12     X      NV12    NV12    NV12    NV12    NV12    NV12    SYS     SYS     SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS     0-31,64-95      0               N/A
GPU2    NV12    NV12     X      NV12    NV12    NV12    NV12    NV12    SYS     SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     0-31,64-95      0               N/A
GPU3    NV12    NV12    NV12     X      NV12    NV12    NV12    NV12    SYS     SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     0-31,64-95      0               N/A
GPU4    NV12    NV12    NV12    NV12     X      NV12    NV12    NV12    SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     32-63,96-127    1               N/A
GPU5    NV12    NV12    NV12    NV12    NV12     X      NV12    NV12    SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     32-63,96-127    1               N/A
GPU6    NV12    NV12    NV12    NV12    NV12    NV12     X      NV12    PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     32-63,96-127    1               N/A
GPU7    NV12    NV12    NV12    NV12    NV12    NV12    NV12     X      PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     32-63,96-127    1               N/A
NIC0    SYS     SYS     SYS     SYS     SYS     SYS     PXB     PXB      X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC1    SYS     SYS     SYS     SYS     PXB     PXB     SYS     SYS     SYS      X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC2    SYS     SYS     PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS      X      SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC3    PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      SYS     SYS     SYS     SYS     SYS     SYS
NIC4    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      PIX     PHB     PHB     SYS     SYS
NIC5    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PIX      X      PHB     PHB     SYS     SYS
NIC6    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PHB     PHB      X      PIX     SYS     SYS
NIC7    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PHB     PHB     PIX      X      SYS     SYS
NIC8    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      PIX
NIC9    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PIX      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_cx6_0
  NIC1: mlx5_cx6_1
  NIC2: mlx5_cx6_2
  NIC3: mlx5_cx6_3
  NIC4: mlx5_cx4lx_0
  NIC5: mlx5_cx4lx_1
  NIC6: mlx5_cx4lx_2
  NIC7: mlx5_cx4lx_3
  NIC8: mlx5_cx4lx_4
  NIC9: mlx5_cx4lx_5


#client================================================================
root@NH-DC-NM129-I06-12U-GPU-246:~/yangrudan/ucx/out/bin# nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    NIC6    NIC7    NIC8    NIC9    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV12    NV12    NV12    NV12    NV12    NV12    NV12    SYS     SYS     SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS     0-31,64-95      0               N/A
GPU1    NV12     X      NV12    NV12    NV12    NV12    NV12    NV12    SYS     SYS     SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS     0-31,64-95      0               N/A
GPU2    NV12    NV12     X      NV12    NV12    NV12    NV12    NV12    SYS     SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     0-31,64-95      0               N/A
GPU3    NV12    NV12    NV12     X      NV12    NV12    NV12    NV12    SYS     SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     0-31,64-95      0               N/A
GPU4    NV12    NV12    NV12    NV12     X      NV12    NV12    NV12    SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     32-63,96-127    1               N/A
GPU5    NV12    NV12    NV12    NV12    NV12     X      NV12    NV12    SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     32-63,96-127    1               N/A
GPU6    NV12    NV12    NV12    NV12    NV12    NV12     X      NV12    PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     32-63,96-127    1               N/A
GPU7    NV12    NV12    NV12    NV12    NV12    NV12    NV12     X      PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     32-63,96-127    1               N/A
NIC0    SYS     SYS     SYS     SYS     SYS     SYS     PXB     PXB      X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC1    SYS     SYS     SYS     SYS     PXB     PXB     SYS     SYS     SYS      X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC2    SYS     SYS     PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS      X      SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC3    PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      SYS     SYS     SYS     SYS     SYS     SYS
NIC4    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      PIX     PHB     PHB     SYS     SYS
NIC5    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PIX      X      PHB     PHB     SYS     SYS
NIC6    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PHB     PHB      X      PIX     SYS     SYS
NIC7    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PHB     PHB     PIX      X      SYS     SYS
NIC8    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      PIX
NIC9    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PIX      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_cx6_0
  NIC1: mlx5_cx6_1
  NIC2: mlx5_cx6_2
  NIC3: mlx5_cx6_3
  NIC4: mlx5_cx4lx_0
  NIC5: mlx5_cx4lx_1
  NIC6: mlx5_cx4lx_2
  NIC7: mlx5_cx4lx_3
  NIC8: mlx5_cx4lx_4
  NIC9: mlx5_cx4lx_5

@yosefe
Contributor

yosefe commented Sep 23, 2024

@yangrudan can you pls try setting UCX_NET_DEVICES=mlx5_cx6_3:1 - both with and without UCX_RNDV_SCHEME=get_zcopy?
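
A sketch of the two runs being asked for, reusing the client command from the original report (whether the variable also needs to be set on the server side is not stated in the thread):

# run 1: restrict UCX to the NIC that is PXB-attached to GPU0/GPU1
UCX_NET_DEVICES=mlx5_cx6_3:1 UCX_TLS=cuda,rc ./ucx_perftest 172.16.4.4 -s 1073741824 -t tag_bw -m cuda -p 12345
# run 2: the same, additionally forcing the get_zcopy rendezvous scheme
UCX_RNDV_SCHEME=get_zcopy UCX_NET_DEVICES=mlx5_cx6_3:1 UCX_TLS=cuda,rc ./ucx_perftest 172.16.4.4 -s 1073741824 -t tag_bw -m cuda -p 12345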

@yangrudan
Author

yangrudan commented Sep 23, 2024

@yangrudan can you pls try setting UCX_NET_DEVICES=mlx5_cx6_3:1 - both with and without UCX_RNDV_SCHEME=get_zcopy?

An error occurred, as shown below:
image

By the way, ping works fine:

root@NH-DC-NM129-I06-12U-GPU-246:~/yangrudan/ucx/out/bin# ping 172.16.4.1
PING 172.16.4.1 (172.16.4.1) 56(84) bytes of data.
64 bytes from 172.16.4.1: icmp_seq=1 ttl=61 time=0.215 ms
64 bytes from 172.16.4.1: icmp_seq=2 ttl=61 time=0.115 ms
^C
--- 172.16.4.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1030ms
rtt min/avg/max/mdev = 0.115/0.165/0.215/0.050 ms

@yosefe
Contributor

yosefe commented Sep 23, 2024

@yangrudan it seems there is some issue with the IP/routing config; can you pls try pinging from a specific interface (add -I ens24np0)?

@yangrudan
Author

yangrudan commented Sep 23, 2024

@yangrudan it seems there is some issue with the IP/routing config; can you pls try pinging from a specific interface (add -I ens24np0)?

It seems that it doesn't work. And I only added -I on the client side.
image

@yosefe
Contributor

yosefe commented Sep 23, 2024

Sorry, I meant trying ping with a specific interface, something like:
root@NH-DC-NM129-I06-12U-GPU-246:~/yangrudan/ucx/out/bin# ping -I ens24np0 172.16.4.1

@yangrudan
Author

Sorry, I meant trying ping with a specific interface, something like: root@NH-DC-NM129-I06-12U-GPU-246:~/yangrudan/ucx/out/bin# ping -I ens24np0 172.16.4.1

Sorry, I misunderstood
image

@yosefe
Contributor

yosefe commented Sep 23, 2024

The first ping command fails, which points to some issue reaching mlx5_cx6_3 on one server from mlx5_cx6_3 on the other server. Can you pls check the network config?

@yangrudan
Author

yangrudan commented Sep 23, 2024

I don't know much about network configuration. Could it be that the IP addresses of mlx5_cx6_3 of the two servers are not in the same subnet?
image
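
One way to check this, assuming ens24np0 is the netdev backing mlx5_cx6_3 (that mapping is an assumption based on the earlier ping attempts), is to compare the address/prefix on both nodes and ask the kernel which interface it routes through:

# on each node: show the IPv4 address and prefix length of the RoCE interface
ip -br addr show dev ens24np0
# on the client: show which interface/gateway would be used to reach the server's address
ip route get 172.16.4.1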

@yangrudan
Author

Maybe it's a network config issue. Closing this issue.

@yosefe
Contributor

yosefe commented Sep 24, 2024

I don't know much about network configuration. Could it be that the IP addresses of mlx5_cx6_3 of the two servers are not in the same subnet?

Yes, that seems to be the reason these devices are not reachable. Anyway, in order to get good GPU memory performance for GPU0, according to the nvidia-smi topology output, the mlx5_cx6_3 device should be used.
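
Assuming the test keeps using GPU0 and the subnet issue on mlx5_cx6_3 gets fixed, the recommended setup would look roughly like the sketch below (pinning the visible GPU with CUDA_VISIBLE_DEVICES is an assumption, since the thread does not show which GPU ucx_perftest selected):

# client side: use only GPU0 and the NIC that is PXB-attached to it
CUDA_VISIBLE_DEVICES=0 UCX_NET_DEVICES=mlx5_cx6_3:1 UCX_TLS=cuda,rc ./ucx_perftest <server-ip> -s 1073741824 -t tag_bw -m cuda -p 12345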

@yangrudan
Author

I don't know much about network configuration. Could it be that the IP addresses of mlx5_cx6_3 of the two servers are not in the same subnet?

Yes, that seems to be the reason these devices are not reachable. Anyway, in order to get good GPU memory performance for GPU0, according to the nvidia-smi topology output, the mlx5_cx6_3 device should be used.

Thank you very much for your patience. 😊
