With two H20 nodes, each equipped with 16 GPUs and 4 InfiniBand (IB) NICs, the bus bandwidth reaches only 65 Gbps.
#1588
Open
PaggyZhang opened this issue
Jan 24, 2025
· 2 comments
Is ACS disabled?
Is GDRDMA enabled?
The NCCL_DEBUG=INFO logs would be helpful.
Using NCCL_ALGO=Tree may confuse things further. I'd suggest NCCL_ALGO=RING if you're just benchmarking two nodes.
Do the perftests like ib_write_bw demonstrate the expected performance on each of the 4 NICs?
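For reference, the ACS and GDRDMA questions above can be answered with a couple of quick checks. The sketch below is only a rough aid, not an official NCCL tool: the helper names `count_acs_enabled` and `gdr_module_loaded` are hypothetical, and it assumes GPUDirect RDMA is provided by the `nvidia_peermem` (or legacy `nv_peer_mem`) kernel module, which is not the case on DMA-BUF based setups.

```shell
#!/usr/bin/env bash

# Count PCI devices whose ACS control register shows source validation on.
# Reads `lspci -vvv` text on stdin; an "ACSCtl:" line containing "SrcValid+"
# suggests ACS is still enabled on that bridge.
count_acs_enabled() {
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  grep -c 'ACSCtl:.*SrcValid+' || true
}

# Succeed if a GPUDirect RDMA kernel module appears in `lsmod` text on stdin.
# Assumption: the module is nvidia_peermem or the older nv_peer_mem.
gdr_module_loaded() {
  grep -Eq '^(nvidia_peermem|nv_peer_mem)[[:space:]]'
}

# Example usage on each node (full lspci output usually needs root):
#   sudo lspci -vvv | count_acs_enabled      # ideally prints 0
#   lsmod | gdr_module_loaded && echo "GDR module loaded"
#
# Per-NIC perftest, repeated for each HCA listed in NCCL_IB_HCA
# (<server_host> is a placeholder):
#   server: ib_write_bw -d mlx5_0 --report_gbits
#   client: ib_write_bw -d mlx5_0 --report_gbits <server_host>
```

A non-zero count from the first helper would point at ACS as a likely cause of reduced GPU-to-NIC bandwidth, and a missing module would suggest traffic is being staged through host memory rather than using GDRDMA.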
mpirun --allow-run-as-root -bind-to none --hostfile hostfile \
  -mca btl_tcp_if_include ens20f0np0 \
  -mca pml ^ucx \
  -mca btl ^openib \
  -mca coll_hcoll_enable 0 \
  -x NCCL_DEBUG=WARN \
  -x NCCL_IBEXT_DISABLE=0 \
  -x NCCL_IB_HCA=mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_4:1 \
  -x NCCL_SOCKET_IFNAME=ens20f0np0 \
  -x UCX_TLS=sm,ud \
  -x NCCL_IB_SPLIT_DATA_ON_QPS=0 \
  -x NCCL_IB_QPS_PER_CONNECTION=8 \
  -x CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
  -x NCCL_IB_RETRY_CNT=7 \
  -x NCCL_IB_PCI_RELAXED_ORDERING=1 \
  -x NCCL_SHM_DISABLE=0 \
  -x NCCL_MIN_NCHANNELS=8 \
  -x LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7a1/lib64:/usr/local/lib \
  -x NCCL_CROSS_NIC=1 \
  -x NCCL_P2P_LEVEL=NVL \
  -x NCCL_NET_GDR_LEVEL=PHB \
  -x NCCL_ALGO=tree \
  /home/ubuntu/nccl-tests/build/all_reduce_perf -b 1G -e 8G -f 2