How to use nvlink in ucx #9896
-
May I ask whether UCX uses cuda_ipc for intra-node GPU communication? There are two transmission methods, get_zcopy and put_zcopy, and I have two questions about this:
Thank you for your assistance and guidance!
-
Yes, all intra-node inter-GPU communication uses cuda_ipc as long as CUDA allows mapping the peer GPU's memory between the two GPUs. On most PCIe and all NVLINK-connected systems this is true. You can use `nvidia-smi topo -m` to check reachability between GPUs.
On many systems where cuda-ipc is realized over PCIe instead of NVLINK, get performance is slower than put, so we want to force the protocols layer to use only puts under such circumstances. So when checking whether get_zcopy operations should be used, we check if NVLINKs are present and, if not, we effectively disable get operations. Put is used regardless of NVLINK or PCIe in all cases where cuda-ipc is eligible.
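For anyone who wants to see the same reachability information programmatically, here is a minimal sketch (my own, not part of UCX) that asks the CUDA runtime whether peer mapping is possible between each GPU pair; this is the prerequisite cuda_ipc relies on, and it mirrors what `nvidia-smi topo -m` summarizes. The performance-rank query is just an extra hint I added; UCX does its own transport selection.

```c
/* Sketch: query CUDA peer-to-peer reachability between all GPU pairs.
 * Build with: nvcc p2p_check.c -o p2p_check   (or gcc ... -lcudart) */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev < 2) {
        fprintf(stderr, "need at least two CUDA devices\n");
        return 1;
    }

    for (int src = 0; src < ndev; ++src) {
        for (int dst = 0; dst < ndev; ++dst) {
            if (src == dst)
                continue;

            int can_access = 0;
            /* 1 if 'src' can map memory allocated on 'dst' (cuda_ipc eligible) */
            cudaDeviceCanAccessPeer(&can_access, src, dst);

            int perf_rank = -1;
            /* Lower rank generally indicates a faster P2P path (e.g. NVLINK) */
            cudaDeviceGetP2PAttribute(&perf_rank, cudaDevP2PAttrPerformanceRank,
                                      src, dst);

            printf("GPU %d -> GPU %d: peer access %s, perf rank %d\n",
                   src, dst, can_access ? "yes" : "no", perf_rank);
        }
    }
    return 0;
}
```

Pairs that report peer access are the ones where cuda_ipc can be used; whether gets are also enabled then depends on the NVLINK check described above.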
Yes. They are used unconditionally.
Yes and yes.