Conversation

Sergei-Lebedev (Contributor)

What

Lazily initialize TL NCCL and TL CUDA on first CUDA collective.

Why?

Both TL NCCL and TL CUDA require the CUDA device to be set before team creation. In MPI workloads this is not always possible, since MPI_Init creates the UCC team, and setting the device requires knowing the rank and local rank, which are not available yet at that point. Deferring initialization of these team layers until the first CUDA collective is posted means the device is guaranteed to be set by then.

@swx-jenkins3

Can one of the admins verify this patch?
