Can nccl dynamically add/remove GPU workers? #1543
Comments
+1, friendly ping @sjeaugey
Do you mean something like ncclCommSplit, or do you need a more flexible interface?
Right now, the way to add or remove GPUs is to create a new communicator with more or fewer ranks. ncclCommSplit can be used for that, or simply ncclCommInitRank.
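For illustration, here is a minimal sketch of removing ranks with ncclCommSplit. It is not taken from the NCCL sources. The helper names (`split_color`, `new_rank`, `shrink_comm`), the `keep` array, and the `USE_NCCL` build flag are all hypothetical. The sketch assumes every rank already agrees on which ranks stay, and that NCCL is recent enough to provide ncclCommSplit (2.18 or later, as far as I know).

```c
#include <stddef.h>

#ifdef USE_NCCL               /* hypothetical flag: define it when building against NCCL */
#include <nccl.h>
#define SPLIT_NOCOLOR NCCL_SPLIT_NOCOLOR
#else
/* Stand-in so the helpers compile without NCCL; the real constant comes from nccl.h. */
#define SPLIT_NOCOLOR (-1)
#endif

/* keep[i] != 0 means old rank i stays in the new communicator (assumed to be
 * identical on every rank). Ranks that stay share color 0; ranks that leave
 * opt out of every group. */
int split_color(const int *keep, int rank) {
    return keep[rank] ? 0 : SPLIT_NOCOLOR;
}

/* Rank this process ends up with when the old rank is used as the split key:
 * the number of kept ranks below it. Returns -1 for a rank that is leaving. */
int new_rank(const int *keep, int rank) {
    if (!keep[rank]) return -1;
    int r = 0;
    for (int i = 0; i < rank; i++) {
        if (keep[i]) r++;
    }
    return r;
}

#ifdef USE_NCCL
/* Collective call: every rank of the old communicator must call it, including
 * the ranks that leave (those get a NULL new communicator back). Passing NULL
 * as the config inherits the parent communicator's configuration. */
ncclResult_t shrink_comm(ncclComm_t comm, const int *keep, int rank,
                         ncclComm_t *newcomm) {
    return ncclCommSplit(comm, split_color(keep, rank), rank, newcomm, NULL);
}
#endif
```

With `keep = {1, 0, 1, 1}`, old rank 1 drops out and old ranks 0, 2, 3 become new ranks 0, 1, 2. Note that a split can only select ranks that are already in the parent communicator, so adding GPUs still means creating a fresh communicator with ncclCommInitRank and a new ncclUniqueId.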
Will NCCL support dynamically adding or removing ranks (without creating a new communicator) in the future?
It's something we've been thinking about, but it's a pretty complex thing to do, and it's unclear whether it would be significantly faster than re-creating a communicator. It's not as simple as it seems: many things are pre-computed assuming a given set of GPUs, and adding or removing ranks would require recomputing a lot of them, maybe almost everything.
Hi,
I would like to know whether NCCL supports dynamically adding or removing GPU workers.
BR