About offload stage3 source code learning problems #6735

Open
lzy-edu opened this issue Nov 9, 2024 · 3 comments

lzy-edu commented Nov 9, 2024

Hello everyone. While studying the DeepSpeed source code, I am trying to understand the stage 3 part of ZeRO-Offload, but I can't find the code that schedules the transfer of gradients between the CPU and GPU. If you know where it is, please point me to it.

@jomayeri
Contributor

The gradients are not offloaded in ZeRO. The only parts that can be offloaded are the optimizer states and the parameters.
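
For reference, the two offload targets named above correspond to the `offload_optimizer` and `offload_param` sections of the `zero_optimization` block in a DeepSpeed config. The following is a minimal sketch, assuming CPU offload with pinned memory; the batch size is a placeholder value, not something taken from this thread.

```python
import json

# Minimal ZeRO stage 3 config sketch with both offload targets enabled.
# The batch size below is an illustrative placeholder.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        # Keep optimizer states in CPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Keep partitioned parameters in CPU memory (stage 3 only).
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

# Serialize to JSON, e.g. to save as a config file.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

A dict like this can be written to a JSON file or passed directly as the config when initializing the engine.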

@jomayeri jomayeri self-assigned this Nov 14, 2024
@tjruwase
Contributor

@lzy-edu, see

def partition_grads(self, params_to_release: List[Parameter], grad_partitions: List[Tensor]) -> None:
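
As background for reading that function: under ZeRO, each rank keeps only its own slice of every gradient. The sketch below illustrates that idea with plain Python lists. It is not DeepSpeed's implementation; `partition_flat_grad` and the variable names are hypothetical and exist only for this example.

```python
from typing import List


def partition_flat_grad(flat_grad: List[float], world_size: int) -> List[List[float]]:
    """Split a flattened gradient into world_size equal slices (hypothetical helper).

    The tail is zero-padded so every rank gets a slice of the same length.
    """
    partition_size = -(-len(flat_grad) // world_size)  # ceiling division
    pad = partition_size * world_size - len(flat_grad)
    padded = flat_grad + [0.0] * pad
    return [
        padded[r * partition_size:(r + 1) * partition_size]
        for r in range(world_size)
    ]


flat_grad = [0.1, 0.2, 0.3, 0.4, 0.5]
partitions = partition_flat_grad(flat_grad, world_size=2)

# Each rank would retain only its own slice and release the rest.
rank = 1
my_grad_partition = partitions[rank]
print(my_grad_partition)
```

The real function works on tensors and handles accumulation and buffer management, so treat this only as a mental model of the slicing step.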

@lzy-edu
Author

lzy-edu commented Nov 19, 2024

> @lzy-edu, see
>
> DeepSpeed/deepspeed/runtime/zero/stage3.py, line 1463 in fc4e733:
>
> def partition_grads(self, params_to_release: List[Parameter], grad_partitions: List[Tensor]) -> None:

Thank you for your answer. I would also like to ask about the initial parameter partitioning. When ZeRO stage 3 offload is enabled, are all parameters offloaded to the CPU first during initialization? Where is the code for this part?
