About offload stage3 source code learning problems #6735

lzy-edu · 2024-11-09T08:13:39Z

Hello everyone. While studying the DeepSpeed source code, I want to understand the stage 3 part of ZeRO-Offload, but I can't find the code that schedules gradient transfers between the CPU and GPU. If you know where it is, please point me to it.

jomayeri · 2024-11-14T18:58:20Z

The gradients are not offloaded in ZeRO. The only parts that can be offloaded are the optimizer states and the parameters.
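
For reference, the two offload targets mentioned above correspond to the `offload_optimizer` and `offload_param` sections of the `zero_optimization` config. The following is a minimal sketch of such a config written as a Python dict; the batch size is a placeholder assumption, and the rest of a real config (optimizer, precision, and so on) is omitted.

```python
import json

# Minimal sketch of a ZeRO stage 3 config with both offload targets enabled.
# The batch size below is a placeholder, not a recommendation.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        # Keep optimizer states in CPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Keep partitioned parameters in CPU memory (stage 3 only).
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

# Serialize to the JSON form that a DeepSpeed config file would contain.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

A dict like this is typically passed to `deepspeed.initialize` through its `config` argument, or saved as a JSON file and given to the launcher.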

tjruwase · 2024-11-15T22:08:39Z

@lzy-edu, see

DeepSpeed/deepspeed/runtime/zero/stage3.py

Line 1463 in fc4e733

    
    def partition_grads(self, params_to_release: List[Parameter], grad_partitions: List[Tensor]) -> None:
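
As a rough mental model of what to look for in that function: each rank holds one shard of every gradient, and when the optimizer lives on the CPU that shard has to end up in a CPU-side flat buffer at a known offset. The toy below illustrates only that idea with plain Python lists. It is not DeepSpeed's code, and every name in it (`copy_grad_shard_to_cpu`, `cpu_flat_grads`, `dest_offset`) is hypothetical.

```python
from typing import List


def copy_grad_shard_to_cpu(cpu_flat_grads: List[float], dest_offset: int,
                           grad_shard: List[float]) -> None:
    """Write one rank-local gradient shard into a flat CPU-side buffer.

    Toy stand-in for a device-to-host copy; all names are hypothetical.
    """
    end = dest_offset + len(grad_shard)
    if end > len(cpu_flat_grads):
        raise ValueError("gradient shard does not fit in the CPU buffer")
    # Slice assignment plays the role of a tensor copy into a narrowed view.
    cpu_flat_grads[dest_offset:end] = grad_shard


# Flat CPU buffer with room for two parameter shards (2 + 3 elements).
cpu_flat_grads = [0.0] * 5
copy_grad_shard_to_cpu(cpu_flat_grads, 0, [1.0, 2.0])
copy_grad_shard_to_cpu(cpu_flat_grads, 2, [3.0, 4.0, 5.0])
print(cpu_flat_grads)
```

The actual function works on tensors and also deals with gradient accumulation across micro-steps, so treat this only as a map for reading the source.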

lzy-edu · 2024-11-19T02:07:00Z

Thank you for your answer. I would also like to ask about the initial parameter partitioning. When ZeRO stage 3 offload is enabled, are all parameters offloaded to the CPU first during initialization? Where is this part of the code?

jomayeri self-assigned this Nov 14, 2024
