Unable to access model gradients with DeepSpeed and Accelerate #3184

Open
shouyezhe opened this issue Oct 22, 2024 · 0 comments
System Info

- `Accelerate` version: 0.34.0
- Platform: Linux-5.15.0-117-generic-x86_64-with-glibc2.17
- `accelerate` bash location: /home/miao/anaconda3/envs/train/bin/accelerate
- Python version: 3.8.20
- Numpy version: 1.24.3
- PyTorch version (GPU?): 2.4.0 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch MUSA available: False
- System RAM: 503.55 GB
- GPU type: NVIDIA GeForce RTX 3090
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: DEEPSPEED
        - mixed_precision: no
        - use_cpu: False
        - debug: False
        - num_processes: 4
        - machine_rank: 0
        - num_machines: 1
        - gpu_ids: 0,1,2,3
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - enable_cpu_affinity: False
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

When using the official training script for Diffusers (https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) with DeepSpeed and ZeRO-2, I'm trying to save the gradients of the model at each training step. However, I'm encountering difficulties due to the way Accelerate wraps DeepSpeed's operations.

Current Behavior

I inserted the following code between `accelerator.backward(loss)` (line 1030) and `optimizer.step()` (line 1033):

from deepspeed.utils import safe_get_full_grad
for n, lp in unet.named_parameters():
    # 1. Access the full states
    #  1.1) gradient lookup
    # For zero1 and zero2, gradient lookup must be called after `backward` and before `step`
    # For zero3, gradient lookup must be called after `backward`
    hp_grad = safe_get_full_grad(lp)
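For reference, this lookup sits in the training loop roughly here (paraphrased from train_text_to_image.py; variable names as in that script):

accelerator.backward(loss)
# gradient lookup loop from above goes here, i.e. after `backward` and before `step`
if accelerator.sync_gradients:
    accelerator.clip_grad_norm_(unet.parameters(), args.max_grad_norm)
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()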

Problem

The current implementation of Accelerate wraps both DeepSpeed's backward and step operations into a single accelerator.backward call. This prevents users from accessing the gradients between these two operations, which is necessary for gradient analysis or custom gradient processing.
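For context, the DeepSpeed wrapper in Accelerate (`accelerate/utils/deepspeed.py`) couples the two calls roughly like this (a paraphrased sketch, not the exact source):

# Sketch of DeepSpeedEngineWrapper.backward in accelerate 0.34
class DeepSpeedEngineWrapper:
    def __init__(self, engine):
        self.engine = engine

    def backward(self, loss, **kwargs):
        # runs backpropagation and handles mixed precision
        self.engine.backward(loss, **kwargs)
        # `engine.step` then clips gradients, steps the optimizer, zeroes
        # gradients, and (if configured) steps the lr scheduler, so the
        # gradients are already consumed when `accelerator.backward` returns
        self.engine.step()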

Suggested Solution

Modify Accelerate's DeepSpeed integration to allow users to access gradients between the backward and step operations. This could be achieved by:

  1. Separating the backward and step operations in Accelerate's DeepSpeed wrapper. (As an aside, I don't understand why DeepSpeed's backward and step are coupled together in the first place.)
  2. As a temporary workaround to access full gradients when using DeepSpeed with Accelerate, I modified the code in `accelerate/utils/deepspeed.py` (around line 178) and `accelerate/accelerator.py` (around line 2188), as shown below.
# In `accelerate/utils/deepspeed.py`, inside `DeepSpeedEngineWrapper.backward`.
# This assumes the method signature is extended to accept the new argument,
# e.g. `def backward(self, loss, gradients=None, **kwargs):`, so that
# `gradients` is not forwarded to `engine.backward`.
self.engine.backward(loss, **kwargs)
# DeepSpeed's `engine.step` performs the following operations:
# - gradient accumulation check
# - gradient clipping
# - optimizer step
# - zero grad
# - checking overflow
# - lr_scheduler step (only if engine.lr_scheduler is not None)
if gradients is not None:
    from deepspeed.utils import safe_get_full_grad
    import torch

    with torch.no_grad():
        for n, lp in self.engine.module.named_parameters():
            # Gradient lookup:
            # for ZeRO-1 and ZeRO-2 it must happen after `backward` and before `step`;
            # for ZeRO-3 it must happen after `backward`.
            if lp.grad is None:
                gradients[n] = safe_get_full_grad(lp)
            else:
                gradients[n] = lp.grad
self.engine.step()
if gradients is not None:
    return gradients
if kwargs.get("gradients") != None:
	return self.deepspeed_engine_wrapped.backward(loss, **kwargs)
else:
	self.deepspeed_engine_wrapped.backward(loss)

Finally, I obtained the desired gradients using the following code.

gradients = {}  # dict to be filled with the full per-parameter gradients
gradients = accelerator.backward(loss, gradients=gradients)
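With that, the returned dictionary can be dumped at each step, for example as follows (a minimal sketch; the output directory and the use of `global_step` from the training loop are just examples):

import os
import torch

if accelerator.is_main_process:
    os.makedirs("grad_dumps", exist_ok=True)
    # move to CPU before saving to avoid holding extra GPU memory
    torch.save(
        {n: g.detach().cpu() for n, g in gradients.items() if g is not None},
        f"grad_dumps/step_{global_step}.pt",
    )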

Expected behavior

I should be able to access and save the full gradients of the model parameters at each training step when using DeepSpeed with ZeRO-2.
