
[Bug] The clip_grad_norm of xla fsdp is not right #3180

Open

hanwen-sun opened this issue Oct 21, 2024 · 0 comments
System Info

Use transformers after the commit referenced in https://github.com/huggingface/transformers/issues/34176; you can refer to the discussion in that issue for more context.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

We can't correctly use clip_grad_norm_ in the Accelerator for XLA FSDP. clip_grad_norm_ currently checks the environment variable ACCELERATE_USE_FSDP:

    if os.environ.get("ACCELERATE_USE_FSDP", "false") == "true":

However, the Accelerator does not integrate XLA FSDP, so we should not set this environment variable to true. I want to know how we can modify the code to make it correct.
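
For reference, one possible workaround seems to be skipping accelerator.clip_grad_norm_ and calling the clip_grad_norm_ method that torch_xla's XlaFullyShardedDataParallel wrapper itself provides. This is only a minimal single-device sketch; the toy model, max_norm value, and training step below are illustrative assumptions, not code from the actual script:

    import torch
    import torch.nn as nn
    import torch_xla.core.xla_model as xm
    from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

    device = xm.xla_device()

    # Toy model wrapped directly with torch_xla's FSDP (not prepared as FSDP by Accelerate).
    model = FSDP(nn.Linear(16, 4).to(device))
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    x = torch.randn(8, 16, device=device)
    loss = model(x).sum()
    loss.backward()

    # Clip via the XLA FSDP wrapper itself, which knows about the sharded gradients,
    # instead of accelerator.clip_grad_norm_, which only special-cases native PyTorch
    # FSDP behind the ACCELERATE_USE_FSDP environment variable.
    model.clip_grad_norm_(max_norm=1.0)

    optimizer.step()
    xm.mark_step()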

Expected behavior

Modify clip_grad_norm_ so that it is correct for XLA FSDP. Is there any plan to integrate XLA FSDP into the Accelerator? I believe this would be a better approach to solving the problem, and it would work for any scenario that uses the Accelerator. A rough sketch of the kind of change I have in mind follows.
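
The sketch below is hypothetical, not actual Accelerate code; it assumes internal attribute names (self._models, self.unscale_gradients) from my reading of the Accelerate source, which may not be exact:

    import torch

    def clip_grad_norm_(self, parameters, max_norm, norm_type=2):
        # Hypothetical change: detect XLA FSDP by inspecting the prepared models
        # instead of relying on the ACCELERATE_USE_FSDP environment variable.
        try:
            from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as XlaFSDP
        except ImportError:
            XlaFSDP = None

        if XlaFSDP is not None:
            for model in self._models:
                if isinstance(model, XlaFSDP):
                    # Delegate to the wrapper, which clips its own sharded gradients.
                    return model.clip_grad_norm_(max_norm, norm_type=norm_type)

        # Otherwise fall back to the existing behavior.
        self.unscale_gradients()
        return torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=norm_type)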
