
[Bug] The clip_grad_norm of xla fsdp is not right #3180

Open

hanwen-sun opened this issue Oct 21, 2024 · 0 comments
System Info

Use transformers after the commit referenced in https://github.com/huggingface/transformers/issues/34176; you can refer to the discussion in that issue for more context.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

We can't correctly use clip_grad_norm_ in the Accelerator for XLA FSDP. clip_grad_norm_ currently checks the environment variable ACCELERATE_USE_FSDP:

    if os.environ.get("ACCELERATE_USE_FSDP", "false") == "true":

However, the Accelerator does not integrate XLA FSDP, so we should not set this environment variable to true. I want to know how we can modify the code to make it correct.
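
For reference, one possible workaround seems to be skipping accelerator.clip_grad_norm_ and calling the clip_grad_norm_ method that torch_xla's XlaFullyShardedDataParallel wrapper itself provides. This is only a minimal single-device sketch; the toy model, max_norm value, and training step below are illustrative assumptions, not code from the actual script:

    import torch
    import torch.nn as nn
    import torch_xla.core.xla_model as xm
    from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

    device = xm.xla_device()

    # Toy model wrapped directly with torch_xla's FSDP (not prepared as FSDP by Accelerate).
    model = FSDP(nn.Linear(16, 4).to(device))
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    x = torch.randn(8, 16, device=device)
    loss = model(x).sum()
    loss.backward()

    # Clip via the XLA FSDP wrapper itself, which knows about the sharded gradients,
    # instead of accelerator.clip_grad_norm_, which only special-cases native PyTorch
    # FSDP behind the ACCELERATE_USE_FSDP environment variable.
    model.clip_grad_norm_(max_norm=1.0)

    optimizer.step()
    xm.mark_step()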

Expected behavior

Modify clip_grad_norm_ so that it is correct for XLA FSDP. Is there any plan to integrate XLA FSDP into the Accelerator? I believe this would be a better approach to solving the problem, and it would work for any scenario that uses the Accelerator. A rough sketch of the kind of change I have in mind follows.
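
The sketch below is hypothetical, not actual Accelerate code; it assumes internal attribute names (self._models, self.unscale_gradients) from my reading of the Accelerate source, which may not be exact:

    import torch

    def clip_grad_norm_(self, parameters, max_norm, norm_type=2):
        # Hypothetical change: detect XLA FSDP by inspecting the prepared models
        # instead of relying on the ACCELERATE_USE_FSDP environment variable.
        try:
            from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as XlaFSDP
        except ImportError:
            XlaFSDP = None

        if XlaFSDP is not None:
            for model in self._models:
                if isinstance(model, XlaFSDP):
                    # Delegate to the wrapper, which clips its own sharded gradients.
                    return model.clip_grad_norm_(max_norm, norm_type=norm_type)

        # Otherwise fall back to the existing behavior.
        self.unscale_gradients()
        return torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=norm_type)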
