
[Tracking] FLUX T5 XXL model produces NaN when on CUDA and using F16 #2480

Open
EricLBuehler opened this issue Sep 16, 2024 · 1 comment · May be fixed by #2481
@EricLBuehler (Member)
Perhaps we can use clamping, as per:

https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L748-L755

Using BF16 instead works on CUDA (BF16 has the same exponent range as F32, so it does not overflow where F16 does).
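The clamping referenced above caps F16 activations just below the dtype maximum, so that an overflow-to-`inf` in one layer cannot propagate into NaN (e.g. via `inf - inf`) in later ops. A minimal NumPy sketch of the idea (the function name is illustrative, and the real fix would live in candle's Rust T5 implementation, not NumPy):

```python
import numpy as np

def clamp_f16(hidden_states: np.ndarray) -> np.ndarray:
    # Only F16 needs this: its max (~65504) is easily exceeded by
    # intermediate activations, while F32/BF16 have a much larger range.
    if hidden_states.dtype == np.float16:
        # Clamp slightly below the F16 max, mirroring the pattern in
        # transformers' modeling_t5.py.
        clamp_value = np.finfo(np.float16).max - 1000
        return np.clip(hidden_states, -clamp_value, clamp_value)
    return hidden_states

# An activation that overflowed to inf in F16 is pulled back to a
# finite value, so downstream ops no longer produce NaN.
overflowed = np.array([np.inf, -np.inf, 1.0], dtype=np.float16)
clamped = clamp_f16(overflowed)
```

Applying this after the feed-forward and attention blocks (as transformers does) should be enough; clamping everywhere would add unnecessary overhead.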

@EricLBuehler EricLBuehler changed the title FLUX T5 XXL model produces NaN when on CUDA and using F16 [Tracking] FLUX T5 XXL model produces NaN when on CUDA and using F16 Sep 16, 2024
@EricLBuehler (Member Author)

Interesting find: F16 fails (produces NaN) on an A100, but not on an H100.

@EricLBuehler EricLBuehler linked a pull request Sep 17, 2024 that will close this issue