
[Tracking] FLUX T5 XXL model produces NaN when on CUDA and using F16 #2480

Open
EricLBuehler opened this issue Sep 16, 2024 · 1 comment · May be fixed by #2481
@EricLBuehler (Member)
Perhaps we can use clamping, as per:

https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L748-L755

Using BF16 instead works on CUDA (BF16 has the same exponent range as F32, so it does not overflow where F16 does).
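The clamping referenced above caps F16 activations just below the dtype maximum, so that an overflow-to-`inf` in one layer cannot propagate into NaN (e.g. via `inf - inf`) in later ops. A minimal NumPy sketch of the idea (the function name is illustrative, and the real fix would live in candle's Rust T5 implementation, not NumPy):

```python
import numpy as np

def clamp_f16(hidden_states: np.ndarray) -> np.ndarray:
    # Only F16 needs this: its max (~65504) is easily exceeded by
    # intermediate activations, while F32/BF16 have a much larger range.
    if hidden_states.dtype == np.float16:
        # Clamp slightly below the F16 max, mirroring the pattern in
        # transformers' modeling_t5.py.
        clamp_value = np.finfo(np.float16).max - 1000
        return np.clip(hidden_states, -clamp_value, clamp_value)
    return hidden_states

# An activation that overflowed to inf in F16 is pulled back to a
# finite value, so downstream ops no longer produce NaN.
overflowed = np.array([np.inf, -np.inf, 1.0], dtype=np.float16)
clamped = clamp_f16(overflowed)
```

Applying this after the feed-forward and attention blocks (as transformers does) should be enough; clamping everywhere would add unnecessary overhead.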

@EricLBuehler EricLBuehler changed the title FLUX T5 XXL model produces NaN when on CUDA and using F16 [Tracking] FLUX T5 XXL model produces NaN when on CUDA and using F16 Sep 16, 2024
@EricLBuehler (Member Author)

Interesting find: F16 fails (produces NaN) on an A100, but not on an H100.

@EricLBuehler EricLBuehler linked a pull request Sep 17, 2024 that will close this issue