Describe the bug
I tried to go through the example provided in the official documentation. Because the V100 does not support bfloat16, I changed trainer.precision to 16-mixed. At the very first iteration, I ran into a gradient NaN issue.
Steps/Code to reproduce bug
Follow the steps shown in the official documentation to format the data.
Change trainer.precision to 16-mixed as shown below:
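A minimal sketch of the precision choice described above, not the original snippet from this report. It assumes that native bfloat16 requires CUDA compute capability 8.0 or higher (the V100 is 7.0) and that the trainer accepts Lightning-style precision strings. The helper names `choose_precision` and `detect_cc_major` are hypothetical and are not part of NeMo.

```python
def choose_precision(cc_major: int) -> str:
    """Map a CUDA compute capability major version to a precision string."""
    # Assumption: bfloat16 is natively supported from compute capability 8.0
    # (Ampere) onward; older GPUs such as the V100 (7.0) fall back to fp16.
    return "bf16-mixed" if cc_major >= 8 else "16-mixed"


def detect_cc_major(default: int = 7) -> int:
    """Return the major compute capability of GPU 0.

    Falls back to `default` (hypothetical, chosen to match a V100) when
    PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return default
    if not torch.cuda.is_available():
        return default
    major, _minor = torch.cuda.get_device_capability(0)
    return major


if __name__ == "__main__":
    # Prints the value that would be assigned to trainer.precision.
    print(f"trainer.precision={choose_precision(detect_cc_major())}")
```

On a V100 this selects 16-mixed, which is the setting that produces the NaN gradients reported here.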
Environment overview (please complete the following information)
The official Docker image is used:
nvcr.io/nvidia/nemo:24.12
What should I do to solve this problem? Thanks.