[QUESTION] Using FP8 OOM, otherwise --bf16 works well #1241
Unanswered
yanchenmochen asked this question in Q&A
Replies: 1 comment
When I train a 7B model on an H100 GPU using FP8, training runs out of memory (OOM), while the same parameters with --bf16 train fine. What could be the cause?
I tried to reduce memory with --recompute-granularity selective, but it still ran out of memory.
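For context, a minimal standalone sketch of the FP8 path I believe is involved (NVIDIA Transformer Engine, which Megatron-style --fp8 flags wrap). The layer sizes and recipe values here are illustrative assumptions, not my actual config:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Illustrative sizes (assumption), not the real 7B model config.
model = te.Linear(4096, 4096, bias=True).cuda()
inp = torch.randn(16, 4096, device="cuda")

# Delayed scaling keeps per-tensor amax histories and scale factors as extra
# state, and FP8 casts live alongside the higher-precision master weights --
# one commonly cited reason an FP8 run can use more memory than plain bf16.
recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    out = model(inp)
out.sum().backward()
```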
The error info is: