33b Model on 4xA100 (40GB) OOM #666
Asked by AlexanderZhk in Q&A
I'm trying to LoRA fine-tune a 33b model on 4xA100 (40GB) and getting OOM errors. I'm using fp16. From my understanding, this hardware should be enough for the task; am I missing something? Training config:
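As a rough sanity check on the memory budget (a back-of-envelope estimate, counting weights only), fp16 weights for a 33b model alone take about 66 GB, already more than a single 40 GB card can hold unsharded, and gradients, optimizer states, and activations come on top of that:

```python
# Back-of-envelope weight memory for a 33B-parameter model.
# Weights only: activations, gradients, optimizer states, and framework
# overhead all come on top of this.
params = 33e9

fp16_gb = params * 2 / 1e9    # ~66 GB total: more than one 40 GB A100
int4_gb = params * 0.5 / 1e9  # ~16.5 GB total: fits comfortably on one GPU

print(f"fp16 weights: ~{fp16_gb:.0f} GB, int4 weights: ~{int4_gb:.1f} GB")
```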
Answered by psinger (Apr 9, 2024):
Did you try int4 with LoRA and without deepspeed?
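For anyone landing here, a minimal sketch of what int4 + LoRA (QLoRA-style) looks like with the Hugging Face transformers/peft/bitsandbytes stack, standing in for whatever the training framework does internally; the checkpoint name and LoRA hyperparameters below are placeholder assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "huggyllama/llama-30b"  # placeholder 33b-class checkpoint

# Quantize the frozen base weights to 4-bit; compute still runs in fp16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # shard across the 4 GPUs without deepspeed
)
model = prepare_model_for_kbit_training(model)

# Only the small LoRA adapter matrices are trained
lora_config = LoraConfig(
    r=16,                                 # placeholder rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # placeholder; depends on architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```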
From the follow-up replies:
I have never seen any real performance degradation from doing LoRA in int4. The final weights will be merged back, and you can put the model into production in any precision.
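To illustrate the merging step: with peft, the LoRA deltas can be folded back into the base weights via `merge_and_unload()`, after which the merged model can be saved at whatever precision you deploy with. A minimal sketch, assuming a peft-trained adapter (the checkpoint name and paths are hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model at deployment precision (fp16 here)
base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",  # hypothetical base checkpoint
    torch_dtype=torch.float16,
)

# Attach the trained adapter and fold its deltas into the base weights
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # hypothetical path
merged = model.merge_and_unload()

merged.save_pretrained("merged-33b-fp16")  # hypothetical output dir
```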
Deepspeed has issues with generation inference, so I would recommend switching the metric to Perplexity, which does raw logit evaluation. This should speed up validation significantly.
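For reference, perplexity in this sense is just the exponential of the mean token-level cross-entropy computed from the raw logits, with no text generation involved. A minimal sketch, not the project's actual implementation; the `-100` label mask follows the common Hugging Face convention and is an assumption here:

```python
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Perplexity from raw logits: exp of the mean token-level cross-entropy.

    logits: (batch, seq_len, vocab_size), labels: (batch, seq_len)
    """
    # Shift so that each position predicts the next token
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()

    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,  # skip masked/padding positions
    )
    return torch.exp(loss)
```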