CUDA OOM error when merging LoRA weights after 4-bit training #130
-
Thanks a lot for trying out H2O LLM Studio (even on a developer branch!). I have changed the loading and merging of weights in #131 to run on the CPU instead of GPUs. With this, downloading the model (in Hugging Face format, with LoRA layers merged) should work (the PR will also be reviewed). Hope this helps, and please reach out if any questions remain open.
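For illustration, here is a minimal sketch of what such a CPU-side merge can look like using `transformers` and `peft`. This is not the exact code from #131, and the paths below are placeholders:

```python
# Sketch: merge LoRA adapters into a base model entirely on CPU,
# avoiding CUDA OOM. All paths are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the float16 backbone into CPU RAM only; low_cpu_mem_usage
# reduces peak memory while the checkpoint shards are loaded.
base = AutoModelForCausalLM.from_pretrained(
    "path/to/base-model",              # placeholder
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map={"": "cpu"},            # keep every layer off the GPU
)

# Attach the trained LoRA adapters; they end up on CPU as well,
# because the base model lives there.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder

# Fold the adapter weights into the base weights in CPU RAM.
merged = model.merge_and_unload()

# Save the merged model in Hugging Face format, ready to push to the Hub.
merged.save_pretrained("path/to/merged-model", safe_serialization=True)
```

Since nothing touches the GPU, the merge is bounded by system RAM rather than VRAM; a large backbone only needs enough CPU memory to hold the float16 weights plus the adapters.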
-
Also note that there are currently some memory overhead issues with 4-bit/8-bit training; see here.
-
Hi all,
I recently used the branch `max/4bit` to train a backbone with 35B parameters, which actually succeeded. I am able to load the model via H2O LLM Studio since `4bit` is used for the backbone there as well. Now I want to push the resulting model to HF, which requires merging the LoRA weights back into the base model. This runs into the known CUDA OOM error, since the base model is quite large and `float16` is used for the backbone at that point. I am working with a system that has 8 x V100 80GB and 512 GB RAM, which is pretty much the maximum I can get.
Do you have any ideas on how to accomplish the merge? I am fully aware that I might face the very same problem with the finished merged model as well, but that aside, I need to find a solution for this problem first.
Kind regards
Julian