CUDA OOM error when merging LoRA weights after 4-bit training #130
-
Thanks a lot for trying out H2O LLM Studio (even on a developer branch!). I have changed the loading and merging of weights in #131 to run on the CPU instead of GPUs. With this, downloading the model (in Hugging Face format, with LoRA layers merged) should work (the PR will also be reviewed). Hope this helps, and please reach out if any questions remain open.
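For illustration, here is a minimal sketch of what such a CPU-side merge can look like using `transformers` and `peft`. This is not the exact code from #131, and the paths below are placeholders:

```python
# Sketch: merge LoRA adapters into a base model entirely on CPU,
# avoiding CUDA OOM. All paths are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the float16 backbone into CPU RAM only; low_cpu_mem_usage
# reduces peak memory while the checkpoint shards are loaded.
base = AutoModelForCausalLM.from_pretrained(
    "path/to/base-model",              # placeholder
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map={"": "cpu"},            # keep every layer off the GPU
)

# Attach the trained LoRA adapters; they end up on CPU as well,
# because the base model lives there.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder

# Fold the adapter weights into the base weights in CPU RAM.
merged = model.merge_and_unload()

# Save the merged model in Hugging Face format, ready to push to the Hub.
merged.save_pretrained("path/to/merged-model", safe_serialization=True)
```

Since nothing touches the GPU, the merge is bounded by system RAM rather than VRAM; a large backbone only needs enough CPU memory to hold the float16 weights plus the adapters.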
-
Also note that there are currently some memory overhead issues with 4-bit/8-bit training; see here.
-
Hi all,
I recently used the branch `max/4bit` to train a backbone with 35B parameters, which actually succeeded. I am able to load the model via H2O LLM Studio since `4bit` is used for the backbone there as well. Now I want to push the resulting model to HF, which requires merging the LoRA weights back into the base model. This runs into the known CUDA OOM error, since the base model is quite large and `float16` is used for the backbone at that point. I am working with a system that has 8 x V100 80GB and 512 GB RAM, which is pretty much the maximum I can get.
Do you have any ideas on how to accomplish the merge? I am fully aware that I might face the very same problem with the finished merged model as well, but that aside, I need to find a solution for this problem first.
Kind regards
Julian