
load LoRA Adapter #1492

Open
hessaAlawwad opened this issue Jan 1, 2025 · 12 comments

Comments

@hessaAlawwad

Hello,

I have fine-tuned a model using the Unsloth tutorial (Llama 3.2 Vision finetuning - Radiography use case) and uploaded the LoRA adapter to the Hugging Face Hub.
Now I want to further train this adapter by loading it:

from transformers import AutoModel
from peft import PeftModel

base_model = AutoModel.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
lora_model = PeftModel.from_pretrained(base_model, "Hessa/MMTQA_lora3")
lora_model.eval()

This gives me the error:

ValueError: Unrecognized configuration class <class 'transformers.models.mllama.configuration_mllama.MllamaConfig'> for this kind of AutoModel: AutoModel.
Model type should be one of AlbertConfig, AlignConfig, AltCLIPConfig, ASTConfig, AutoformerConfig, BarkConfig, BartConfig, BeitConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BitConfig, BlenderbotConfig, BlenderbotSmallConfig, BlipConfig, Blip2Config, BloomConfig, BridgeTowerConfig, BrosConfig, CamembertConfig, CanineConfig, ChameleonConfig, ChineseCLIPConfig, ChineseCLIPVisionConfig, ClapConfig, CLIPConfig, CLIPTextConfig, CLIPVisionConfig, CLIPSegConfig, ClvpConfig, LlamaConfig, CodeGenConfig, CohereConfig, ConditionalDetrConfig, ConvBertConfig, ConvNextConfig, ConvNextV2Config, CpmAntConfig, CTRLConfig, CvtConfig, DacConfig, Data2VecAudioConfig, Data2VecTextConfig, Data2VecVisionConfig, DbrxConfig, DebertaConfig, DebertaV2Config, DecisionTransformerConfig, DeformableDetrConfig, DeiTConfig, DetaConfig, DetrConfig, DinatConfig, Dinov2Config, DistilBertConfig, DonutSwinConfig, DPRConfig, DPTConfig, EfficientFormerConfig, EfficientNetConfig, ElectraConfig, EncodecConfig, ErnieConfig, ErnieMConfig, EsmConfig, FalconConfig, FalconMambaConfig, FastSpeech2ConformerConfig, FlaubertConfig, FlavaConfig, FNetConfig, FocalNetConfig, FSMTConfig, FunnelConfig, GemmaConfig, Gemma2Config, GitConfig, GlmConfig, GLPNConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GPTSanJapaneseConfig, GraniteConfig, GraniteMoeConfig, GraphormerConfig, GroundingDinoConfig, GroupViTConfig, HieraCon...
@KareemMusleh

KareemMusleh commented Jan 1, 2025

Try using AutoModelForCausalLM.from_pretrained instead
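A minimal sketch of that suggestion applied to the original snippet (not verified here; since the checkpoint is a vision model with MllamaConfig, the model-specific class MllamaForConditionalGeneration may be needed if AutoModelForCausalLM also rejects the config):

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model with a task-specific auto class, then attach the trained adapter.
# If AutoModelForCausalLM also rejects MllamaConfig, try MllamaForConditionalGeneration.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
lora_model = PeftModel.from_pretrained(base_model, "Hessa/MMTQA_lora3")
lora_model.eval()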

@hessaAlawwad

Thank you for your reply @KareemMusleh. I got the following error:

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Loading checkpoint shards:   0%
 0/2 [00:00<?, ?it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-9b45f1afa1ae> in <cell line: 4>()
      2 from transformers import AutoModelForCausalLM
      3 
----> 4 lora_model = AutoModelForCausalLM.from_pretrained("Hessa/MMTQA_lora3", torch_dtype=torch.float16)

4 frames
/usr/local/lib/python3.10/dist-packages/transformers/quantizers/quantizer_bnb_4bit.py in create_quantized_param(self, model, param_value, param_name, target_device, state_dict, unexpected_keys)
    205                 param_name + ".quant_state.bitsandbytes__nf4" not in state_dict
    206             ):
--> 207                 raise ValueError(
    208                     f"Supplied state dict for {param_name} does not contain `bitsandbytes__*` and possibly other `quantized_stats` components."
    209                 )

ValueError: Supplied state dict for model.layers.0.mlp.down_proj.weight does not contain `bitsandbytes__*` and possibly other `quantized_stats` components.

@mosama1994

mosama1994 commented Jan 1, 2025

Can you share your adapter_config.json? You must have trained after quantization, right? Open the config and check the "base_model_name_or_path" key, and set it to the base model you want to attach the adapter to. The Unsloth Hugging Face repos also host the unquantized models, so you can use either those or the original repo for the base model.
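A small sketch of how to check this programmatically (assuming the adapter repo is public or a Hub token is configured):

from huggingface_hub import hf_hub_download
import json

# Download the adapter config and inspect which base model it references.
config_path = hf_hub_download("Hessa/MMTQA_lora3", "adapter_config.json")
with open(config_path) as f:
    adapter_config = json.load(f)
print(adapter_config["base_model_name_or_path"])
# If this points to a quantized repo, change it to the unquantized base model you want.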

Can you also try the following:

from peft import AutoPeftModelForCausalLM
import torch

peft_model = AutoPeftModelForCausalLM.from_pretrained(
    "this is the path to the adapter",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16
)
merged_model = peft_model.merge_and_unload()

Now merged_model is your final merged model, which you can save and use.
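For completeness, saving the merged model afterwards could look like this (the output path is a placeholder):

# Hypothetical local output directory; adjust to your own path or push to the Hub instead.
merged_model.save_pretrained("merged-llama-3.2-11b-vision")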

@hessaAlawwad

Thank you very much @mosama1994.
I wanted to load the lora_model to train it further, so I did not want to merge it with the base model.
Is that possible?

@mosama1994

You will have to merge to do that, I believe. Actually, if you create a new LoRA adapter now, it will be initialized with random weights, so if you don't merge you should instead be able to use the loaded adapter directly for fine-tuning. I have never tried that, but the peft_model should be further trainable, I believe. Test it out and let me know as well.
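As a side note (not something tried in this thread), plain PEFT can reload a trained adapter in a trainable state without merging via is_trainable=True; a minimal sketch, without Unsloth's optimizations, reusing the paths from the earlier snippets:

from transformers import MllamaForConditionalGeneration
from peft import PeftModel
import torch

# Reload the trained adapter on top of the unquantized base in trainable form.
base_model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct", torch_dtype=torch.bfloat16
)
lora_model = PeftModel.from_pretrained(base_model, "Hessa/MMTQA_lora3", is_trainable=True)
lora_model.print_trainable_parameters()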

@mosama1994

However, if you create the PEFT model like this for fine-tuning, it will not be optimized by Unsloth, because FastLanguageModel.get_peft_model from Unsloth also patches some layers to make them faster.

@mosama1994

mosama1994 commented Jan 6, 2025

Hi Hessa,

I found out how you can load your LoRA adapter for further tuning. The following is the code:

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "mosama/Qwen2.5-0.5B-Pretraining-ar-eng-urd-LoRA-Adapters", # YOUR PEFT MODEL PATH WILL COME HERE
    max_seq_length = 2048,
    dtype = torch.bfloat16,
    load_in_4bit = True,
)

# When you run this step, it will report that you already have a PEFT model,
# and you can go on with the rest of your code. There is a small bug in the
# Unsloth code for which I have opened a pull request (linked below). You can
# set the same config that the model was loaded with previously when you trained.
model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj", 
                      "embed_tokens","lm_head" # YOU CAN REMOVE THESE, THESE ARE WHAT I ADDED FOR PRETRAINING A MODEL
                      ],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

PULL REQUEST:
#1509

@hessaAlawwad

Thank you @mosama1994.

So can I use model (the LoRA adapter) for further training without this step?

model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj", 
                      "embed_tokens","lm_head" # YOU CAN REMOVE THESE, THESE ARE WHAT I ADDED FOR PRETRAINING A MODEL
                      ],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

@mosama1994

mosama1994 commented Jan 6, 2025

You have to do this step, as it properly loads the model with the trained LoRA adapter you gave the path to. Just change the settings to match the LoRA adapter from your first training. In the pull request I created, there is a llama.py file in the Unsloth models folder that needs changes, but I believe you are training a vision model, so run both steps, check if you get any error, and let me know.

The first part actually loads the PEFT model, and the second step checks the config and makes the model ready for training. So you need to run both, the same as the last time you trained, just changing the path in the first part to the LoRA checkpoint. Also, after these two steps run this:

model = model.to("cuda:0")

@hessaAlawwad

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-16-f0f542f37d53> in <cell line: 1>()
----> 1 model = FastVisionModel.get_peft_model(
      2     model,
      3     finetune_vision_layers     = True, # False if not finetuning vision layers
      4     finetune_language_layers   = True, # False if not finetuning language layers
      5     finetune_attention_modules = True, # False if not finetuning attention layers

/usr/local/lib/python3.10/dist-packages/unsloth/models/vision.py in get_peft_model(model, r, target_modules, lora_alpha, lora_dropout, bias, finetune_vision_layers, finetune_language_layers, finetune_attention_modules, finetune_mlp_modules, layers_to_transform, layers_pattern, use_gradient_checkpointing, random_state, max_seq_length, use_rslora, modules_to_save, init_lora_weights, loftq_config, temporary_location, **kwargs)
    237 
    238         if isinstance(model, PeftModelForCausalLM):
--> 239             raise RuntimeError("Unsloth: You already added LoRA adapters to your model!")
    240 
    241         if target_modules == "all-linear":

RuntimeError: Unsloth: You already added LoRA adapters to your model!

@hessaAlawwad

This is the PEFT configuration:

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = True, # False if not finetuning vision layers
    finetune_language_layers   = True, # False if not finetuning language layers
    finetune_attention_modules = True, # False if not finetuning attention layers
    finetune_mlp_modules       = True, # False if not finetuning MLP layers

    r = 16,           # The larger, the higher the accuracy, but might overfit
    lora_alpha = 16,  # Recommended alpha == r at least
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
    # target_modules = "all-linear", # Optional now! Can specify a list if needed
)

@mosama1994

Yes, in that case just keep the first step. Then run:

model.print_trainable_parameters() # Just to check that we are indeed training fewer parameters; prints the count and percentage.
model = model.to("cuda:0")
# Then run the rest of your script, skipping get_peft_model
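To recap the workflow suggested above (an untested sketch; the path and settings mirror the earlier snippets): load the trained adapter checkpoint directly with FastVisionModel.from_pretrained, skip get_peft_model, and continue with the same training script:

from unsloth import FastVisionModel

# Load the previously trained LoRA checkpoint; Unsloth restores the PEFT model.
model, tokenizer = FastVisionModel.from_pretrained(
    model_name = "Hessa/MMTQA_lora3",  # path to the trained LoRA adapter
    load_in_4bit = True,
)

model.print_trainable_parameters()  # only the adapter weights should be trainable
model = model.to("cuda:0")
# ...then reuse the same trainer setup as in the first run, without calling get_peft_model again.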
