To save model in HF format after supervised-fine-tune-qlora #139
Hello @gianlucamacri, thank you for sharing #123. Is saving the model with model.base_model.save_pretrained(training_args.output_dir) any different from using merge_lora_weights_and_save_hf_model.py?
Yes, the difference is that model.base_model.save_pretrained(training_args.output_dir) saves the model weights as they are in the base model augmented with the LoRA layers, since it bypasses only the LoRA model object wrapper (hence the .base_model notation). Calling the save method on the model itself would save only the weights of the LoRA adapters, which is what the trainer does under the hood. The first approach therefore stores the modified weights of the model along with all the others, but in an "unmerged" structure. With merge_lora_weights_and_save_hf_model.py, on the other hand, you are also required to provide the trainable parameters that are not saved with the LoRA adapters, i.e. the normalization and embedding layers, which are left trainable in the script's default configuration. However, note that the workaround I suggested in #123 was rather rudimentary; grimulkan's solution is much more elegant and convenient, since it works seamlessly with a variety of configurations (mine worked only for 8-bit quantization) and lets you save multiple usable checkpoints with a reduced memory footprint, as it stores just the trainable parameters. So I would strongly suggest using that solution 😄
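The difference between the two save paths can be sketched with a toy stand-in in plain PyTorch. The module and parameter names below are illustrative, not LongLoRA's actual ones; the point is only which tensors end up in each kind of save.

```python
import torch
import torch.nn as nn

# Toy stand-in for a LoRA-wrapped layer: the base weights are frozen and
# only the small adapter matrices are trainable. Names are illustrative.
class ToyLoraLinear(nn.Module):
    def __init__(self, d=4, r=2):
        super().__init__()
        self.base = nn.Linear(d, d)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.zeros(r, d))
        self.lora_b = nn.Parameter(torch.zeros(d, r))

model = ToyLoraLinear()

# Adapter-style save (what the trainer does under the hood):
# only the trainable tensors are kept.
adapter_only = {n: p for n, p in model.named_parameters() if p.requires_grad}

# Full save (what saving the unwrapped base model amounts to):
# every tensor, frozen base weights included, in an unmerged structure.
full = model.state_dict()

print(sorted(adapter_only))  # ['lora_a', 'lora_b']
print(sorted(full))          # ['base.bias', 'base.weight', 'lora_a', 'lora_b']
```

This is why an adapter-only checkpoint is small but useless on its own when extra layers (norms, embeddings) were also trainable: those tensors simply are not in it.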
Thank you @gianlucamacri for sharing these insights. I tried #123's approach to save trainable_params.bin
So I assume you used grimulkan-like code from #123; reporting it here may help in finding any possible issues. Regarding the first issue, what do you mean by the inference quality having deteriorated? Did you make sure to compare performance on the same data? I ask because you would usually expect performance to be worse on data that was unseen during training, but on the same data (train or validation) the model should perform the same (or better, given the authors' choice to switch to full attention rather than S2 attention for inference) if the model weights are the same, which should be the case with a correct merge. Another trivial check is to make sure you are using the correct base model and checkpoint folder. Regarding the second point, trying a previous checkpoint is useful only if you saved the trainable weights for it, which I assume is not the case based on what you wrote; otherwise it would make very little mathematical sense to use weights derived from different time steps.
Yes, I did use a #123-like approach and did save trainable_params. This is how I tried merging and inference: answer = !python3 inference.py
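For reference, extra trainable tensors saved this way are typically loaded back on top of the base-plus-adapter model with strict=False, so that tensors the file does not provide are simply left untouched. A minimal, self-contained sketch with a toy module (layer names and the file name are illustrative):

```python
import torch
import torch.nn as nn

# Toy stand-in for the fine-tuned model; layer names are illustrative.
model = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))

# Pretend trainable_params.bin held only the extra trainable tensors
# (e.g. the norm weights), as in the #123 / grimulkan approach.
extra = {"1.weight": torch.full((4,), 2.0), "1.bias": torch.zeros(4)}
torch.save(extra, "trainable_params.bin")

result = model.load_state_dict(torch.load("trainable_params.bin"), strict=False)

# strict=False tolerates the tensors the file does not cover (the Linear
# layer here); nothing in the file should be unexpected.
print(result.unexpected_keys)             # []
print(model[1].weight.detach().tolist())  # [2.0, 2.0, 2.0, 2.0]
```

If unexpected_keys is non-empty at this step, the names in the saved file do not match the model, which is one easy way a merge silently goes wrong.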
My apologies for the late update. I re-ran the training (supervised-fine-tune-qlora.py) and this time I did get zero_to_fp32.py in the "path_to_saving_checkpoints/checkpoint-<>" folder.
Query: Can I now assume my merging of weights (LoRA and base model) is properly done? Note: checkpoint-<> is the last checkpoint. Hello @yukang2017, can you please confirm that I did the merging of the base and adapter weights above properly, and that I need not use the same BitsAndBytesConfig at inference time, i.e. that inference.py is good enough to use the model after merging?
Hello @yukang2017, I used supervised-fine-tune-qlora.py and did get the checkpoints at path_to_saving_checkpoints. After this step, I tried to run merge_lora_weights_and_save_hf_model.py on the last checkpoint, as I did not get the final model files (pytorch_model.bin) in path_to_saving_checkpoints. Did I miss something, or is this the correct way to save the HF model?
!python3 merge_lora_weights_and_save_hf_model.py \
    --base_model "meta-llama/Llama-2-13b-hf" \
    --peft_model "../path_to_saving_checkpoints/checkpoint-6000" \
    --context_size 8192 \
    --save_path "../path_to_saving_merged_model"
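Mathematically, merging folds the adapter into the base weight: W' = W + (alpha/r) * B @ A, after which a plain forward pass through W' equals the base-plus-adapter path. A small numerical check in plain PyTorch (toy sizes, illustrative names, not the merge script itself):

```python
import torch

# LoRA merge math: merged weight W' = W + (alpha/r) * B @ A
d, r, alpha = 4, 2, 16
W = torch.randn(d, d)       # frozen base weight
A = torch.randn(r, d)       # LoRA down-projection
B = torch.randn(d, r)       # LoRA up-projection
scaling = alpha / r

W_merged = W + scaling * (B @ A)

# Base + adapter path vs a single matmul with the merged weight
x = torch.randn(1, d)
out_unmerged = x @ W.T + scaling * (x @ A.T @ B.T)
out_merged = x @ W_merged.T

print(torch.allclose(out_unmerged, out_merged, atol=1e-5))  # True
```

A quick check like this (on a few real layers, comparing logits of the merged model against base-plus-adapter on the same input) is also a practical way to confirm a merge went through correctly.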
I also tried to find zero_to_fp32.py, but somehow I could not find this script in the code. Is it necessary to run this script, or is merge_lora_weights_and_save_hf_model.py alone sufficient?
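For context: zero_to_fp32.py is not part of this repo; it is a helper that DeepSpeed writes into each ZeRO checkpoint directory, and it consolidates the sharded ZeRO state into a single fp32 state dict. Assuming the training used DeepSpeed ZeRO, a typical invocation looks like this (the paths are placeholders):

```shell
# Run from inside the checkpoint folder that DeepSpeed wrote;
# the script consolidates the sharded ZeRO state into one fp32 file.
cd path_to_saving_checkpoints/checkpoint-6000
python zero_to_fp32.py . pytorch_model.bin
```

This explains why the script only appears after a run that actually saved a ZeRO checkpoint, as described in the earlier comment.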
After merge_lora_weights_and_save_hf_model.py, I tried inference as below, which seems to be referring to some.txt in the answers, so I am assuming my steps are okay. Please do let me know if I have missed any important step here.
!python3 inference.py \
    --base_model "../path_to_saving_merged_model" \
    --question "some question" \
    --context_size 32768 \
    --max_gen_len 512 \
    --flash_attn True \
    --material "some.txt"
Thank you again!