
To save model in HF format after supervised-fine-tune-qlora #139

Open
MyBruso opened this issue Nov 16, 2023 · 7 comments

Comments

@MyBruso

MyBruso commented Nov 16, 2023

Hello @yukang2017, I used supervised-fine-tune-qlora.py and got the checkpoints at path_to_saving_checkpoints. After this step, I tried to run merge_lora_weights_and_save_hf_model.py on the last checkpoint, since I did not get the final model files (pytorch_model.bin) in path_to_saving_checkpoints. Did I miss something, or is this the correct way to save the HF model?

!python3 merge_lora_weights_and_save_hf_model.py \
        --base_model "meta-llama/Llama-2-13b-hf" \
        --peft_model "../path_to_saving_checkpoints/checkpoint-6000" \
        --context_size 8192 \
        --save_path "../path_to_saving_merged_model"

I also tried to find zero_to_fp32.py, but somehow I could not find this script in the code. Is it necessary to run this script, or is merge_lora_weights_and_save_hf_model.py alone sufficient?

After merge_lora_weights_and_save_hf_model.py, I tried inference as below, which does seem to refer to some.txt in the answers, so I am assuming my steps are okay. Please let me know if I have missed any important step here.

!python3 inference.py \
        --base_model "../path_to_saving_merged_model" \
        --question "some question" \
        --context_size 32768 \
        --max_gen_len 512 \
        --flash_attn True \
        --material "some.txt"

Thank you again!

@gianlucamacri
Contributor

@MyBruso hi, have a look at #123, and specifically at the workaround by grimulkan, which should be the best option. Unfortunately, as far as I know, you will need to repeat the training.
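
For reference, the idea of that workaround is roughly the following: a minimal sketch of a trainer callback that dumps the extra trainable tensors next to each checkpoint, assuming the qlora setup where only the LoRA adapters plus the embedding and normalization layers are trainable (the class name is made up and the actual code in #123 may differ):

import os
import torch
from transformers import TrainerCallback

class SaveTrainableParamsCallback(TrainerCallback):
    # Hypothetical callback: dump the non-LoRA trainable tensors
    # (embeddings and norms) next to each checkpoint so the merge
    # script can reload them later.

    def _save(self, args, state, model):
        ckpt_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        os.makedirs(ckpt_dir, exist_ok=True)
        trainable = {
            name: param.detach().cpu()
            for name, param in model.named_parameters()
            if param.requires_grad and "lora_" not in name
        }
        torch.save(trainable, os.path.join(ckpt_dir, "trainable_params.bin"))

    def on_save(self, args, state, control, model=None, **kwargs):
        self._save(args, state, model)

    def on_train_end(self, args, state, control, model=None, **kwargs):
        self._save(args, state, model)

You would then pass an instance of it to the Trainer via its callbacks argument.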

@MyBruso
Author

MyBruso commented Nov 16, 2023

Hello @gianlucamacri, thank you for sharing #123. Is saving the model using model.base_model.save_pretrained(training_args.output_dir) different from merge_lora_weights_and_save_hf_model.py? I did notice the difference; however, I see that pytorch_model.bin files do get generated by merge_lora_weights_and_save_hf_model.py. It loads the PEFT model and then saves it, so I am curious why you suggest explicitly adding model.base_model.save_pretrained() to supervised-fine-tune-qlora.py.

@gianlucamacri
Contributor


Yeah, the difference is that model.base_model.save_pretrained(training_args.output_dir) saves the model weights as they are in the base model augmented with the LoRA layers, since it only bypasses the LoRA object wrapper (hence the .base_model notation); calling the save method on the model itself would save only the weights of the LoRA adapters (which is what the trainer does under the hood). This lets you store the modified weights of the model as well as all the others, but in an "unmerged" structure. On the other hand, with merge_lora_weights_and_save_hf_model.py you are also required to provide the trainable parameters that are not saved with the LoRA adapters, i.e. the normalization and embedding layers, which are left trainable in the script's default configuration.
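
Very roughly, the two routes look like this (just a sketch with placeholder paths, not the actual scripts):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Route 1 (workaround inside the training script): bypass the PEFT wrapper and
# save the base model with the LoRA layers still attached, i.e. "unmerged":
#     model.base_model.save_pretrained(training_args.output_dir)

# Route 2 (merge script): reload base + adapters, restore the extra trainable
# tensors (embed/norm), then merge the LoRA weights into the base weights.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf", torch_dtype=torch.float16
)
base.load_state_dict(torch.load("trainable_params.bin"), strict=False)  # embed/norm only
model = PeftModel.from_pretrained(base, "path_to_saving_checkpoints/checkpoint-6000")
merged = model.merge_and_unload()
merged.save_pretrained("path_to_saving_merged_model")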

However, note that the workaround I suggested in #123 was rather rudimentary; grimulkan's solution is much more elegant and convenient, as it works seamlessly with a variety of configurations (while mine worked only for 8-bit quantization) and lets you save multiple usable checkpoints with a reduced memory footprint, since it stores just the trainable parameters. So I would strongly suggest using that solution 😄

@MyBruso
Author

MyBruso commented Nov 29, 2023

Thank you @gianlucamacri for sharing these insights. I tried #123's approach to save trainable_params.bin on_train_end. After this I ran merge_lora_weights_and_save_hf_model.py to generate the pytorch_model.bin files. Now this script is able to use the trainable_params.
One observation is that inference quality with this model has deteriorated. Do you think I should try the intermediate checkpoints to generate the final model?
OR
With no trainable_params.bin supplied, merge_lora_weights_and_save_hf_model.py skips model.load_state_dict(torch.load(trainable_params, map_location=model.device), strict=False) and loads the model from path_to_saving_checkpoints. Is it fine if I use this loaded model for inference?
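
To be explicit, the behaviour I am referring to is roughly this (my own paraphrase as a small helper, not the actual script code):

import os
import torch

def maybe_load_trainable_params(model, trainable_params_path):
    # If the file with the extra trainable tensors (embed/norm) exists, load
    # them on top of the base weights; otherwise skip, so the merged model
    # contains only the base weights plus the LoRA adapters.
    if os.path.isfile(trainable_params_path):
        model.load_state_dict(
            torch.load(trainable_params_path, map_location=model.device),
            strict=False,
        )
    return model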

@gianlucamacri
Contributor

gianlucamacri commented Nov 29, 2023


So I assume you used grimulkan-like code from #123, but posting it here may be helpful in finding any possible issues.

Regarding the first issue, what do you mean by the inference quality having deteriorated? Did you make sure to compare the performance on the same data? I ask because you would usually expect performance to be worse on data that was unseen during training, but on the same data (train or validation) the model should perform the same (or better, given the authors' choice to switch to full attention rather than S2 attention for inference) as long as the model weights are the same, which should be the case after a correct merge. Another trivial check would be to make sure you are using the correct base model and checkpoint folder.

Regarding the second point, I would say it may be useful to try a previous checkpoint only if you saved the trainable weights for it, which I assume is not the case based on what you wrote; otherwise it would make very little mathematical sense to combine weights from different time steps.

@MyBruso
Author

MyBruso commented Nov 29, 2023


Yes, I did use a #123-like approach and saved trainable_params on_save as well as on_train_end.
Here are my observations:
What do you mean that the inference quality has deteriorated? - With the same training data and inference on selected samples (from the training data), I used to get better output without passing trainable_params.bin to merge_lora_weights_and_save_hf_model.py: the output stayed in context with the supplied source while answering the question at inference time.
With trainable_params.bin passed to merge_lora_weights_and_save_hf_model.py, I see that the same questions are missing the main details from the supplied context.

This is how I tried merging and inference:

!python3 merge_lora_weights_and_save_hf_model.py \
        --base_model "meta-llama/Llama-2-13b-hf" \
        --peft_model "path_to_saving_checkpoints" \
        --context_size 32768 \
        --save_path "path_to_saving_merged_model"

answer = !python3 inference.py \
        --base_model "path_to_saving_merged_model" \
        --question "sample question" \
        --context_size 32768 \
        --max_gen_len 8096 \  # I am trying various options here, from 1024 up
        --flash_attn True \
        --material "some.txt"

@MyBruso
Author

MyBruso commented Dec 12, 2023

My apologies for the late update.

I re-ran the training (supervised-fine-tune-qlora.py) and this time I did get zero_to_fp32.py in the "path_to_saving_checkpoints/checkpoint-<>" folder.
So I followed the sequence given in the README, as below:

  1. !python3 zero_to_fp32.py -t path_to_saving_checkpoints/checkpoint-<>/global_step<> . pytorch_model.bin
    This did create a pytorch_model.bin file in the same checkpoint folder.

  2. Then I used get_trainable_weights.py to get trainable_params.bin:
    !python3 get_trainable_weights.py --checkpoint_path path_to_saving_checkpoints/checkpoint-<> --trainable_params "embed,norm"
    Query: Can I rely on the trainable_params.bin generated by this script, or do I need to use only the approach from "Saving pytorch_model.bin with QLORA" #123?

  3. Finally, I used merge_lora_weights_and_save_hf_model.py to merge the weights:

!python3 merge_lora_weights_and_save_hf_model.py \
        --base_model "meta-llama/Llama-2-13b-hf" \
        --peft_model "path_to_saving_checkpoints/checkpoint-<>" \
        --context_size 8192 \
        --save_path "path_to_saving_merged_model"

Query: Can I now assume that my merging of the weights (LoRA and base model) was done properly?

Note: checkpoint-<> refers to the last checkpoint.

Hello @yukang2017, can you please confirm that I merged the base and adapter weights properly above, and that I no longer need the same BitsAndBytesConfig at inference time, i.e. that inference.py is good enough to use the model after merging?
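
In other words, I am assuming that once the weights are merged, plain loading is enough, with no BitsAndBytesConfig (a sketch of what I mean, assuming the merged folder contains full-precision weights):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_path = "path_to_saving_merged_model"
tokenizer = AutoTokenizer.from_pretrained(merged_path)
model = AutoModelForCausalLM.from_pretrained(
    merged_path,
    torch_dtype=torch.float16,  # no quantization config after merging
    device_map="auto",
)
model.eval()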
