
Cannot Convert Checkpoint to Trainable Model #133

Open
believewhat opened this issue Nov 12, 2023 · 3 comments

Comments


believewhat commented Nov 12, 2023

Hi authors,

When I try to merge the LoRA checkpoint with the base model (merge_lora_weights_and_save_hf_model.py), I encounter this issue:

You are resizing the embedding layer without providing a pad_to_multiple_of parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as Tensor Cores will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc

And I cannot get the final model.
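For reference, that message only means the resized vocabulary (32001 after adding a token) is no longer a multiple of 8, which Tensor Cores prefer; passing pad_to_multiple_of=8 to resize_token_embeddings rounds it up. A small sketch of the rounding (pad_to_multiple_of is the actual Transformers parameter name; the helper function here is just illustrative):

```python
import math

def padded_vocab_size(vocab_size: int, pad_to_multiple_of: int = 8) -> int:
    # Round up to the next multiple, as
    # model.resize_token_embeddings(n, pad_to_multiple_of=8) would,
    # keeping the embedding dimension Tensor Core friendly.
    return math.ceil(vocab_size / pad_to_multiple_of) * pad_to_multiple_of

print(padded_vocab_size(32001))  # → 32008
```

So the warning by itself should not prevent the merged model from being produced.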

@believewhat changed the title from the warning text quoted above to Cannot Convert Checkpoint to Trainable Model on Nov 13, 2023
@weicheng113
Contributor

The above seems to be just a warning; the code still runs with that output on my end.

@believewhat
Author

But I only got one checkpoint file (just pytorch_model.bin), not the 15 shards expected for llama70b (pytorch_model-00001-of-00015.bin, ...).

@believewhat
Author

Another problem is that the size of adapter_model.bin is only 500 B.
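A ~500 B adapter_model.bin usually means an essentially empty state dict was serialized, since even a single small LoRA matrix is orders of magnitude larger. A quick sanity-check sketch (assuming PyTorch; the tensor shape is just an example rank-8 LoRA factor for a 4096-dim layer):

```python
import io

import torch


def saved_size(state_dict) -> int:
    # Serialize a state dict in memory with torch.save and
    # return its size in bytes.
    buf = io.BytesIO()
    torch.save(state_dict, buf)
    return buf.tell()


# An empty state dict serializes to only a few hundred bytes,
# roughly matching the 500 B adapter file reported above.
print(saved_size({}))

# Even one tiny LoRA matrix (8 x 4096 float32 = 128 KiB of data)
# dwarfs that, so a healthy adapter file should be far larger.
print(saved_size({"lora_A.weight": torch.zeros(8, 4096)}))
```

If the empty-dict size is close to what you see on disk, the LoRA weights were likely never written, so it is worth checking that the adapter state dict is non-empty before saving.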
