Add script to write Llama's HF-formatted config.json for vLLM #936
Conversation
Force-pushed from 87d8a58 to cf6a589
LGTM, just left two nit comments.
@@ -18,3 +19,9 @@ def _setup_llama_cli(cli: Cli) -> None:
        handler=ConvertCheckpointCommandHandler(),
        help="convert fairseq2 LLaMA checkpoints to reference checkpoints",
    )

    group.add_command(
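For reference, the new registration presumably mirrors the existing convert-checkpoint entry shown above. A minimal sketch of how the completed call might look (the command name and handler class below are assumptions based on the PR title, not names taken from the diff):

```python
# Sketch only: "write_hf_config" and WriteHfConfigCommandHandler are assumed
# names; the call simply follows the pattern of the existing entry above.
group.add_command(
    name="write_hf_config",
    handler=WriteHfConfigCommandHandler(),
    help="write a Hugging Face config.json for a fairseq2 LLaMA checkpoint (for vLLM)",
)
```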
I think this is fine, but it would be great if we could also have a way to automatically dump this config.json in LLM finetuning recipes.
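Something like the following could do it at checkpoint-save time. This is purely a sketch; `dump_hf_config` and its call site are hypothetical, not existing fairseq2 APIs:

```python
import json
from pathlib import Path


def dump_hf_config(hf_config: dict, checkpoint_dir: Path) -> None:
    """Write a Hugging Face-style config.json next to a saved checkpoint.

    Hypothetical helper: a finetuning recipe could call this right after each
    checkpoint save so the directory is directly loadable by vLLM.
    """
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    (checkpoint_dir / "config.json").write_text(json.dumps(hf_config, indent=2))
```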
Agreed!
At least the content of the PR looks good to me. Assuming that the generated config.json works with vLLM, I think it is good to go.
Looking great! @MartinGleize We might need to test this with an experiment that started from another experiment's checkpoint (using the resume-from-checkpoint-dir argument). In that case the model name might be just "checkpoint_step_N", and something might not be retrieved into the final config as planned. We have a bunch of such examples on our side and can help with testing this branch if needed.
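One illustrative way the script could guard against that case (the function below is hypothetical, not code from the PR):

```python
# Hypothetical guard; not part of the PR.
def resolve_base_model_name(model_name: str) -> str:
    # When an experiment was resumed from another experiment's checkpoint dir,
    # the recorded name may be "checkpoint_step_N" rather than a registered
    # asset card name, so metadata lookups would come back empty.
    if model_name.startswith("checkpoint_step_"):
        raise ValueError(
            f"'{model_name}' is not a registered model name; pass the original "
            "base model name explicitly so its metadata can be retrieved."
        )
    return model_name
```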
Let me merge AssetCard.flatten() tomorrow morning. It has been ready for a long time. I was aiming to merge it with the recipe refactoring, but I can do it separately for this PR.
Ok, looks like a bug in the reference implementation: pytorch/executorch#7265
Also discussed here (and not resolved, it looks like): meta-llama/llama-models#241
AssetCard.flatten() will produce an asset card with all base metadata merged into one.
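Conceptually (this is an illustration, not fairseq2's actual implementation), flattening amounts to walking the card's base chain and merging metadata so that fields on the derived card win:

```python
# Conceptual illustration of flattening a card hierarchy; not fairseq2's API.
def flatten_card_metadata(card: dict, cards_by_name: dict) -> dict:
    chain, current = [], card
    while current is not None:
        chain.append(current)
        base_name = current.get("base")
        current = cards_by_name.get(base_name) if base_name else None
    merged: dict = {}
    # Apply bases first so fields on the derived card override them.
    for c in reversed(chain):
        merged.update({k: v for k, v in c.items() if k != "base"})
    return merged
```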
@MartinGleize Let me know if you need a hand with making the RoPE scaling factor configurable. I believe it is a good exercise to work on, but I'm happy to help if you need bandwidth for other parts of the project.
Should be no problem for me!
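For reference, the RoPE scaling parameters live in a rope_scaling block in the HF config.json, so making the factor configurable would mean exposing these values instead of hard-coding them. The numbers below are the published Llama 3.1 defaults, shown as the Python dict that would be serialized into config.json:

```python
# Published Llama 3.1 defaults; a configurable version would take these as inputs.
rope_scaling = {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}
```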
Nice, looks good to me.
I'm still wondering about tie_word_embeddings=true in the config.json (I was trying to get rid of the last use of "is_llama_3_2"). Neither our impl nor the ref impl mentions this anywhere in the code (as far as I can see), but on HF they indeed set it to true in their config. Am I missing something, or should we hard-code it to false on our side? That seems pretty important.
We definitely do not use tied weights for LLaMA. I am not sure how HF came up with that setting, as I don't see any weight tying in the reference implementation either. I believe we have to set it to false.
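If it is hard-coded, the generated config would simply state the flag explicitly, e.g. (sketch only; other fields elided):

```python
# Sketch: per the discussion above, fairseq2's LLaMA checkpoints do not tie
# input/output embeddings, so the flag is written out explicitly.
hf_config = {
    # ... architecture fields elided ...
    "tie_word_embeddings": False,
}
```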
What does this PR do? Please describe:
Add a script to write an HF-format config.json for Llama. This config.json is then used to load fairseq2 Llama checkpoints directly in vLLM.

Does your PR introduce any breaking changes? If yes, please list them:
No
Check list:
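For completeness, once the generated config.json sits next to the checkpoint files, the directory can be pointed at vLLM like any local HF-format model. This is standard vLLM usage; whether any extra weight-format handling is needed for fairseq2 checkpoints is outside this sketch, and the path is illustrative:

```python
from vllm import LLM, SamplingParams

# Path is illustrative: a directory containing the generated config.json
# alongside the fairseq2 LLaMA checkpoint files.
llm = LLM(model="/path/to/llama_checkpoint_dir")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```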