Add script to write Llama's HF-formatted config.json for vLLM #936

Merged: 7 commits merged into facebookresearch:main on Jan 7, 2025

Conversation

MartinGleize (Contributor)

What does this PR do? Please describe:
Add script to write HF-format config.json for Llama. This config.json is then used to load fairseq2 Llama checkpoints directly in vLLM.
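
As a rough illustration of what such a script produces (a sketch under assumptions, not the PR's actual code): the field names follow the Hugging Face Llama config format, the values below are Llama 3 8B-style placeholders, and the real script would presumably derive them from the fairseq2 model config of the checkpoint being exported.

```python
import json
from pathlib import Path

# Placeholder values in the style of Llama 3 8B; the actual script would derive
# these from the fairseq2 model config rather than hard-coding them.
hf_config = {
    "architectures": ["LlamaForCausalLM"],
    "model_type": "llama",
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "vocab_size": 128256,
    "max_position_embeddings": 8192,
    "rms_norm_eps": 1e-05,
    "rope_theta": 500000.0,
    "tie_word_embeddings": False,
}

# Hypothetical output location: config.json is written next to the checkpoint
# so that vLLM can load the directory directly.
checkpoint_dir = Path("/path/to/fairseq2/llama/checkpoint")

with open(checkpoint_dir / "config.json", "w") as fp:
    json.dump(hf_config, fp, indent=2)
```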

Does your PR introduce any breaking changes? If yes, please list them:
No

Check list:

  • Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
  • Did you read the contributor guideline?
  • Did you make sure that your PR does only one thing instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

@MartinGleize MartinGleize self-assigned this Dec 23, 2024
@facebook-github-bot added the CLA Signed label Dec 23, 2024
@MartinGleize MartinGleize marked this pull request as ready for review December 24, 2024 11:06
@cbalioglu (Contributor) left a comment:

LGTM, just left two nit comments.

src/fairseq2/models/llama/integ.py (review comment, resolved)
src/fairseq2/models/llama/integ.py (review comment, outdated and resolved)
@@ -18,3 +19,9 @@ def _setup_llama_cli(cli: Cli) -> None:
handler=ConvertCheckpointCommandHandler(),
help="convert fairseq2 LLaMA checkpoints to reference checkpoints",
)

group.add_command(

Contributor:

I think this is fine, but would be great if we could also have a way to automatically dump this config.json in LLM finetuning recipes.

MartinGleize (Contributor, Author):

Agreed!

@cbalioglu (Contributor) left a comment:

At least the content of the PR looks good to me. Assuming that the generated config.json works with vLLM, I think it is good to go.

@cbalioglu cbalioglu requested a review from uralik December 31, 2024 19:13
@uralik (Contributor) commented Dec 31, 2024:

Looking great! @MartinGleize We might need to test this with an experiment that was started from another experiment's checkpoint (using the resume-from-checkpoint-dir argument). In that case the model name might be just "checkpoint_step_N", and something might not make it into the final config as planned. We have a bunch of such examples on our side; I can help with testing on this branch if needed.

@cbalioglu (Contributor):

> Looking great! @MartinGleize We might need to test this with an experiment that was started from another experiment's checkpoint (using the resume-from-checkpoint-dir argument). In that case the model name might be just "checkpoint_step_N", and something might not make it into the final config as planned. We have a bunch of such examples on our side; I can help with testing on this branch if needed.

Let me merge AssetCard.flatten() tomorrow morning. It has been ready for a long time. I was aiming to merge it with recipe refactoring, but can do it separately for this PR.

@cbalioglu (Contributor):

Ok, looks like a bug in the reference implementation: pytorch/executorch#7265

@uralik (Contributor) commented Jan 1, 2025:

> Ok, looks like a bug in the reference implementation: pytorch/executorch#7265

Also discussed here (and it looks like it's not resolved yet): meta-llama/llama-models#241

@cbalioglu (Contributor):

FYI on flatten():

card = card.flatten()

will produce an asset card with all base metadata merged into one.
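
To make the point concrete, a toy illustration of what the merge does (plain dicts standing in for asset cards, not the fairseq2 implementation): a derived card such as a bare "checkpoint_step_N" entry may only carry its own fields, and flattening folds the base card's metadata into one card so fields needed for config.json can still be resolved. Field names and values below are hypothetical.

```python
# Toy stand-in for AssetCard.flatten(): plain dicts instead of asset cards.
base_card = {"model_family": "llama", "model_arch": "llama3_8b"}

# A derived card created by resuming from a checkpoint dir; on its own it only
# knows its name and checkpoint path (hypothetical fields for illustration).
derived_card = {"name": "checkpoint_step_1000", "checkpoint": "/path/to/step_1000.pt"}

# Flattening merges all base metadata into a single card, so model_arch is
# available even when the model name is just "checkpoint_step_N".
flattened = {**base_card, **derived_card}
print(flattened["model_arch"])  # -> "llama3_8b"
```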

@cbalioglu (Contributor):

@MartinGleize Let me know if you need a hand with making the RoPE scaling factor configurable. I believe it is a good small exercise to work on, but I'm happy to help if you need bandwidth for other parts of the project.

@MartinGleize (Contributor, Author):

> @MartinGleize Let me know if you need a hand with making the RoPE scaling factor configurable. I believe it is a good small exercise to work on, but I'm happy to help if you need bandwidth for other parts of the project.

Should be no problem for me!
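
For reference, the rope_scaling block in Hugging Face Llama 3.1-style configs looks roughly like the sketch below; the numbers are the published Llama 3.1 defaults, and "factor" is the value the comment above suggests making configurable rather than hard-coding. A sketch for illustration, not the PR's code.

```python
import json

# RoPE scaling block as it appears in HF config.json files for Llama 3.1-style
# models; "factor" is the value that would become configurable.
rope_scaling = {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}

print(json.dumps({"rope_scaling": rope_scaling}, indent=2))
```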

@cbalioglu (Contributor) left a comment:

Nice, looks good to me.

@MartinGleize (Contributor, Author):

I'm still wondering about tie_word_embeddings=true in the config.json (I was trying to get rid of the last use of "is_llama_3_2"). Our implementation and the reference implementation don't seem to mention this or have any related code (as far as I can see), but on HF their config does set it to true. Am I missing something, or should we hard-code it to false on our side? That seems pretty important.

@cbalioglu (Contributor):

We definitely do not use tied weights for LLaMA. I am not sure how HF came up with that setting, as I don't see any weight tying in the reference implementation either. I believe we have to set it to False, since our checkpoints will have different weights for the embeddings and the final projection layer.
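
A small sanity-check sketch along those lines, assuming hypothetical state-dict key names and checkpoint path (the real fairseq2 LLaMA checkpoint layout may differ): if the embedding and final projection weights are separate tensors with different values, the generated config.json should carry "tie_word_embeddings": false.

```python
import torch

# Load a LLaMA checkpoint; the path and key names below are assumptions for
# illustration, not necessarily the real fairseq2 layout.
state_dict = torch.load("/path/to/model.pt", map_location="cpu")

embed = state_dict.get("decoder_frontend.embed.weight")
proj = state_dict.get("final_proj.weight")

# Weights are effectively tied only if both tensors exist and are identical;
# for fairseq2 LLaMA checkpoints they are expected to differ.
tied = embed is not None and proj is not None and torch.equal(embed, proj)

print({"tie_word_embeddings": tied})  # expected: False
```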

@cbalioglu cbalioglu merged commit f900674 into facebookresearch:main Jan 7, 2025
15 checks passed