Add script to write Llama's HF-formatted config.json for vLLM #936
Conversation
Force-pushed from 87d8a58 to cf6a589
LGTM, just left two nit comments.
@@ -18,3 +19,9 @@ def _setup_llama_cli(cli: Cli) -> None:
        handler=ConvertCheckpointCommandHandler(),
        help="convert fairseq2 LLaMA checkpoints to reference checkpoints",
    )

    group.add_command(
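For reference, the new registration presumably mirrors the existing convert-checkpoint entry shown above. A minimal sketch of how the completed call might look (the command name and handler class below are assumptions based on the PR title, not names taken from the diff):

```python
# Sketch only: "write_hf_config" and WriteHfConfigCommandHandler are assumed
# names; the call simply follows the pattern of the existing entry above.
group.add_command(
    name="write_hf_config",
    handler=WriteHfConfigCommandHandler(),
    help="write a Hugging Face config.json for a fairseq2 LLaMA checkpoint (for vLLM)",
)
```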
I think this is fine, but it would be great if we could also have a way to automatically dump this config.json in LLM finetuning recipes.
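Something like the following could do it at checkpoint-save time. This is purely a sketch; `dump_hf_config` and its call site are hypothetical, not existing fairseq2 APIs:

```python
import json
from pathlib import Path


def dump_hf_config(hf_config: dict, checkpoint_dir: Path) -> None:
    """Write a Hugging Face-style config.json next to a saved checkpoint.

    Hypothetical helper: a finetuning recipe could call this right after each
    checkpoint save so the directory is directly loadable by vLLM.
    """
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    (checkpoint_dir / "config.json").write_text(json.dumps(hf_config, indent=2))
```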
Agreed!
At least the content of the PR looks good to me. Assuming that the generated config.json works with vLLM, I think it is good to go.
Looking great! @MartinGleize We might need to test this with an experiment that started from another experiment's checkpoint (using the resume-from-checkpoint-dir argument). In that case the model name might be just "checkpoint_step_N", and something might not be retrieved into the final config as planned. We have a bunch of such examples on our side and can help with testing this branch if needed.
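One illustrative way the script could guard against that case (the function below is hypothetical, not code from the PR):

```python
# Hypothetical guard; not part of the PR.
def resolve_base_model_name(model_name: str) -> str:
    # When an experiment was resumed from another experiment's checkpoint dir,
    # the recorded name may be "checkpoint_step_N" rather than a registered
    # asset card name, so metadata lookups would come back empty.
    if model_name.startswith("checkpoint_step_"):
        raise ValueError(
            f"'{model_name}' is not a registered model name; pass the original "
            "base model name explicitly so its metadata can be retrieved."
        )
    return model_name
```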
Let me merge AssetCard.flatten() tomorrow morning. It has been ready for a long time. I was aiming to merge it with the recipe refactoring, but I can do it separately for this PR.
Ok, looks like a bug in the reference implementation: pytorch/executorch#7265
Also discussed here (and not resolved, it looks like): meta-llama/llama-models#241
AssetCard.flatten() will produce an asset card with all base metadata merged into one.
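Conceptually (this is an illustration, not fairseq2's actual implementation), flattening amounts to walking the card's base chain and merging metadata so that fields on the derived card win:

```python
# Conceptual illustration of flattening a card hierarchy; not fairseq2's API.
def flatten_card_metadata(card: dict, cards_by_name: dict) -> dict:
    chain, current = [], card
    while current is not None:
        chain.append(current)
        base_name = current.get("base")
        current = cards_by_name.get(base_name) if base_name else None
    merged: dict = {}
    # Apply bases first so fields on the derived card override them.
    for c in reversed(chain):
        merged.update({k: v for k, v in c.items() if k != "base"})
    return merged
```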
@MartinGleize Let me know if you need a hand with making the RoPE scaling factor configurable. I believe it is a good exercise to work on, but I'm happy to help if you need bandwidth for other parts of the project.
Should be no problem for me!
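For reference, the RoPE scaling parameters live in a rope_scaling block in the HF config.json, so making the factor configurable would mean exposing these values instead of hard-coding them. The numbers below are the published Llama 3.1 defaults, shown as the Python dict that would be serialized into config.json:

```python
# Published Llama 3.1 defaults; a configurable version would take these as inputs.
rope_scaling = {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}
```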
Nice, looks good to me.
I'm still wondering about tie_word_embeddings=true in the config.json (I was trying to get rid of the last use of "is_llama_3_2"). Neither our impl nor the ref impl mentions this anywhere in the code (as far as I can see), but on HF they indeed set it to true in their config. Am I missing something, or should we hard-code it to false on our side? That seems pretty important.
We definitely do not use tied weights for LLaMA. I am not sure how HF came up with that setting, as I don't see any weight tying in the reference implementation either. I believe we have to set it to false.
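If it is hard-coded, the generated config would simply state the flag explicitly, e.g. (sketch only; other fields elided):

```python
# Sketch: per the discussion above, fairseq2's LLaMA checkpoints do not tie
# input/output embeddings, so the flag is written out explicitly.
hf_config = {
    # ... architecture fields elided ...
    "tie_word_embeddings": False,
}
```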
What does this PR do? Please describe:
Add a script to write an HF-format config.json for Llama. This config.json is then used to load fairseq2 Llama checkpoints directly in vLLM.

Does your PR introduce any breaking changes? If yes, please list them:
No
Check list:
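For completeness, once the generated config.json sits next to the checkpoint files, the directory can be pointed at vLLM like any local HF-format model. This is standard vLLM usage; whether any extra weight-format handling is needed for fairseq2 checkpoints is outside this sketch, and the path is illustrative:

```python
from vllm import LLM, SamplingParams

# Path is illustrative: a directory containing the generated config.json
# alongside the fairseq2 LLaMA checkpoint files.
llm = LLM(model="/path/to/llama_checkpoint_dir")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```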