
[FEAT][TRAINING] Add fine-tuning support for EAGLE3 from HF Hub #268

Draft
VincentG1234 wants to merge 13 commits into vllm-project:main from VincentG1234:eagle-finetuning

Conversation

@VincentG1234

Summary

Adds support for fine-tuning EAGLE3 models from pretrained checkpoints, enabling users to initialize training from existing models stored locally or on HuggingFace Hub.

Changes

  • Added --pretrained-model-path CLI argument to scripts/train.py
  • Implemented load_safetensors_state_dict() function (see the sketch below) supporting:
    • Local single/sharded safetensors files
    • Automatic download from HuggingFace Hub
  • Automatic extraction of d2t/t2d vocab mappings from pretrained models
  • Automatic derivation of draft_vocab_size from loaded mappings (both illustrated in the sketch after the Usage example)
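
A minimal sketch of how the loading helper above might resolve local vs. Hub checkpoints; the internals here are assumptions for illustration, not the exact code added in this PR:

```python
from pathlib import Path

from huggingface_hub import snapshot_download
from safetensors.torch import load_file


def load_safetensors_state_dict(model_path: str) -> dict:
    """Load a (possibly sharded) safetensors checkpoint from a local path or the HF Hub."""
    path = Path(model_path)
    if not path.exists():
        # Not a local file/directory: treat the argument as a Hub repo id and download it.
        path = Path(snapshot_download(model_path, allow_patterns=["*.safetensors"]))
    shards = [path] if path.is_file() else sorted(path.glob("*.safetensors"))
    state_dict = {}
    for shard in shards:
        state_dict.update(load_file(shard))  # merge all shards into one flat state dict
    return state_dict
```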

Usage

Fine-tune from HuggingFace Hub:

python scripts/train.py \
  --verifier-name-or-path meta-llama/Llama-3.1-8B-Instruct \
  --pretrained-model-path RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3 \
  --data-path ./new_data \
  --epochs 3 \
  --lr 5e-5
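
For the vocab-mapping items above, a hedged sketch of how d2t/t2d and draft_vocab_size could be derived from the loaded state dict (the key names are assumptions; the actual checkpoint layout may differ):

```python
import torch


def extract_vocab_mappings(
    state_dict: dict[str, torch.Tensor],
) -> tuple[torch.Tensor, torch.Tensor, int]:
    # Assumed key names; the real checkpoint may store these under different keys.
    d2t = state_dict["d2t"]  # per draft-token offset into the verifier vocabulary
    t2d = state_dict["t2d"]  # verifier-vocab-sized mask of tokens kept in the draft vocab
    draft_vocab_size = d2t.numel()  # one entry per draft-vocab token
    return d2t, t2d, draft_vocab_size
```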

@fynnsu
Collaborator

fynnsu commented Feb 2, 2026

Hi @VincentG1234, this is great, thank you for working on this (and opening the related issue)!

My main feedback at this stage is that it would be good to move the loading logic + t2d/d2t extraction logic into utility functions inside the speculators package. We actually have some similar code for loading from hf/local safetensor files already in src/speculators/utils/loading.py, so maybe this could be consolidated with what you've added (although our existing loading utils are mostly focused on extracting a single tensor, like the lm head or token embedding).

Additionally, we will want at least one test that loads an existing checkpoint from HF before merging.

Let me know if I can help!
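
For reference, a hedged sketch of what such a test could look like (the import path, marker, and assertions are assumptions, not a prescription):

```python
import pytest

# Assumed final location of the loader once consolidated into the utils module.
from speculators.utils.loading import load_safetensors_state_dict


@pytest.mark.slow  # downloads checkpoint weights from the Hub
def test_load_pretrained_eagle3_checkpoint_from_hub():
    state_dict = load_safetensors_state_dict("RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3")
    assert "d2t" in state_dict and "t2d" in state_dict  # vocab mappings present
    assert state_dict["d2t"].numel() > 0  # non-empty draft vocabulary
```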

@dsikka
Collaborator

dsikka left a comment

Thank you for your contribution! Just out of curiosity, how much data was needed to fine-tune the Llama 3 draft model?

@dsikka added the enhancement (New feature or request) and eagle3 labels on Feb 2, 2026
@VincentG1234
Author

> Thank you for your contribution! Just out of curiosity, how much data was needed to fine-tune the Llama 3 draft model?

I haven't conducted a full fine-tuning yet. My aim is to enhance the model in French while maintaining performance in English. I will share my results!

> Hi @VincentG1234, this is great, thank you for working on this (and opening the related issue)!
>
> My main feedback at this stage is that it would be good to move the loading logic + t2d/d2t extraction logic into utility functions inside the speculators package. We actually have some similar code for loading from hf/local safetensor files already in src/speculators/utils/loading.py, so maybe this could be consolidated with what you've added (although our existing loading utils are mostly focused on extracting a single tensor, like the lm head or token embedding).
>
> Additionally, we will want at least one test that loads an existing checkpoint from HF before merging.
>
> Let me know if I can help!

Thank you for the feedback! I'll fix this very soon.

@mergify

mergify bot commented Feb 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @VincentG1234.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@VincentG1234
Author

VincentG1234 commented Feb 6, 2026

Hello @dsikka @fynnsu, just to keep you in the loop. I have some good news:

  1. I investigated the acceptance drop after a light fine-tuning of an EAGLE-3 1B draft model (I tested with RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3). Although the weights were nearly identical to the HF checkpoint (only floating-point-level differences), the issue was caused by a config.json mismatch between my local setup and the HF model. In particular, rope_theta (and related RoPE scaling parameters) differed significantly, which changed the verifier’s hidden states and broke draft–verifier alignment, leading to a large drop in mean acceptance length (2.6 to 1.9).
    After aligning the RoPE configuration with the HF config, acceptance metrics returned to expected values (a sanity-check sketch follows this list). I will fix that in the code, and I think we will be good!

  2. I moved the weight-loading functions into the appropriate utility module, as suggested.

  3. I’m now validating the full set of changes in this PR that enable fine-tuning EAGLE(-3) models end-to-end (the earlier config.json/RoPE mismatch should be the last issue). I could use guidance on which tests you’d consider sufficient or most relevant for this PR. By the way, I’m not sure why, but running tox locally is currently difficult because downloading model weights takes hours on my setup...
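
Regarding point 1, a hedged sketch of the kind of config sanity check that could catch such a mismatch (the local path is hypothetical, and the parameter set may need to be broader):

```python
from transformers import AutoConfig

# Hub config the pretrained draft model was trained against.
hub_cfg = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
# Hypothetical local verifier config directory used during fine-tuning.
local_cfg = AutoConfig.from_pretrained("./my_local_verifier")

for key in ("rope_theta", "rope_scaling"):
    hub_val = getattr(hub_cfg, key, None)
    local_val = getattr(local_cfg, key, None)
    if hub_val != local_val:
        raise ValueError(
            f"RoPE mismatch on {key!r}: local={local_val} vs hub={hub_val}; "
            "the verifier's hidden states will not match what the draft model expects."
        )
```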

Have a nice weekend!


Labels

eagle3, enhancement (New feature or request), training
