
[FEAT][TRAINING] Add fine-tuning support for EAGLE3 from HF Hub #268

Draft
VincentG1234 wants to merge 13 commits into vllm-project:main from VincentG1234:eagle-finetuning

Conversation

@VincentG1234

Summary

Adds support for fine-tuning EAGLE3 models from pretrained checkpoints, enabling users to initialize training from existing models stored locally or on HuggingFace Hub.

Changes

  • Added --pretrained-model-path CLI argument to scripts/train.py
  • Implemented load_safetensors_state_dict() function (see the sketch below) supporting:
    • Local single/sharded safetensors files
    • Automatic download from HuggingFace Hub
  • Automatic extraction of d2t/t2d vocab mappings from pretrained models
  • Automatic derivation of draft_vocab_size from loaded mappings (both illustrated in the sketch after the Usage example)
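
A minimal sketch of how the loading helper above might resolve local vs. Hub checkpoints; the internals here are assumptions for illustration, not the exact code added in this PR:

```python
from pathlib import Path

from huggingface_hub import snapshot_download
from safetensors.torch import load_file


def load_safetensors_state_dict(model_path: str) -> dict:
    """Load a (possibly sharded) safetensors checkpoint from a local path or the HF Hub."""
    path = Path(model_path)
    if not path.exists():
        # Not a local file/directory: treat the argument as a Hub repo id and download it.
        path = Path(snapshot_download(model_path, allow_patterns=["*.safetensors"]))
    shards = [path] if path.is_file() else sorted(path.glob("*.safetensors"))
    state_dict = {}
    for shard in shards:
        state_dict.update(load_file(shard))  # merge all shards into one flat state dict
    return state_dict
```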

Usage

Fine-tune from HuggingFace Hub:

python scripts/train.py \
  --verifier-name-or-path meta-llama/Llama-3.1-8B-Instruct \
  --pretrained-model-path RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3 \
  --data-path ./new_data \
  --epochs 3 \
  --lr 5e-5
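
For the vocab-mapping items above, a hedged sketch of how d2t/t2d and draft_vocab_size could be derived from the loaded state dict (the key names are assumptions; the actual checkpoint layout may differ):

```python
import torch


def extract_vocab_mappings(
    state_dict: dict[str, torch.Tensor],
) -> tuple[torch.Tensor, torch.Tensor, int]:
    # Assumed key names; the real checkpoint may store these under different keys.
    d2t = state_dict["d2t"]  # per draft-token offset into the verifier vocabulary
    t2d = state_dict["t2d"]  # verifier-vocab-sized mask of tokens kept in the draft vocab
    draft_vocab_size = d2t.numel()  # one entry per draft-vocab token
    return d2t, t2d, draft_vocab_size
```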

@fynnsu
Collaborator

fynnsu commented Feb 2, 2026

Hi @VincentG1234, this is great, thank you for working on this (and opening the related issue)!

My main feedback at this stage is that it would be good to move the loading logic + t2d/d2t extraction logic into utility functions inside the speculators package. We actually have some similar code for loading from hf/local safetensor files already in src/speculators/utils/loading.py, so maybe this could be consolidated with what you've added (although our existing loading utils are mostly focused on extracting a single tensor, like the lm head or token embedding).

Additionally, we will want at least one test that loads an existing checkpoint from HF before merging.

Let me know if I can help!
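
For reference, a hedged sketch of what such a test could look like (the import path, marker, and assertions are assumptions, not a prescription):

```python
import pytest

# Assumed final location of the loader once consolidated into the utils module.
from speculators.utils.loading import load_safetensors_state_dict


@pytest.mark.slow  # downloads checkpoint weights from the Hub
def test_load_pretrained_eagle3_checkpoint_from_hub():
    state_dict = load_safetensors_state_dict("RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3")
    assert "d2t" in state_dict and "t2d" in state_dict  # vocab mappings present
    assert state_dict["d2t"].numel() > 0  # non-empty draft vocabulary
```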

@dsikka
Collaborator

dsikka left a comment

Thank you for your contribution! Just out of curiosity, how much data was needed to fine-tune the Llama 3 draft model?

@dsikka added the enhancement (New feature or request) and eagle3 labels on Feb 2, 2026
@VincentG1234
Author

> Thank you for your contribution! Just out of curiosity, how much data was needed to fine-tune the Llama 3 draft model?

I haven't conducted a full fine-tuning yet. My aim is to enhance the model in French while maintaining performance in English. I will share my results!

> Hi @VincentG1234, this is great, thank you for working on this (and opening the related issue)!
>
> My main feedback at this stage is that it would be good to move the loading logic + t2d/d2t extraction logic into utility functions inside the speculators package. We actually have some similar code for loading from hf/local safetensor files already in src/speculators/utils/loading.py, so maybe this could be consolidated with what you've added (although our existing loading utils are mostly focused on extracting a single tensor, like the lm head or token embedding).
>
> Additionally, we will want at least one test that loads an existing checkpoint from HF before merging.
>
> Let me know if I can help!

Thank you for the feedback! I'll fix this very soon.

@mergify

mergify bot commented Feb 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @VincentG1234.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@VincentG1234
Author

VincentG1234 commented Feb 6, 2026

Hello @dsikka @fynnsu, just to keep you in the loop. I have some good news:

  1. I investigated the acceptance drop after a light fine-tuning of an EAGLE-3 1B draft model (I tested with RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3). Although the weights were nearly identical to the HF checkpoint (only floating-point-level differences), the issue was caused by a config.json mismatch between my local setup and the HF model. In particular, rope_theta (and related RoPE scaling parameters) differed significantly, which changed the verifier’s hidden states and broke draft–verifier alignment, leading to a large drop in mean acceptance length (2.6 to 1.9).
    After aligning the RoPE configuration with the HF config, acceptance metrics returned to expected values (a sanity-check sketch follows this list). I will fix that in the code, and I think we will be good!

  2. I moved the weight-loading functions into the appropriate utility module, as suggested.

  3. I’m now validating the full set of changes in this PR that enable fine-tuning EAGLE(-3) models end-to-end (the earlier config.json/RoPE mismatch should be the last issue). I could use guidance on which tests you’d consider sufficient or most relevant for this PR. By the way, I’m not sure why, but running tox locally is currently difficult because downloading model weights takes hours on my setup...
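
Regarding point 1, a hedged sketch of the kind of config sanity check that could catch such a mismatch (the local path is hypothetical, and the parameter set may need to be broader):

```python
from transformers import AutoConfig

# Hub config the pretrained draft model was trained against.
hub_cfg = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
# Hypothetical local verifier config directory used during fine-tuning.
local_cfg = AutoConfig.from_pretrained("./my_local_verifier")

for key in ("rope_theta", "rope_scaling"):
    hub_val = getattr(hub_cfg, key, None)
    local_val = getattr(local_cfg, key, None)
    if hub_val != local_val:
        raise ValueError(
            f"RoPE mismatch on {key!r}: local={local_val} vs hub={hub_val}; "
            "the verifier's hidden states will not match what the draft model expects."
        )
```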

Have a nice weekend!


Labels

eagle3, enhancement (New feature or request), training
