qwen3_8b eagle3 model train and vllm ckpt export #628

@Jim2016713

Before submitting an issue, please make sure it hasn't been already addressed by searching through the existing and past issues.

Describe the bug

I hit the following error when deploying the trained EAGLE-3 model with vLLM (version 0.11.2). The drafter fails while loading its weights:
(EngineCore_DP0 pid=37627) Traceback (most recent call last):
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
(EngineCore_DP0 pid=37627) self.run()
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=37627) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 846, in run_engine_core
(EngineCore_DP0 pid=37627) raise e
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 833, in run_engine_core
(EngineCore_DP0 pid=37627) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 606, in __init__
(EngineCore_DP0 pid=37627) super().__init__(
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 102, in __init__
(EngineCore_DP0 pid=37627) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=37627) self._init_executor()
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/v1/executor/uniproc_executor.py", line 48, in _init_executor
(EngineCore_DP0 pid=37627) self.driver_worker.load_model()
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 273, in load_model
(EngineCore_DP0 pid=37627) self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3285, in load_model
(EngineCore_DP0 pid=37627) self.drafter.load_model(self.model)
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/v1/spec_decode/eagle.py", line 940, in load_model
(EngineCore_DP0 pid=37627) self.model = get_model(
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 130, in get_model
(EngineCore_DP0 pid=37627) return loader.load_model(vllm_config=vllm_config, model_config=model_config)
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/model_loader/base_loader.py", line 55, in load_model
(EngineCore_DP0 pid=37627) self.load_weights(model, model_config)
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/model_loader/default_loader.py", line 292, in load_weights
(EngineCore_DP0 pid=37627) loaded_weights = model.load_weights(
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/llama_eagle3.py", line 367, in load_weights
(EngineCore_DP0 pid=37627) loader.load_weights(model_weights.items())
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 332, in load_weights
(EngineCore_DP0 pid=37627) autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 286, in _load_module
(EngineCore_DP0 pid=37627) yield from self._load_module(
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 259, in _load_module
(EngineCore_DP0 pid=37627) loaded_params = module_load_weights(weights)
(EngineCore_DP0 pid=37627) File "/opt/miniconda/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/llama_eagle3.py", line 251, in load_weights
(EngineCore_DP0 pid=37627) param = params_dict[name]
(EngineCore_DP0 pid=37627) KeyError: 'eagle_module.fc.weight'
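
The KeyError indicates that the checkpoint contains a tensor named `eagle_module.fc.weight`, while the parameter dictionary built in vLLM's `llama_eagle3.py` has no entry under that name. One possible reading, which this report does not confirm, is that the checkpoint was saved with a training-time `eagle_module.` wrapper prefix that the loader does not expect. The sketch below shows how such a prefix could be stripped from a state dict. The function name `strip_prefix` and the sample keys other than `eagle_module.fc.weight` are hypothetical, and other names in the checkpoint may also need remapping.

```python
from typing import Dict, TypeVar

T = TypeVar("T")


def strip_prefix(state_dict: Dict[str, T], prefix: str = "eagle_module.") -> Dict[str, T]:
    """Return a copy of state_dict with `prefix` removed from matching keys.

    Hypothetical helper: assumes the only mismatch is a leading wrapper prefix.
    """
    renamed: Dict[str, T] = {}
    for name, tensor in state_dict.items():
        # Drop the prefix only when the key actually starts with it.
        new_name = name[len(prefix):] if name.startswith(prefix) else name
        if new_name in renamed:
            # Refuse to silently overwrite a tensor after renaming.
            raise ValueError(f"duplicate key after renaming: {new_name}")
        renamed[new_name] = tensor
    return renamed


# Placeholder values stand in for tensors; only the first key is from the traceback.
example = {
    "eagle_module.fc.weight": 0,
    "eagle_module.norm.weight": 1,  # hypothetical key
    "lm_head.weight": 2,            # hypothetical key
}
cleaned = strip_prefix(example)
print(sorted(cleaned))
```

With a real checkpoint, the dict would typically come from `safetensors.torch.load_file` and be written back with `safetensors.torch.save_file`. Printing the key list before and after renaming is a cheap way to check whether the prefix is the only difference.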

Steps/Code to reproduce bug

  • ?

Expected behavior

Who can help?

  • ?

System information

  • Container used (if applicable): ?
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): ?
  • CPU architecture (x86_64, aarch64): ?
  • GPU name (e.g. H100, A100, L40S): ?
  • GPU memory size: ?
  • Number of GPUs: ?
  • Library versions (if applicable):
    • Python: ?
    • ModelOpt version or commit hash: ?
    • CUDA: ?
    • PyTorch: ?
    • Transformers: ?
    • TensorRT-LLM: ?
    • ONNXRuntime: ?
    • TensorRT: ?
  • Any other details that may help: ?
