How to import qwen model from hf? #14480
Unanswered
RunjiaChen
asked this question in Q&A
Hello guys, I tried to import Qwen from Hugging Face (HF) as follows:
```python
from nemo.collections import llm

llm.import_ckpt(model=llm.Qwen3Model(llm.Qwen3Config8B()), source='hf://Qwen/Qwen3-8B')
```
I got the error message pasted below. From what I can tell when I diff the left and right sides, QK layernorm is enabled on the left but not on the right. There is a flag for this in the config, which I set to False in the modified script below, but I still get the same error. How do I solve this? Thank you so much!
```python
from nemo.collections import llm

cfg = llm.Qwen3Config8B()
cfg.qk_layernorm = False
print(cfg.qk_layernorm)
llm.import_ckpt(model=llm.Qwen3Model(cfg), source='hf://Qwen/Qwen3-8B')
```
I still ended up getting this error:
```
dtype mismatch between source and target state dicts. Left side is {'decoder.layers.0.input_layernorm.weight': torch.float32, 'decoder.layers.0.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.0.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.0.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.1.input_layernorm.weight': torch.float32, 'decoder.layers.1.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.1.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.1.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.2.input_layernorm.weight': torch.float32, 'decoder.layers.2.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.2.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.2.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.3.input_layernorm.weight': torch.float32, 'decoder.layers.3.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.3.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.3.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.4.input_layernorm.weight': torch.float32, 'decoder.layers.4.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.4.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.4.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.5.input_layernorm.weight': torch.float32, 'decoder.layers.5.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.5.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.5.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.6.input_layernorm.weight': torch.float32, 'decoder.layers.6.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.6.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.6.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.7.input_layernorm.weight': torch.float32, 'decoder.layers.7.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.7.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.7.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.8.input_layernorm.weight': torch.float32, 'decoder.layers.8.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.8.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.8.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.9.input_layernorm.weight': torch.float32, 'decoder.layers.9.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.9.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.9.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.10.input_layernorm.weight': torch.float32, 'decoder.layers.10.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.10.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.10.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.11.input_layernorm.weight': torch.float32, 'decoder.layers.11.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.11.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.11.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.12.input_layernorm.weight': torch.float32, 'decoder.layers.12.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.12.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.12.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.13.input_layernorm.weight': torch.float32, 'decoder.layers.13.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.13.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.13.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.14.input_layernorm.weight': torch.float32, 'decoder.layers.14.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.14.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.14.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.15.input_layernorm.weight': torch.float32, 'decoder.layers.15.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.15.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.15.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.16.input_layernorm.weight': torch.float32, 'decoder.layers.16.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.16.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.16.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.17.input_layernorm.weight': torch.float32, 'decoder.layers.17.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.17.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.17.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.18.input_layernorm.weight': torch.float32, 'decoder.layers.18.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.18.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.18.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.19.input_layernorm.weight': torch.float32, 'decoder.layers.19.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.19.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.19.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.20.input_layernorm.weight': torch.float32, 'decoder.layers.20.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.20.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.20.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.21.input_layernorm.weight': torch.float32, 'decoder.layers.21.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.21.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.21.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.22.input_layernorm.weight': torch.float32, 'decoder.layers.22.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.22.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.22.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.23.input_layernorm.weight': torch.float32, 'decoder.layers.23.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.23.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.23.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.24.input_layernorm.weight': torch.float32, 'decoder.layers.24.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.24.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.24.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.25.input_layernorm.weight': torch.float32, 'decoder.layers.25.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.25.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.25.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.26.input_layernorm.weight': torch.float32, 'decoder.layers.26.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.26.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.26.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.27.input_layernorm.weight': torch.float32, 'decoder.layers.27.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.27.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.27.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.28.input_layernorm.weight': torch.float32, 'decoder.layers.28.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.28.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.28.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.29.input_layernorm.weight': torch.float32, 'decoder.layers.29.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.29.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.29.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.30.input_layernorm.weight': torch.float32, 'decoder.layers.30.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.30.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.30.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.31.input_layernorm.weight': torch.float32, 'decoder.layers.31.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.31.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.31.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.32.input_layernorm.weight': torch.float32, 'decoder.layers.32.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.32.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.32.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.33.input_layernorm.weight': torch.float32, 'decoder.layers.33.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.33.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.33.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.34.input_layernorm.weight': torch.float32, 'decoder.layers.34.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.34.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.34.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.35.input_layernorm.weight': torch.float32, 'decoder.layers.35.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.35.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.35.pre_mlp_layernorm.weight': torch.float32, 'decoder.final_layernorm.weight': torch.float32}, Right side is {'decoder.layers.0.input_layernorm.weight': torch.float32, 'decoder.layers.0.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.1.input_layernorm.weight': torch.float32, 'decoder.layers.1.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.2.input_layernorm.weight': torch.float32, 'decoder.layers.2.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.3.input_layernorm.weight': torch.float32, 'decoder.layers.3.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.4.input_layernorm.weight': torch.float32, 'decoder.layers.4.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.5.input_layernorm.weight': torch.float32, 'decoder.layers.5.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.6.input_layernorm.weight': torch.float32, 'decoder.layers.6.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.7.input_layernorm.weight': torch.float32, 'decoder.layers.7.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.8.input_layernorm.weight': torch.float32, 'decoder.layers.8.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.9.input_layernorm.weight': torch.float32, 'decoder.layers.9.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.10.input_layernorm.weight': torch.float32, 'decoder.layers.10.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.11.input_layernorm.weight': torch.float32, 'decoder.layers.11.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.12.input_layernorm.weight': torch.float32, 'decoder.layers.12.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.13.input_layernorm.weight': torch.float32, 'decoder.layers.13.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.14.input_layernorm.weight': torch.float32, 'decoder.layers.14.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.15.input_layernorm.weight': torch.float32, 'decoder.layers.15.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.16.input_layernorm.weight': torch.float32, 'decoder.layers.16.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.17.input_layernorm.weight': torch.float32, 'decoder.layers.17.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.18.input_layernorm.weight': torch.float32, 'decoder.layers.18.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.19.input_layernorm.weight': torch.float32, 'decoder.layers.19.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.20.input_layernorm.weight': torch.float32, 'decoder.layers.20.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.21.input_layernorm.weight': torch.float32, 'decoder.layers.21.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.22.input_layernorm.weight': torch.float32, 'decoder.layers.22.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.23.input_layernorm.weight': torch.float32, 'decoder.layers.23.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.24.input_layernorm.weight': torch.float32, 'decoder.layers.24.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.25.input_layernorm.weight': torch.float32, 'decoder.layers.25.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.26.input_layernorm.weight': torch.float32, 'decoder.layers.26.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.27.input_layernorm.weight': torch.float32, 'decoder.layers.27.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.28.input_layernorm.weight': torch.float32, 'decoder.layers.28.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.29.input_layernorm.weight': torch.float32, 'decoder.layers.29.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.30.input_layernorm.weight': torch.float32, 'decoder.layers.30.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.31.input_layernorm.weight': torch.float32, 'decoder.layers.31.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.32.input_layernorm.weight': torch.float32, 'decoder.layers.32.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.33.input_layernorm.weight': torch.float32, 'decoder.layers.33.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.34.input_layernorm.weight': torch.float32, 'decoder.layers.34.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.35.input_layernorm.weight': torch.float32, 'decoder.layers.35.pre_mlp_layernorm.weight': torch.float32}
```
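To show how I diffed the two sides: the key sets of the two dicts can be compared directly. A minimal sketch (with `left` and `right` standing in for the two dtype dicts from the error, trimmed to layer 0 for brevity) that lists the keys present on one side only:

```python
# Minimal sketch: find state-dict keys present on only one side of the mismatch.
# `left` and `right` stand in for the two dicts printed in the error above,
# trimmed here to layer 0 for brevity.
left = {
    "decoder.layers.0.input_layernorm.weight": "torch.float32",
    "decoder.layers.0.self_attention.q_layernorm.weight": "torch.float32",
    "decoder.layers.0.self_attention.k_layernorm.weight": "torch.float32",
    "decoder.layers.0.pre_mlp_layernorm.weight": "torch.float32",
}
right = {
    "decoder.layers.0.input_layernorm.weight": "torch.float32",
    "decoder.layers.0.pre_mlp_layernorm.weight": "torch.float32",
}

# Set difference on the key views isolates the mismatched parameters.
only_left = sorted(set(left) - set(right))
only_right = sorted(set(right) - set(left))
print("only in left:", only_left)
print("only in right:", only_right)
```

On the full dicts this reports exactly the `q_layernorm` / `k_layernorm` weights for every layer on the left side, and nothing extra on the right.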