How to import qwen model from hf? #14480
Unanswered
RunjiaChen
asked this question in Q&A
Hello guys, I tried to import Qwen from Hugging Face (HF) as follows:
```python
from nemo.collections import llm

llm.import_ckpt(model=llm.Qwen3Model(llm.Qwen3Config8B()), source='hf://Qwen/Qwen3-8B')
```
I got the error message pasted below. From what I can tell when I diff the left and right sides, QK layernorm is enabled on the left but not on the right. There is a flag for this in the config, which I set to False in the modified script below, but I still get the same error. How do I solve this? Thank you so much!
```python
from nemo.collections import llm

cfg = llm.Qwen3Config8B()
cfg.qk_layernorm = False
print(cfg.qk_layernorm)
llm.import_ckpt(model=llm.Qwen3Model(cfg), source='hf://Qwen/Qwen3-8B')
```
I still ended up getting this error:
```
dtype mismatch between source and target state dicts. Left side is {'decoder.layers.0.input_layernorm.weight': torch.float32, 'decoder.layers.0.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.0.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.0.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.1.input_layernorm.weight': torch.float32, 'decoder.layers.1.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.1.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.1.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.2.input_layernorm.weight': torch.float32, 'decoder.layers.2.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.2.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.2.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.3.input_layernorm.weight': torch.float32, 'decoder.layers.3.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.3.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.3.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.4.input_layernorm.weight': torch.float32, 'decoder.layers.4.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.4.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.4.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.5.input_layernorm.weight': torch.float32, 'decoder.layers.5.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.5.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.5.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.6.input_layernorm.weight': torch.float32, 'decoder.layers.6.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.6.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.6.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.7.input_layernorm.weight': torch.float32, 'decoder.layers.7.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.7.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.7.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.8.input_layernorm.weight': torch.float32, 'decoder.layers.8.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.8.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.8.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.9.input_layernorm.weight': torch.float32, 'decoder.layers.9.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.9.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.9.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.10.input_layernorm.weight': torch.float32, 'decoder.layers.10.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.10.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.10.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.11.input_layernorm.weight': torch.float32, 'decoder.layers.11.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.11.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.11.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.12.input_layernorm.weight': torch.float32, 'decoder.layers.12.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.12.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.12.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.13.input_layernorm.weight': torch.float32, 'decoder.layers.13.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.13.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.13.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.14.input_layernorm.weight': torch.float32, 'decoder.layers.14.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.14.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.14.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.15.input_layernorm.weight': torch.float32, 'decoder.layers.15.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.15.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.15.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.16.input_layernorm.weight': torch.float32, 'decoder.layers.16.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.16.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.16.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.17.input_layernorm.weight': torch.float32, 'decoder.layers.17.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.17.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.17.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.18.input_layernorm.weight': torch.float32, 'decoder.layers.18.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.18.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.18.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.19.input_layernorm.weight': torch.float32, 'decoder.layers.19.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.19.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.19.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.20.input_layernorm.weight': torch.float32, 'decoder.layers.20.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.20.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.20.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.21.input_layernorm.weight': torch.float32, 'decoder.layers.21.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.21.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.21.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.22.input_layernorm.weight': torch.float32, 'decoder.layers.22.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.22.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.22.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.23.input_layernorm.weight': torch.float32, 'decoder.layers.23.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.23.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.23.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.24.input_layernorm.weight': torch.float32, 'decoder.layers.24.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.24.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.24.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.25.input_layernorm.weight': torch.float32, 'decoder.layers.25.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.25.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.25.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.26.input_layernorm.weight': torch.float32, 'decoder.layers.26.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.26.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.26.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.27.input_layernorm.weight': torch.float32, 'decoder.layers.27.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.27.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.27.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.28.input_layernorm.weight': torch.float32, 'decoder.layers.28.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.28.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.28.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.29.input_layernorm.weight': torch.float32, 'decoder.layers.29.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.29.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.29.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.30.input_layernorm.weight': torch.float32, 'decoder.layers.30.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.30.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.30.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.31.input_layernorm.weight': torch.float32, 'decoder.layers.31.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.31.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.31.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.32.input_layernorm.weight': torch.float32, 'decoder.layers.32.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.32.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.32.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.33.input_layernorm.weight': torch.float32, 'decoder.layers.33.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.33.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.33.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.34.input_layernorm.weight': torch.float32, 'decoder.layers.34.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.34.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.34.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.35.input_layernorm.weight': torch.float32, 'decoder.layers.35.self_attention.q_layernorm.weight': torch.float32, 'decoder.layers.35.self_attention.k_layernorm.weight': torch.float32, 'decoder.layers.35.pre_mlp_layernorm.weight': torch.float32, 'decoder.final_layernorm.weight': torch.float32}, Right side is {'decoder.layers.0.input_layernorm.weight': torch.float32, 'decoder.layers.0.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.1.input_layernorm.weight': torch.float32, 'decoder.layers.1.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.2.input_layernorm.weight': torch.float32, 'decoder.layers.2.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.3.input_layernorm.weight': torch.float32, 'decoder.layers.3.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.4.input_layernorm.weight': torch.float32, 'decoder.layers.4.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.5.input_layernorm.weight': torch.float32, 'decoder.layers.5.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.6.input_layernorm.weight': torch.float32, 'decoder.layers.6.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.7.input_layernorm.weight': torch.float32, 'decoder.layers.7.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.8.input_layernorm.weight': torch.float32, 'decoder.layers.8.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.9.input_layernorm.weight': torch.float32, 'decoder.layers.9.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.10.input_layernorm.weight': torch.float32, 'decoder.layers.10.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.11.input_layernorm.weight': torch.float32, 'decoder.layers.11.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.12.input_layernorm.weight': torch.float32, 'decoder.layers.12.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.13.input_layernorm.weight': torch.float32, 'decoder.layers.13.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.14.input_layernorm.weight': torch.float32, 'decoder.layers.14.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.15.input_layernorm.weight': torch.float32, 'decoder.layers.15.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.16.input_layernorm.weight': torch.float32, 'decoder.layers.16.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.17.input_layernorm.weight': torch.float32, 'decoder.layers.17.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.18.input_layernorm.weight': torch.float32, 'decoder.layers.18.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.19.input_layernorm.weight': torch.float32, 'decoder.layers.19.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.20.input_layernorm.weight': torch.float32, 'decoder.layers.20.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.21.input_layernorm.weight': torch.float32, 'decoder.layers.21.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.22.input_layernorm.weight': torch.float32, 'decoder.layers.22.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.23.input_layernorm.weight': torch.float32, 'decoder.layers.23.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.24.input_layernorm.weight': torch.float32, 'decoder.layers.24.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.25.input_layernorm.weight': torch.float32, 'decoder.layers.25.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.26.input_layernorm.weight': torch.float32, 'decoder.layers.26.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.27.input_layernorm.weight': torch.float32, 'decoder.layers.27.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.28.input_layernorm.weight': torch.float32, 'decoder.layers.28.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.29.input_layernorm.weight': torch.float32, 'decoder.layers.29.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.30.input_layernorm.weight': torch.float32, 'decoder.layers.30.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.31.input_layernorm.weight': torch.float32, 'decoder.layers.31.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.32.input_layernorm.weight': torch.float32, 'decoder.layers.32.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.33.input_layernorm.weight': torch.float32, 'decoder.layers.33.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.34.input_layernorm.weight': torch.float32, 'decoder.layers.34.pre_mlp_layernorm.weight': torch.float32, 'decoder.layers.35.input_layernorm.weight': torch.float32, 'decoder.layers.35.pre_mlp_layernorm.weight': torch.float32}
```
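To show how I diffed the two sides: the key sets of the two dicts can be compared directly. A minimal sketch (with `left` and `right` standing in for the two dtype dicts from the error, trimmed to layer 0 for brevity) that lists the keys present on one side only:

```python
# Minimal sketch: find state-dict keys present on only one side of the mismatch.
# `left` and `right` stand in for the two dicts printed in the error above,
# trimmed here to layer 0 for brevity.
left = {
    "decoder.layers.0.input_layernorm.weight": "torch.float32",
    "decoder.layers.0.self_attention.q_layernorm.weight": "torch.float32",
    "decoder.layers.0.self_attention.k_layernorm.weight": "torch.float32",
    "decoder.layers.0.pre_mlp_layernorm.weight": "torch.float32",
}
right = {
    "decoder.layers.0.input_layernorm.weight": "torch.float32",
    "decoder.layers.0.pre_mlp_layernorm.weight": "torch.float32",
}

# Set difference on the key views isolates the mismatched parameters.
only_left = sorted(set(left) - set(right))
only_right = sorted(set(right) - set(left))
print("only in left:", only_left)
print("only in right:", only_right)
```

On the full dicts this reports exactly the `q_layernorm` / `k_layernorm` weights for every layer on the left side, and nothing extra on the right.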