
Gemma 2 NeMo 2.0 to HF conversion bug #11951

domenVres opened this issue Jan 24, 2025 · 0 comments
Labels: bug (Something isn't working)

Describe the bug

HFGemmaExporter maps "decoder.layers.*.mlp.linear_fc1.layer_norm_weight" incorrectly: it maps to "model.layers.*.post_attention_layernorm.weight" instead of "model.layers.*.pre_feedforward_layernorm.weight". As a result, the pre-feedforward layer norm is not transferred and stays at zero, which causes significantly worse performance for the converted HF models.
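The fix presumably amounts to changing a single entry in the exporter's state-dict mapping. The sketch below is only an assumption about the shape of that mapping inside HFGemmaExporter.convert_state (the real code may differ); the point is that only the target key needs to change:

    # Sketch only: assumed shape of the mapping used by HFGemmaExporter.convert_state.
    mapping = {
        # Current (buggy) entry: the FC1 input layernorm lands on
        # post_attention_layernorm, so pre_feedforward_layernorm is never
        # populated and keeps its zero initialization:
        # "decoder.layers.*.mlp.linear_fc1.layer_norm_weight": "model.layers.*.post_attention_layernorm.weight",

        # Expected entry:
        "decoder.layers.*.mlp.linear_fc1.layer_norm_weight": "model.layers.*.pre_feedforward_layernorm.weight",
    }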

Steps/Code to reproduce bug

Run Gemma2 NeMo 2.0 export to HF:
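A minimal sketch of that export call, assuming the NeMo 2.0 llm.export_ckpt entry point; the checkpoint paths below are placeholders:

    from pathlib import Path

    from nemo.collections import llm

    if __name__ == "__main__":
        llm.export_ckpt(
            path=Path("/workspace/checkpoints/gemma2-9b-nemo2"),      # placeholder: NeMo 2.0 checkpoint dir
            target="hf",                                              # export to Hugging Face format
            output_path=Path("/workspace/checkpoints/gemma2-9b-hf"),  # placeholder: output dir
        )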

Additionally, I modified the apply function of the exporter as follows to show the problem explicitly:

def apply(self, output_path: Path) -> Path:
    target = self.init()

    source, _ = self.nemo_load(str(self))
    # Both layernorms are present in the NeMo source state dict.
    print("Source state (Post FC 2 layernorm) dict:", [value for key, value in source.module.state_dict().items() if key.endswith("mlp.linear_fc2.post_layernorm.weight")])
    print("Source state (FC 1 layernorm) dict:", [value for key, value in source.module.state_dict().items() if key.endswith("mlp.linear_fc1.layer_norm_weight")])
    target = self.convert_state(source, target)
    # After conversion, the pre-feedforward layernorm in the HF target is all zeros.
    print("Target state (Post feedforward layernorm) dict after conversion:", [value for key, value in target.state_dict().items() if key.endswith(".post_feedforward_layernorm.weight")])
    print("Target state (Pre feedforward layernorm) dict after conversion:", [value for key, value in target.state_dict().items() if key.endswith(".pre_feedforward_layernorm.weight")])

    target = target.cpu()
    target.save_pretrained(output_path)
    self.tokenizer.save_pretrained(output_path)

    return output_path

The output of the code above (for my continually pretrained version of Gemma 2 9B) is:

Source state (Post FC 2 layernorm) dict: [tensor([-0.4316,  0.5078, -0.2988,  ..., -0.4766, -0.3887, -0.4570]), ...
Source state (FC 1 layernorm) dict: [tensor([0.2451, 5.3438, 0.1875,  ..., 0.2676, 0.2598, 0.4219]), ...


Target state (Post feedforward layernorm) dict after conversion: [tensor([-0.4316,  0.5078, -0.2988,  ..., -0.4766, -0.3887, -0.4570]), ...
Target state (Pre feedforward layernorm) dict after conversion: [tensor([0., 0., 0.,  ..., 0., 0., 0.]), ...

Note the zeros in the target pre-feedforward layer norm.
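The zeros can also be confirmed directly on the exported checkpoint. A minimal sketch, assuming the export was written to the placeholder path used above:

    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("/workspace/checkpoints/gemma2-9b-hf")  # placeholder path

    for name, param in model.named_parameters():
        if name.endswith("pre_feedforward_layernorm.weight"):
            # With the buggy mapping this prints 0 nonzero elements for every layer.
            print(name, torch.count_nonzero(param).item())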

Expected behavior

All model weights are converted correctly and the exported HF model performs the same as the NeMo 2.0 model.

Environment overview (please complete the following information)

I am using the official NeMo container, version 24.12. The issue is still present on the main branch.
