Describe the bug
HFGemmaExporter has a wrong mapping for "decoder.layers.*.mlp.linear_fc1.layer_norm_weight": it maps to "model.layers.*.post_attention_layernorm.weight" instead of "model.layers.*.pre_feedforward_layernorm.weight". As a result, the pre-feedforward layer norm is not transferred (it is left at 0), which causes significantly worse performance for converted HF models.
Steps/Code to reproduce bug
Run a Gemma 2 NeMo 2.0 export to HF (the export call I use is sketched after the code below). To show the problem explicitly, I modified the apply function of the exporter to the following:
def apply(self, output_path: Path) -> Path:
    target = self.init()
    source, _ = self.nemo_load(str(self))
    print("Source state (Post FC 2 layernorm) dict:",
          [value for key, value in source.module.state_dict().items()
           if key.endswith("mlp.linear_fc2.post_layernorm.weight")])
    print("Source state (FC 1 layernorm) dict:",
          [value for key, value in source.module.state_dict().items()
           if key.endswith("mlp.linear_fc1.layer_norm_weight")])
    target = self.convert_state(source, target)
    print("Target state (Post feedforward layernorm) dict after conversion:",
          [value for key, value in target.state_dict().items()
           if key.endswith(".post_feedforward_layernorm.weight")])
    print("Target state (Pre feedforward layernorm) dict after conversion:",
          [value for key, value in target.state_dict().items()
           if key.endswith(".pre_feedforward_layernorm.weight")])
    target = target.cpu()
    target.save_pretrained(output_path)
    self.tokenizer.save_pretrained(output_path)
    return output_path
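As for the export call itself, here is a minimal sketch, assuming the NeMo 2.0 llm.export_ckpt entry point with the "hf" target; the checkpoint paths are placeholders and the exact keyword arguments may differ between NeMo versions:

from pathlib import Path

from nemo.collections import llm

# Export a NeMo 2.0 Gemma 2 checkpoint to Hugging Face format.
# Both paths are placeholders for the real checkpoint locations.
llm.export_ckpt(
    path=Path("/checkpoints/gemma2_9b_nemo2"),
    target="hf",
    output_path=Path("/checkpoints/gemma2_9b_hf"),
)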
The output of the modified apply function above (for my continually pretrained version of Gemma 2 9B) is:
Source state (Post FC 2 layernorm) dict: [tensor([-0.4316, 0.5078, -0.2988, ..., -0.4766, -0.3887, -0.4570]), ...
Source state (FC 1 layernorm) dict: [tensor([0.2451, 5.3438, 0.1875, ..., 0.2676, 0.2598, 0.4219]), ...
Target state (Post feedforward layernorm) dict after conversion: [tensor([-0.4316, 0.5078, -0.2988, ..., -0.4766, -0.3887, -0.4570]), ...
Target state (Pre feedforward layernorm) dict after conversion: [tensor([0., 0., 0., ..., 0., 0., 0.]), ...
Note the zeros in the target pre-feedforward layer norm.
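The same can be checked directly on the exported checkpoint, independently of the exporter internals. A minimal sketch using the Hugging Face transformers API; the checkpoint path is a placeholder:

import torch
from transformers import AutoModelForCausalLM

# Load the HF checkpoint written by the exporter (path is a placeholder).
model = AutoModelForCausalLM.from_pretrained(
    "/checkpoints/gemma2_9b_hf", torch_dtype=torch.bfloat16
)

for name, param in model.named_parameters():
    if name.endswith("pre_feedforward_layernorm.weight"):
        # With the buggy mapping these weights keep their zero initialization,
        # because convert_state never writes to them.
        print(name, "all zeros:", bool(torch.all(param == 0)))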
Expected behavior
All model weights are converted correctly and the exported HF model performs the same as the NeMo 2.0 model.
Environment overview (please complete the following information)
I am using the official NeMo container, version 24.12. The issue is still present on the main branch.
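For reference, a possible fix is to change only the target key of the affected entry in HFGemmaExporter's convert_state mapping. A minimal sketch; the dict name and the surrounding entries are illustrative, only the affected entry is meant literally:

mapping = {
    # ... other entries unchanged ...
    # Current (buggy) entry:
    # "decoder.layers.*.mlp.linear_fc1.layer_norm_weight": "model.layers.*.post_attention_layernorm.weight",
    # Corrected entry: the FC1 layer norm corresponds to HF Gemma 2's pre-feedforward layer norm.
    "decoder.layers.*.mlp.linear_fc1.layer_norm_weight": "model.layers.*.pre_feedforward_layernorm.weight",
    # ... other entries unchanged ...
}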