
Conversation

maanug-nv (Contributor)

Add two bridges for Mamba-based models (a rough usage sketch follows the lists below):

MambaBridge, which supports any MambaForCausalLM, including:

  • state-spaces/mamba-130m-hf
  • state-spaces/mamba-370m-hf
  • state-spaces/mamba-790m-hf
  • state-spaces/mamba-1.4b-hf
  • state-spaces/mamba-2.8b-hf

NemotronHBridge, which supports any NemotronHForCausalLM, including:

  • nvidia/Nemotron-H-8B-Base-8K
  • nvidia/Nemotron-H-47B-Base-8K
  • nvidia/Nemotron-H-56B-Base-8K
  • nvidia/NVIDIA-Nemotron-Nano-9B-v2
  • nvidia/NVIDIA-Nemotron-Nano-12B-v2
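
For context, a rough usage sketch. The AutoBridge entry point, import path, and method names below are assumptions about how these bridges get picked up, not something this PR defines, so treat it as illustrative only:

# Hypothetical sketch: the import path, AutoBridge entry point, and method
# names are assumptions; the real API in this repo may differ.
from megatron.bridge import AutoBridge

# Dispatch happens on the HF architecture (MambaForCausalLM here), so any of the
# state-spaces/mamba-*-hf checkpoints should resolve to MambaBridge, and the
# NemotronHForCausalLM checkpoints to NemotronHBridge.
bridge = AutoBridge.from_hf_pretrained("state-spaces/mamba-130m-hf")

# Convert the HF weights into a Megatron-side model provider (and back for export).
provider = bridge.to_megatron_provider()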

Comment on lines +30 to +49
class PrunedVocabMapping(AutoMapping):
    """
    Smart mapping like AutoMapping that additionally prunes vocab padding.

    Intended for embedding and output layers.
    """

    def megatron_to_hf(
        self,
        megatron_weights: Optional[torch.Tensor],
        megatron_module: Optional[nn.Module],
    ) -> dict[str, torch.Tensor]:
        """Prune padding from weight in vocab size dimension, if vocab size is accessible."""
        mapping = super().megatron_to_hf(megatron_weights, megatron_module)

        if megatron_module is not None:
            weight = mapping[str(self.hf_param)]
            mapping[str(self.hf_param)] = weight[: megatron_module.vocab_size, :]

        return mapping

@maanug-nv (Contributor Author)

I was discussing this with @yaoyu-33. In NeMo 2.0, this pruning is only done in the NemotronH exporter: https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/llm/gpt/model/ssm.py#L817-L850

I'm not sure why it wasn't needed for other models. I think @JRD971000 mentioned that the NemotronH checkpoints were saved with vocab padding, so this was a choice made when writing exporters specifically for NemotronH.
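
To make the pruning concrete, here's a minimal sketch of what it does; the sizes are made-up illustrative numbers, assuming the Megatron-side embedding was padded along the vocab dimension for even sharding:

import torch

# Illustrative numbers only: pretend the tokenizer has 50,000 real tokens, but the
# Megatron checkpoint padded the embedding to 50,304 rows for divisibility.
true_vocab_size = 50_000
padded_vocab_size = 50_304
hidden_size = 768

megatron_embedding = torch.randn(padded_vocab_size, hidden_size)

# PrunedVocabMapping slices off the padding rows before handing the weight to HF,
# so the exported tensor matches the HF config's vocab_size.
hf_embedding = megatron_embedding[:true_vocab_size, :]
assert hf_embedding.shape == (true_vocab_size, hidden_size)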

@maanug-nv (Contributor Author)

Needs #440 to be merged first, for HF remote model support.

@maanug-nv (Contributor Author)

Closes #500 and #149
