Bridges for Mamba-based models #554
base: main
Conversation
class PrunedVocabMapping(AutoMapping):
    """
    Smart mapping like AutoMapping that additionally prunes vocab padding.

    Intended for embedding and output layers.
    """

    def megatron_to_hf(
        self,
        megatron_weights: Optional[torch.Tensor],
        megatron_module: Optional[nn.Module],
    ) -> dict[str, torch.Tensor]:
        """Prune padding from weight in vocab size dimension, if vocab size is accessible."""
        mapping = super().megatron_to_hf(megatron_weights, megatron_module)

        if megatron_module is not None:
            weight = mapping[str(self.hf_param)]
            mapping[str(self.hf_param)] = weight[: megatron_module.vocab_size, :]

        return mapping
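For a concrete sense of what this mapping does, here is a minimal, self-contained sketch of just the pruning step; the `DummyEmbedding` module, the sizes, and the variable names are made up for illustration and are not part of this PR.

```python
import torch
from torch import nn


class DummyEmbedding(nn.Module):
    """Hypothetical stand-in for a Megatron embedding module that exposes the true vocab size."""

    def __init__(self, vocab_size: int, padded_vocab_size: int, hidden: int):
        super().__init__()
        self.vocab_size = vocab_size
        # Megatron checkpoints can carry extra padding rows beyond the real vocab size.
        self.weight = nn.Parameter(torch.randn(padded_vocab_size, hidden))


module = DummyEmbedding(vocab_size=50257, padded_vocab_size=50304, hidden=16)

# The pruning in megatron_to_hf is just a slice along the vocab dimension.
pruned = module.weight[: module.vocab_size, :]
print(tuple(module.weight.shape), "->", tuple(pruned.shape))  # (50304, 16) -> (50257, 16)
```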
should this be a generic transform made available here? https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/models/conversion/param_mapping.py
I was discussing this with @yaoyu-33. In NeMo 2.0, this pruning is only done in the NemotronH exporter: https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/llm/gpt/model/ssm.py#L817-L850
I'm not sure why it wasn't needed for other models.
I think @JRD971000 mentioned that the NemotronH checkpoints were saved with vocab padding, so this was a choice made when writing exporters specifically for NemotronH.
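For background, Megatron pads the embedding/output vocab dimension so the rows split evenly across tensor-parallel ranks, which is why a checkpoint saved without unpadding carries extra rows. A simplified sketch of that rounding (the function and argument names are illustrative, not the exact Megatron API):

```python
def padded_vocab_size(vocab_size: int, divisible_by: int = 128, tp_size: int = 1) -> int:
    """Round the vocab size up so embedding rows divide evenly across tensor-parallel ranks.

    Simplified sketch of Megatron-style vocab padding; names are illustrative.
    """
    multiple = divisible_by * tp_size
    return ((vocab_size + multiple - 1) // multiple) * multiple


# Example: a 50257-token vocab is padded to 50304 with these defaults.
assert padded_vocab_size(50257) == 50304
```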
Needs #440 to be merged first, for HF remote model support.
Add two bridges for Mamba-based models:
- MambaBridge, which supports any MambaForCausalLM, including:
- NemotronHBridge, which supports any NemotronHForCausalLM, including:
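As a rough usage illustration only: the entry point, method names, and checkpoint ID below are assumptions about how a bridge-based conversion might be driven, not API confirmed by this PR.

```python
# Hypothetical sketch: assumes an AutoBridge-style entry point that dispatches to
# MambaBridge / NemotronHBridge based on the Hugging Face architecture class.
from megatron.bridge import AutoBridge  # assumed import path

# state-spaces/mamba-2.8b-hf is a public MambaForCausalLM checkpoint; any model with
# that architecture would be handled by MambaBridge under this sketch.
bridge = AutoBridge.from_hf_pretrained("state-spaces/mamba-2.8b-hf")  # assumed method

# The bridge applies per-parameter mappings (e.g. PrunedVocabMapping for the embedding
# and output layers) when moving weights between Hugging Face and Megatron formats.
megatron_provider = bridge.to_megatron_provider()  # assumed method
```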