Bridges for Mamba-based models #554
base: main
Conversation
class PrunedVocabMapping(AutoMapping):
    """
    Smart mapping like AutoMapping that additionally prunes vocab padding.

    Intended for embedding and output layers.
    """

    def megatron_to_hf(
        self,
        megatron_weights: Optional[torch.Tensor],
        megatron_module: Optional[nn.Module],
    ) -> dict[str, torch.Tensor]:
        """Prune padding from weight in vocab size dimension, if vocab size is accessible."""
        mapping = super().megatron_to_hf(megatron_weights, megatron_module)

        if megatron_module is not None:
            weight = mapping[str(self.hf_param)]
            mapping[str(self.hf_param)] = weight[: megatron_module.vocab_size, :]

        return mapping
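For a concrete sense of what this mapping does, here is a minimal, self-contained sketch of just the pruning step; the `DummyEmbedding` module, the sizes, and the variable names are made up for illustration and are not part of this PR.

```python
import torch
from torch import nn


class DummyEmbedding(nn.Module):
    """Hypothetical stand-in for a Megatron embedding module that exposes the true vocab size."""

    def __init__(self, vocab_size: int, padded_vocab_size: int, hidden: int):
        super().__init__()
        self.vocab_size = vocab_size
        # Megatron checkpoints can carry extra padding rows beyond the real vocab size.
        self.weight = nn.Parameter(torch.randn(padded_vocab_size, hidden))


module = DummyEmbedding(vocab_size=50257, padded_vocab_size=50304, hidden=16)

# The pruning in megatron_to_hf is just a slice along the vocab dimension.
pruned = module.weight[: module.vocab_size, :]
print(tuple(module.weight.shape), "->", tuple(pruned.shape))  # (50304, 16) -> (50257, 16)
```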
should this be a generic transform made available here? https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/models/conversion/param_mapping.py
I was discussing this with @yaoyu-33. In NeMo 2.0, this pruning is only done in the NemotronH exporter: https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/llm/gpt/model/ssm.py#L817-L850
I'm not sure why it wasn't needed for other models.
I think @JRD971000 mentioned that the NemotronH checkpoints were saved with vocab padding, so this was a choice made when writing exporters specifically for NemotronH.
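For background, Megatron pads the embedding/output vocab dimension so the rows split evenly across tensor-parallel ranks, which is why a checkpoint saved without unpadding carries extra rows. A simplified sketch of that rounding (the function and argument names are illustrative, not the exact Megatron API):

```python
def padded_vocab_size(vocab_size: int, divisible_by: int = 128, tp_size: int = 1) -> int:
    """Round the vocab size up so embedding rows divide evenly across tensor-parallel ranks.

    Simplified sketch of Megatron-style vocab padding; names are illustrative.
    """
    multiple = divisible_by * tp_size
    return ((vocab_size + multiple - 1) // multiple) * multiple


# Example: a 50257-token vocab is padded to 50304 with these defaults.
assert padded_vocab_size(50257) == 50304
```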
Needs #440 to be merged first, for HF remote model support.
Add two bridges for Mamba-based models:
- MambaBridge, which supports any MambaForCausalLM, including:
- NemotronHBridge, which supports any NemotronHForCausalLM, including:
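As a rough usage illustration only: the entry point, method names, and checkpoint ID below are assumptions about how a bridge-based conversion might be driven, not API confirmed by this PR.

```python
# Hypothetical sketch: assumes an AutoBridge-style entry point that dispatches to
# MambaBridge / NemotronHBridge based on the Hugging Face architecture class.
from megatron.bridge import AutoBridge  # assumed import path

# state-spaces/mamba-2.8b-hf is a public MambaForCausalLM checkpoint; any model with
# that architecture would be handled by MambaBridge under this sketch.
bridge = AutoBridge.from_hf_pretrained("state-spaces/mamba-2.8b-hf")  # assumed method

# The bridge applies per-parameter mappings (e.g. PrunedVocabMapping for the embedding
# and output layers) when moving weights between Hugging Face and Megatron formats.
megatron_provider = bridge.to_megatron_provider()  # assumed method
```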