
Conversation

@shaltielshmid
Contributor

What does this PR do?

Fixes inconsistencies in MCoreBertModelWrapper that cause a crash when using specific BertConfigs.

Collection: NLP

Changelog

  • Disabled post_process when calling the MCoreBert initializer from the wrapper
  • Added the missing config.add_lm_head check (see the sketch after this list)
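
Taken together, the two changes look roughly like the following. This is a minimal sketch under assumed names (MCoreBert, _build_lm_head, and the constructor signature are illustrative, not the actual NeMo source):

# Illustrative sketch of the fix described above; names are assumptions.
class MCoreBert:
    """Stand-in for the Megatron-Core BERT base class."""
    def __init__(self, config, post_process=True):
        self.config = config
        self.post_process = post_process


class MCoreBertModelWrapper(MCoreBert):
    def __init__(self, config, **kwargs):
        # Change 1: disable post_process so the parent initializer does not
        # try to build an LM head on its own.
        super().__init__(config, post_process=False, **kwargs)

        # Change 2: only build the LM head when the config asks for one.
        self.lm_head = self._build_lm_head(config) if getattr(config, 'add_lm_head', True) else None

    def _build_lm_head(self, config):
        """Placeholder for the real LM-head construction."""
        return object()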

Pre checks:

  • [v] Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • [v] Bugfix

@suiyoubi
Collaborator

LGTM, just curious: under what scenario (which specific BERT config) does the original post_process crash in your case?

@shaltielshmid
Contributor Author

Sure! Here is the config:

# Imports added to make the snippet self-contained; note that the import path
# for MegatronBertBaseConfig may vary across NeMo versions.
from dataclasses import dataclass
from typing import Callable

import torch.nn.functional as F

from nemo.collections.llm.bert.model.bert import MegatronBertBaseConfig


@dataclass
class NeoBertMediumConfig(MegatronBertBaseConfig):
    """
    NeMo's BERT model variant adjusted to match the NeoBERT implementation.
    """
    num_layers: int = 28
    normalization: str = 'RMSNorm'
    layernorm_epsilon: float = 1e-6
    position_embedding_type: str = 'rope'
    seq_length: int = 4096
    gated_linear_unit: bool = True
    activation_func: Callable = F.silu
    add_lm_head: bool = False
    share_embeddings_and_output_weights: bool = False
    add_bias_linear: bool = False
    add_qkv_bias: bool = False

The issue arises because the LM head doesn't support RMSNorm normalization: when post_process isn't disabled, the super() class tries to create an LM head and throws a not-supported exception.
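
A hypothetical sketch of that failure path (class and attribute names are illustrative placeholders, not the actual Megatron-Core source):

# Illustrative sketch of the failure path; not the actual Megatron-Core code.
class BertLMHead:
    """Placeholder standing in for the real LM head."""
    def __init__(self, config):
        self.config = config


class MCoreBert:
    def __init__(self, config, post_process=True):
        self.post_process = post_process
        if self.post_process:
            # The LM head only supports LayerNorm, so a config that sets
            # normalization='RMSNorm' hits this check and raises before the
            # model is built.
            if config.normalization != 'LayerNorm':
                raise ValueError(f"LM head does not support {config.normalization} normalization")
            self.lm_head = BertLMHead(config)

With post_process=False passed from the wrapper (and add_lm_head=False in the config above), this branch is never reached, which is what this PR's fix does.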

@ericharper requested a review from suiyoubi on August 28, 2025 at 20:06
@shaltielshmid
Contributor Author

Hey - following up, is there anything waiting on me here?
