mllama patch modifies nn.LayerNorm globally #315

Open

tyler-romero opened this issue Oct 19, 2024 · 3 comments

@tyler-romero (Collaborator)

🐛 Describe the bug

Instead of only patching the transformers mllama module (`transformers.models.mllama.modeling_mllama`), `apply_liger_kernel_to_mllama` modifies `torch.nn.LayerNorm` globally.

The issue is here.
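
For context, a minimal sketch of why the current patch leaks globally: `modeling_mllama.nn` is the very same module object as `torch.nn`, so assigning through the alias rebinds the attribute for every importer (`DummyLayerNorm` below is a hypothetical stand-in for the Liger class):

```python
import torch.nn
from transformers.models.mllama import modeling_mllama

# `from torch import nn` in modeling_mllama binds the name `nn` to the
# torch.nn module object itself, so the two names are aliases:
assert modeling_mllama.nn is torch.nn

class DummyLayerNorm(torch.nn.LayerNorm):  # hypothetical stand-in for LigerLayerNorm
    pass

# Assigning through the alias mutates torch.nn for the whole process:
modeling_mllama.nn.LayerNorm = DummyLayerNorm
print(torch.nn.LayerNorm)  # DummyLayerNorm, not the original torch.nn.LayerNorm
```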

The fix would be to:
(1) not patch LayerNorm in Liger by assigning to `modeling_mllama.nn.LayerNorm` (because `modeling_mllama.nn` is the `torch.nn` module itself, this assignment is global),
(2) change `transformers.models.mllama.modeling_mllama` to not use `from torch import nn` and to instead import LayerNorm directly, as in `from torch.nn import LayerNorm`, and
(3) patch LayerNorm in Liger by assigning to `modeling_mllama.LayerNorm` instead (see the sketch below).
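
A minimal sketch of step (3), assuming transformers has been changed as in step (2) so that `LayerNorm` is a module-level name in `modeling_mllama`:

```python
from liger_kernel.transformers.layer_norm import LigerLayerNorm
from transformers.models.mllama import modeling_mllama

# Rebinds only the `LayerNorm` name inside the mllama module's own
# namespace; torch.nn.LayerNorm is left untouched for everyone else.
modeling_mllama.LayerNorm = LigerLayerNorm
```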

Reproduce

```
pip install transformers==4.45 liger-kernel-nightly
```

```python
from liger_kernel.transformers import apply_liger_kernel_to_mllama
from torch import nn

apply_liger_kernel_to_mllama()
print(nn.LayerNorm)
# <class 'liger_kernel.transformers.layer_norm.LigerLayerNorm'>
```

Versions

Environment Report:

Operating System: Linux-6.1.85+-x86_64-with-glibc2.35
Python version: 3.10.12
PyTorch version: 2.4.1+cu121
CUDA version: Not available
Triton version: 3.1.0
Transformers version: 4.45.0

@ByronHsu (Collaborator)

Curious, does it require changing the transformers source code? I think we can maybe raise a request.

@tyler-romero (Collaborator, Author)

Yeah, the proposed fix would unfortunately require a change to transformers. The way mllama was implemented differs very slightly from the conventions in other transformers modeling files.

@ByronHsu (Collaborator)

Sounds good. Let's try to send a PR there.
