
MoE Mamba

Implementation of MoE-Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Zeta. The SwitchMoE layer follows the Switch Transformer paper. Help is still welcome: if you want to contribute, join the Agora Discord server and pitch in on the MoE Mamba channel.

PAPER LINK
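
The SwitchMoE layer mentioned above uses the Switch Transformer's top-1 routing: a small gating network scores the experts and each token is dispatched to its single highest-scoring expert. The snippet below is a minimal sketch of that routing idea only (the Top1Router class and names here are illustrative, not this repo's implementation):

import torch
import torch.nn.functional as F
from torch import nn


class Top1Router(nn.Module):
    """Minimal top-1 (switch-style) router: each token is sent to one expert."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, dim)
        logits = self.gate(x)                          # (batch, seq_len, num_experts)
        probs = F.softmax(logits, dim=-1)
        gate_scores, expert_index = probs.max(dim=-1)  # best expert per token
        return gate_scores, expert_index


router = Top1Router(dim=512, num_experts=4)
scores, index = router(torch.randn(1, 10, 512))
print(index.shape)  # (1, 10): one expert id per token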

Install

pip install moe-mamba

Usage

MoEMambaBlock

import torch
from moe_mamba import MoEMambaBlock

# Input of shape (batch, seq_len, dim)
x = torch.randn(1, 10, 512)

# A single MoE-Mamba block
model = MoEMambaBlock(
    dim=512,          # model dimension
    depth=6,          # number of stacked layers in the block
    d_state=128,      # SSM state dimension
    expand=4,         # Mamba expansion factor
    num_experts=4,    # number of experts in the MoE layer
)
out = model(x)
print(out)
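
The block is expected to preserve the input's (batch, seq_len, dim) shape; a quick sanity check under that assumption:

# Assuming MoEMambaBlock is shape-preserving
assert out.shape == x.shape
print(out.shape)  # torch.Size([1, 10, 512])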

MoEMamba

import torch
from moe_mamba.model import MoEMamba


# Create a tensor of token IDs with shape (1, 512)
x = torch.randint(0, 10000, (1, 512))

# Create a MoEMamba model
model = MoEMamba(
    num_tokens=10000,    # vocabulary size
    dim=512,             # model dimension
    depth=1,             # number of MoE-Mamba blocks
    d_state=512,         # SSM state dimension
    causal=True,
    shared_qk=True,
    exact_window_size=True,
    dim_head=64,
    m_expand=4,          # Mamba expansion factor
    num_experts=4,       # number of experts per MoE layer
)

# Forward pass
out = model(x)

# Print the output tensor
print(out)
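
For a rough picture of how the model might be trained, here is a hedged next-token cross-entropy sketch. It assumes model(x) returns per-token logits of shape (batch, seq_len, num_tokens); check the actual output shape before relying on this, and note that the optimizer choice and learning rate here are illustrative.

import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

tokens = torch.randint(0, 10000, (1, 512))
logits = model(tokens[:, :-1])            # assumed shape: (batch, seq_len - 1, num_tokens)
targets = tokens[:, 1:]                   # next-token targets

loss = F.cross_entropy(
    logits.reshape(-1, logits.size(-1)),  # (batch * seq, num_tokens)
    targets.reshape(-1),                  # (batch * seq,)
)
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(loss.item())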

Code Quality 🧹

  • make style to format the code
  • make check_code_quality to check code quality (basically PEP 8 compliance)
  • black . to format the code with Black directly
  • ruff . --fix to lint and auto-fix issues with Ruff

Citation

@misc{pióro2024moemamba,
    title={MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts}, 
    author={Maciej Pióro and Kamil Ciebiera and Krystian Król and Jan Ludziejewski and Sebastian Jaszczur},
    year={2024},
    eprint={2401.04081},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

License

MIT
