Additional auxiliary loss for MoE? #12

Closed
maxin-cn opened this issue Oct 15, 2024 · 2 comments

Comments

@maxin-cn

Thanks for the previous responses! The LongLLaVA model uses a Mixture of Experts (MoE), but MoE routing is known to suffer from a "winner-takes-all" problem (i.e., expert imbalance, where the router concentrates most tokens on a few experts). Was any additional regularization applied to the MoE during model training?

@wangxidong06
Contributor

Yes. We apply a router auxiliary (load-balancing) loss with router_aux_loss_coef = 1e-3 and use the modeling.py linked here.
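
For reference, below is a minimal sketch of the standard Switch/Mixtral-style load-balancing auxiliary loss that a coefficient like router_aux_loss_coef typically scales. This is an illustrative implementation, not the code from the repository's modeling.py; the function and argument names (load_balancing_loss, top_k, etc.) are assumptions for the example.

```python
# Sketch of a Switch/Mixtral-style load-balancing auxiliary loss for an MoE router.
# Assumes the router produces per-token logits over experts; names are illustrative.
import torch
import torch.nn.functional as F


def load_balancing_loss(router_logits: torch.Tensor,
                        num_experts: int,
                        top_k: int = 2,
                        router_aux_loss_coef: float = 1e-3) -> torch.Tensor:
    """Penalize routers that dispatch most tokens to a few experts.

    router_logits: (num_tokens, num_experts) raw gate scores for each token.
    """
    routing_probs = F.softmax(router_logits, dim=-1)              # (T, E)
    _, selected = torch.topk(routing_probs, top_k, dim=-1)        # (T, k) expert indices
    expert_mask = F.one_hot(selected, num_experts).float()        # (T, k, E)

    # Fraction of routing slots actually dispatched to each expert.
    tokens_per_expert = expert_mask.mean(dim=(0, 1))              # (E,)
    # Average router probability assigned to each expert.
    router_prob_per_expert = routing_probs.mean(dim=0)            # (E,)

    # The dot product is minimized when both distributions are uniform,
    # i.e., when load is balanced across experts.
    aux_loss = num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)
    return router_aux_loss_coef * aux_loss


# Usage: add the scaled auxiliary loss to the main language-modeling loss.
logits = torch.randn(128, 8)   # 128 tokens routed over 8 experts
aux = load_balancing_loss(logits, num_experts=8, top_k=2)
# total_loss = lm_loss + aux
```

With a small coefficient such as 1e-3, the auxiliary term nudges the router toward balanced expert utilization without overwhelming the primary training objective.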

@maxin-cn
Author

Thanks for your reply!
