Additional auxiliary loss for MoE? #12

Closed
maxin-cn opened this issue Oct 15, 2024 · 2 comments

Comments

@maxin-cn

Thanks for the previous responses! The LongLLaVA model uses a Mixture of Experts (MoE), but MoE routing is known to suffer from a "winner-takes-all" problem (i.e., expert imbalance, where the router concentrates most tokens on a few experts). Was any additional regularization applied to the MoE during model training?

@wangxidong06
Contributor

Yes. We apply a router auxiliary (load-balancing) loss with router_aux_loss_coef = 1e-3 and use the modeling.py linked here.
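
For reference, below is a minimal sketch of the standard Switch/Mixtral-style load-balancing auxiliary loss that a coefficient like router_aux_loss_coef typically scales. This is an illustrative implementation, not the code from the repository's modeling.py; the function and argument names (load_balancing_loss, top_k, etc.) are assumptions for the example.

```python
# Sketch of a Switch/Mixtral-style load-balancing auxiliary loss for an MoE router.
# Assumes the router produces per-token logits over experts; names are illustrative.
import torch
import torch.nn.functional as F


def load_balancing_loss(router_logits: torch.Tensor,
                        num_experts: int,
                        top_k: int = 2,
                        router_aux_loss_coef: float = 1e-3) -> torch.Tensor:
    """Penalize routers that dispatch most tokens to a few experts.

    router_logits: (num_tokens, num_experts) raw gate scores for each token.
    """
    routing_probs = F.softmax(router_logits, dim=-1)              # (T, E)
    _, selected = torch.topk(routing_probs, top_k, dim=-1)        # (T, k) expert indices
    expert_mask = F.one_hot(selected, num_experts).float()        # (T, k, E)

    # Fraction of routing slots actually dispatched to each expert.
    tokens_per_expert = expert_mask.mean(dim=(0, 1))              # (E,)
    # Average router probability assigned to each expert.
    router_prob_per_expert = routing_probs.mean(dim=0)            # (E,)

    # The dot product is minimized when both distributions are uniform,
    # i.e., when load is balanced across experts.
    aux_loss = num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)
    return router_aux_loss_coef * aux_loss


# Usage: add the scaled auxiliary loss to the main language-modeling loss.
logits = torch.randn(128, 8)   # 128 tokens routed over 8 experts
aux = load_balancing_loss(logits, num_experts=8, top_k=2)
# total_loss = lm_loss + aux
```

With a small coefficient such as 1e-3, the auxiliary term nudges the router toward balanced expert utilization without overwhelming the primary training objective.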

@maxin-cn
Author

Thanks for your reply!
