Judging from DeepSeek-V3, models that predict multiple future tokens can be significantly more sample-efficient than models trained only on next-token prediction. In addition, the extra token heads can be used for speculative decoding to speed up inference (up to 3x in their experiments) without needing a separate draft model.
It would be amazing to see multi-token prediction implemented in NeMo, as it would allow the community to easily experiment with this promising technique.
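To make the speculative-decoding part concrete, here is a rough sketch of the greedy verification step, assuming the extra heads have already proposed a few draft tokens. Everything in it (`accept_draft`, `toy_next_token`, the integer "tokens") is hypothetical and for illustration only; it is not NeMo's or DeepSeek's API, and a real implementation would verify all draft positions in a single batched forward pass rather than a Python loop.

```python
from typing import Callable, List, Sequence


def accept_draft(
    context: Sequence[int],
    draft: Sequence[int],
    next_token: Callable[[List[int]], int],
) -> List[int]:
    """Greedy verification of draft tokens (illustrative only).

    Keeps draft tokens for as long as they match what the ordinary
    next-token head would have produced. At the first mismatch, the
    verified token is emitted instead and the rest of the draft is dropped.
    """
    accepted: List[int] = []
    for tok in draft:
        expected = next_token(list(context) + accepted)
        if tok != expected:
            accepted.append(expected)  # fall back to the verified token
            return accepted
        accepted.append(tok)
    return accepted


def toy_next_token(seq: List[int]) -> int:
    """Stand-in for the main model: always predicts last token + 1 (mod 10)."""
    return (seq[-1] + 1) % 10


# The first two draft tokens agree with the toy model, the third does not.
print(accept_draft([1, 2, 3], [4, 5, 9], toy_next_token))  # [4, 5, 6]
```

Because every accepted token is one the main model would have produced anyway, the output matches plain greedy decoding; the speedup comes from accepting several tokens per verification step when the extra heads guess correctly.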