Implement multi-token prediction option for models #12133

zhaoyang-star · 2025-02-11T09:25:46Z

Based on DeepSeek-V3, it appears that models trained to predict multiple future tokens can be significantly more sample-efficient than models trained only on next-token prediction. In addition, the extra token heads can be used to implement speculative decoding that speeds up inference (up to 3X in their experiments) without the need for a separate draft model.
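The speculative-decoding idea can be sketched in plain Python. This is a minimal, hypothetical illustration under greedy decoding, not NeMo or DeepSeek code: `heads`, `speculative_step`, and `generate` are made-up names, and the model is reduced to a callable that returns each head's top token at the last position. The extra heads draft future tokens, the ordinary next-token head checks them, and drafting stops at the first disagreement, so the output is identical to plain greedy decoding.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: given a token sequence, return the greedy prediction
# of every head at the last position. Index 0 is the ordinary next-token head
# (token t+1); index i > 0 is extra head i's guess for token t+1+i.
Heads = Callable[[Sequence[int]], List[int]]


def speculative_step(prefix: Sequence[int], heads: Heads) -> List[int]:
    """Draft tokens with the extra heads; keep the ones the main head agrees with."""
    preds = heads(prefix)
    accepted = [preds[0]]  # the main head's own prediction needs no verification
    for draft in preds[1:]:
        # Main-head prediction at the position this draft is guessing.
        # (A real implementation scores all drafts in one batched forward pass.)
        target = heads(list(prefix) + accepted)[0]
        accepted.append(target)  # equals the draft on a match, corrects it otherwise
        if target != draft:
            break  # later drafts were conditioned on a rejected token
    return accepted


def generate(prompt: Sequence[int], heads: Heads, max_new_tokens: int) -> List[int]:
    """Greedy decoding that may emit several tokens per step."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new_tokens:
        seq.extend(speculative_step(seq, heads))
    return seq[len(prompt):len(prompt) + max_new_tokens]


# Toy model: the main head counts up by one, extra head 1 is always right,
# and extra head 2 always guesses 7.
def toy_heads(seq: Sequence[int]) -> List[int]:
    last = seq[-1]
    return [(last + 1) % 10, (last + 2) % 10, 7]


print(speculative_step([0], toy_heads))  # [1, 2, 3]
print(generate([0], toy_heads, 5))       # [1, 2, 3, 4, 5]
```

In this sketch every check is a separate call, so it saves nothing by itself; the speedup comes from verifying all drafts in a single batched forward pass, which makes accepted drafts almost free.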

It would be amazing to see multi-token prediction implemented in NeMo, as it would allow the community to easily experiment with this promising technique.

zhaoyang-star assigned okuchaiev Feb 11, 2025
