- [2025/09] XTuner V1 Released! A Next-Generation Training Engine Built for Ultra-Large MoE Models
XTuner V1 is a next-generation LLM training engine designed specifically for ultra-large-scale MoE models. Unlike traditional 3D-parallel training architectures, XTuner V1 is optimized for the MoE training scenarios that are now mainstream in academic research.
🚀 Dropless Training
- Scalable without complexity: Train 200B-scale MoE models without expert parallelism; 600B models require only intra-node expert parallelism
- Optimized parallelism strategy: A smaller expert-parallel dimension than traditional 3D approaches, enabling more efficient Dropless training (see the routing sketch below)
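To make the term concrete, here is a toy sketch of dropless top-k routing in plain PyTorch: every (token, expert) assignment is kept because there is no capacity limit, so nothing is dropped even when expert loads are uneven. This is an illustration only, not XTuner V1's actual routing code; all names and shapes are invented for the example.

```python
# Toy sketch of dropless top-k MoE routing (illustrative, not XTuner V1's code).
import torch
import torch.nn.functional as F

def dropless_route(hidden: torch.Tensor, router_weight: torch.Tensor, top_k: int = 2):
    """hidden: (num_tokens, d_model); router_weight: (d_model, num_experts)."""
    logits = hidden @ router_weight                       # (num_tokens, num_experts)
    gate, expert_idx = F.softmax(logits, dim=-1).topk(top_k, dim=-1)
    flat_expert = expert_idx.reshape(-1)                  # (num_tokens * top_k,)
    order = flat_expert.argsort()                         # group assignments by expert
    token_idx = torch.arange(hidden.size(0)).repeat_interleave(top_k)
    tokens_per_expert = torch.bincount(flat_expert, minlength=router_weight.size(1))
    # No capacity factor: every routed token is dispatched, however uneven the loads.
    dispatched = hidden[token_idx[order]]                 # expert inputs, grouped by expert
    return dispatched, gate.reshape(-1)[order], tokens_per_expert

hidden = torch.randn(8, 16)
router_weight = torch.randn(16, 4)
_, _, loads = dropless_route(hidden, router_weight)
print(loads.tolist())  # per-expert token counts are processed as-is, never truncated
```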
📚 Long Sequence Support
- Memory-efficient design: Train 200B MoE models with 64k-token sequences, without sequence parallelism, thanks to advanced memory optimization techniques
- Flexible scaling: Full support for DeepSpeed Ulysses sequence parallelism, with the maximum sequence length scaling linearly in the sequence-parallel size (see the sketch after this list)
- Robust performance: Maintains stability despite expert load imbalance during long-sequence training
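For readers unfamiliar with DeepSpeed Ulysses, the snippet below simulates its two all-to-all exchanges on a single process: sequence-sharded activations are traded for head-sharded ones before attention and traded back afterwards. The shapes are arbitrary and the collectives are replaced by plain tensor ops, so this is a conceptual sketch rather than the real distributed code path.

```python
# Single-process simulation of the DeepSpeed Ulysses reshard (conceptual only;
# real training replaces the cat/chunk pairs with torch.distributed all-to-all).
import torch

sp_size, seq_len, num_heads, head_dim = 4, 64, 8, 32

# Each "rank" starts with a contiguous sequence shard of the activations.
shards = [torch.randn(seq_len // sp_size, num_heads, head_dim) for _ in range(sp_size)]

# all-to-all #1: trade sequence shards for head shards, so every rank sees the
# full sequence but only num_heads // sp_size attention heads.
full_sequence = torch.cat(shards, dim=0)            # (seq_len, num_heads, head_dim)
head_shards = full_sequence.chunk(sp_size, dim=1)   # each (seq_len, num_heads // sp_size, head_dim)

# ... attention runs here on each rank's head slice over the full sequence ...

# all-to-all #2: invert the exchange, restoring sequence-sharded activations.
restored = torch.cat(head_shards, dim=1).chunk(sp_size, dim=0)
assert all(torch.equal(a, b) for a, b in zip(restored, shards))

# Doubling sp_size halves each rank's sequence shard, which is why the maximum
# trainable sequence length scales roughly linearly with the Ulysses group size.
```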
⚡ Superior Efficiency
- Massive scale: Supports MoE training up to 1T parameters
- Breakthrough performance: First to achieve FSDP training throughput that surpasses traditional 3D parallel schemes for MoE models above 200B scale
- Hardware optimization: Training efficiency on the Ascend A3 Supernode exceeds that of NVIDIA H800
XTuner V1 is committed to continuously improving the efficiency of pre-training, instruction fine-tuning, and reinforcement learning for ultra-large MoE models, with a special focus on Ascend NPU optimization.
Our vision is to establish XTuner V1 as a versatile training backend that seamlessly integrates with the broader open-source ecosystem.
| Model | GPU(FP8) | GPU(BF16) | NPU(BF16) |
|---|---|---|---|
| Intern S1 | ✅ | ✅ | ✅ |
| Intern VL | ✅ | ✅ | ✅ |
| Qwen3 Dense | ✅ | ✅ | ✅ |
| Qwen3 MoE | ✅ | ✅ | ✅ |
| GPT OSS | ✅ | ✅ | 🚧 |
| Deepseek V3 | ✅ | ✅ | 🚧 |
| KIMI K2 | ✅ | ✅ | 🚧 |
The algorithm component is actively evolving. We welcome community contributions - with XTuner V1, scale your algorithms to unprecedented sizes!
Implemented
- ✅ Multimodal Pre-training - Full support for vision-language model training
- ✅ Multimodal Supervised Fine-tuning - Optimized for instruction following
- ✅ GRPO - Group Relative Policy Optimization (a minimal sketch of its group-relative advantage follows this list)
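To show what GRPO computes, the sketch below implements the group-relative advantage that replaces a learned critic: each sampled response is scored against the other responses drawn for the same prompt. It is a minimal illustration; the tensor layout and epsilon are assumptions, not XTuner V1's actual implementation.

```python
# Minimal sketch of GRPO's group-relative advantage (illustrative only).
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards of the sampled responses."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    # Normalizing within each prompt's group removes the need for a critic model.
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.1]])
print(grpo_advantages(rewards))
```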
Coming Soon
- MPO - Mixed Preference Optimization
- DAPO - Dynamic Sampling Policy Optimization
- Multi-turn Agentic RL - Advanced agent training capabilities
Seamless deployment with leading inference frameworks (a minimal LMDeploy example follows the list):
- LMDeploy
- vLLM
- SGLang
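For example, a fine-tuned checkpoint exported in Hugging Face format can be served with LMDeploy's pipeline API; the model path below is a placeholder for your own weights.

```python
# Serve a fine-tuned checkpoint with LMDeploy (the path is a placeholder).
from lmdeploy import pipeline

pipe = pipeline('/path/to/your/finetuned/model')
responses = pipe(['Introduce XTuner in one sentence.'])
print(responses)
```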
- You can use GraphGen to create synthetic data for fine-tuning.
We appreciate all contributions to XTuner. Please refer to CONTRIBUTING.md for the contributing guidelines.
The development of XTuner V1's training engine has been greatly inspired by and built upon the excellent work of the open-source community. We extend our sincere gratitude to the following pioneering projects:
Training Engine:
- Torchtitan - A PyTorch native platform for training generative AI models
- DeepSpeed - Microsoft's deep learning optimization library
- MindSpeed - Ascend's high-performance training acceleration library
- Megatron - NVIDIA's large-scale transformer training framework
Reinforcement Learning:
XTuner V1's reinforcement learning capabilities have been enhanced through insights and best practices from:
- veRL - Volcano Engine Reinforcement Learning for LLMs
- SLIME - THU's scalable RLHF implementation
- AReal - Ant Reasoning Reinforcement Learning for LLMs
- OpenRLHF - An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray
We are deeply grateful to all contributors and maintainers of these projects for advancing the field of large-scale model training.
@misc{2023xtuner,
    title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
    author={XTuner Contributors},
    howpublished={\url{https://github.com/InternLM/xtuner}},
    year={2023}
}
This project is released under the Apache License 2.0. Please also adhere to the Licenses of models and datasets being used.