
InternLM/xtuner


English | 简体中文

🚀 Speed Benchmark

🎉 News

  • [2025/09] XTuner V1 Released! A Next-Generation Training Engine Built for Ultra-Large MoE Models

📖 XTuner V1

XTuner V1 is a next-generation LLM training engine specifically designed for ultra-large-scale MoE models. Unlike traditional 3D parallel training architectures, XTuner V1 is optimized for the mainstream MoE training scenarios prevalent in today's academic research.

Key Features

📊 Dropless Training

  • Scalable without complexity: Train 200B-scale MoE models without expert parallelism; 600B models require only intra-node expert parallelism
  • Optimized parallelism strategy: Smaller expert parallelism dimension compared to traditional 3D approaches, enabling more efficient Dropless training (a minimal routing sketch follows below)
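
In case "dropless" is unfamiliar: it means every token is dispatched to its top-k experts with no capacity limit, so no token is discarded when an expert is overloaded. Below is a minimal, generic PyTorch sketch of such a router; it is not XTuner V1's code, and the class name, layer sizes, and top-k value are arbitrary placeholders.

```python
# Illustrative dropless top-k MoE routing in plain PyTorch.
# This is a conceptual sketch, not XTuner V1's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DroplessMoE(nn.Module):
    def __init__(self, hidden: int = 1024, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.shape[-1])                    # [N, hidden]
        scores = F.softmax(self.router(tokens), dim=-1)        # [N, num_experts]
        weight, expert_idx = scores.topk(self.top_k, dim=-1)   # [N, top_k]
        out = torch.zeros_like(tokens)
        # Every routed token is processed: there is no capacity factor and
        # therefore no token dropping, which is what "dropless" means.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (expert_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            gate = weight[token_ids, slot].unsqueeze(-1)       # routing weight per token
            out[token_ids] += gate * expert(tokens[token_ids])
        return out.reshape_as(x)
```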

๐Ÿ“ Long Sequence Support

  • Memory-efficient design: Train 200B MoE models on 64k sequence lengths without sequence parallelism through advanced memory optimization techniques
  • Flexible scaling: Full support for DeepSpeed Ulysses sequence parallelism, with the maximum sequence length scaling linearly in the number of ranks (see the all-to-all sketch below)
  • Robust performance: Maintains stability despite expert load imbalance during long sequence training
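
For readers unfamiliar with DeepSpeed Ulysses: activations are sharded along the sequence dimension, and an all-to-all before (and after) attention redistributes them so that each rank attends over the full sequence for a subset of heads. The sketch below illustrates only that first all-to-all, assuming an already-initialized torch.distributed process group; it is a conceptual sketch, not XTuner V1's or DeepSpeed's actual implementation.

```python
# Ulysses-style sequence-parallel exchange (illustrative sketch only).
import torch
import torch.distributed as dist

def seq_to_head_parallel(x: torch.Tensor, group=None) -> torch.Tensor:
    """[B, S/P, H, D] (sequence-sharded) -> [B, S, H/P, D] (head-sharded).

    Requires torch.distributed to be initialized (e.g. via torchrun).
    The inverse all-to-all after attention is symmetric.
    """
    world = dist.get_world_size(group)
    b, s_local, h, d = x.shape
    assert h % world == 0, "number of heads must be divisible by the group size"
    # Split the head dimension into `world` chunks, one per destination rank.
    x = x.reshape(b, s_local, world, h // world, d)       # [B, S/P, P, H/P, D]
    x = x.permute(2, 0, 1, 3, 4).contiguous()             # [P, B, S/P, H/P, D]
    out = torch.empty_like(x)
    dist.all_to_all_single(out, x, group=group)           # exchange chunks
    # Dim 0 now indexes the sequence shards gathered from all ranks.
    out = out.permute(1, 0, 2, 3, 4)                      # [B, P, S/P, H/P, D]
    return out.reshape(b, world * s_local, h // world, d)
```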

⚡ Superior Efficiency

  • Massive scale: Supports MoE training up to 1T parameters
  • Breakthrough performance: First to achieve FSDP training throughput that surpasses traditional 3D parallel schemes for MoE models above the 200B scale (a generic FSDP wrapping sketch follows below)
  • Hardware optimization: Achieves training efficiency on Ascend A3 Supernode that exceeds NVIDIA H800
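
For context, FSDP (PyTorch's Fully Sharded Data Parallel) shards parameters, gradients, and optimizer state across data-parallel ranks instead of replicating them. The snippet below is a minimal, generic example of wrapping a model with PyTorch's FSDP API in BF16; it only illustrates the approach XTuner V1 builds on, and the mixed-precision settings are placeholder choices.

```python
# Generic PyTorch FSDP wrapping: an illustration of the sharded data-parallel
# approach XTuner V1 builds on, not XTuner V1's engine itself.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision

def build_fsdp_model(model: torch.nn.Module) -> FSDP:
    # Assumes torch.distributed is already initialized (e.g. via torchrun).
    assert dist.is_initialized()
    return FSDP(
        model,
        mixed_precision=MixedPrecision(
            param_dtype=torch.bfloat16,    # BF16 compute
            reduce_dtype=torch.bfloat16,   # BF16 gradient reduction
        ),
        device_id=torch.cuda.current_device(),
    )
```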

🔥 Roadmap

XTuner V1 is committed to continuously improving training efficiency for pre-training, instruction fine-tuning, and reinforcement learning of ultra-large MoE models, with special focus on Ascend NPU optimization.

🚀 Training Engine

Our vision is to establish XTuner V1 as a versatile training backend that seamlessly integrates with the broader open-source ecosystem.

| Model       | GPU (FP8) | GPU (BF16) | NPU (BF16) |
|-------------|-----------|------------|------------|
| Intern S1   | ✅        | ✅         | ✅         |
| Intern VL   | ✅        | ✅         | ✅         |
| Qwen3 Dense | ✅        | ✅         | ✅         |
| Qwen3 MoE   | ✅        | ✅         | ✅         |
| GPT OSS     | ✅        | ✅         | 🚧         |
| Deepseek V3 | ✅        | ✅         | 🚧         |
| KIMI K2     | ✅        | ✅         | 🚧         |

🧠 Algorithm

The algorithm component is actively evolving. We welcome community contributions - with XTuner V1, scale your algorithms to unprecedented sizes!

Implemented

  • ✅ Multimodal Pre-training - Full support for vision-language model training
  • ✅ Multimodal Supervised Fine-tuning - Optimized for instruction following
  • ✅ GRPO - Group Relative Policy Optimization (see the sketch after this list)
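
For context, GRPO replaces the learned value baseline of PPO with a group-relative one: several responses are sampled per prompt, and each response's advantage is its reward standardized within that group. The snippet below is a minimal, illustrative computation of those advantages only, not XTuner V1's implementation; the epsilon constant and example rewards are arbitrary.

```python
# Group-relative advantages as used by GRPO (illustrative sketch only).
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: [num_prompts, group_size], one reward per sampled response.

    Each response is compared against the other responses sampled for the
    same prompt, so no learned value function (critic) is needed.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.4, 0.9, 0.1]])
print(grpo_advantages(rewards))
```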

Coming Soon

  • 🔄 MPO - Mixed Preference Optimization
  • 🔄 DAPO - Dynamic Sampling Policy Optimization
  • 🔄 Multi-turn Agentic RL - Advanced agent training capabilities

⚡ Inference Engine Integration

Seamless deployment with leading inference frameworks (a brief example follows the list):

  • LMDeploy
  • vLLM
  • SGLang
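
As an illustration of what deployment of a fine-tuned checkpoint can look like, the snippet below uses vLLM's offline inference API; the model path is a placeholder, and equivalent flows exist with LMDeploy's and SGLang's own APIs.

```python
# Serving a fine-tuned checkpoint with vLLM's offline API (illustrative only;
# the model path below is a placeholder, not a real released checkpoint).
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/your-finetuned-model")        # HF-format checkpoint
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Explain mixture-of-experts in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```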

Data Preparation

  • You can use GraphGen to create synthetic data for fine-tuning.

๐Ÿค Contributing

We appreciate all contributions to XTuner. Please refer to CONTRIBUTING.md for the contribution guidelines.

๐Ÿ™ Acknowledgement

The development of XTuner V1's training engine has been greatly inspired by and built upon the excellent work of the open-source community. We extend our sincere gratitude to the following pioneering projects:

Training Engine:

  • Torchtitan - A PyTorch native platform for training generative AI models
  • DeepSpeed - Microsoft's deep learning optimization library
  • MindSpeed - Ascend's high-performance training acceleration library
  • Megatron - NVIDIA's large-scale transformer training framework

Reinforcement Learning:

XTuner V1's reinforcement learning capabilities have been enhanced through insights and best practices from:

  • veRL - Volcano Engine Reinforcement Learning for LLMs
  • SLIME - THU's scalable RLHF implementation
  • AReal - Ant Reasoning Reinforcement Learning for LLMs
  • OpenRLHF - An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray

We are deeply grateful to all contributors and maintainers of these projects for advancing the field of large-scale model training.

๐Ÿ–Š๏ธ Citation

@misc{2023xtuner,
    title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
    author={XTuner Contributors},
    howpublished = {\url{https://github.com/InternLM/xtuner}},
    year={2023}
}

License

This project is released under the Apache License 2.0. Please also adhere to the licenses of the models and datasets being used.