MaxText is a high-performance, highly scalable, open-source LLM library and reference implementation written in pure Python/JAX, targeting Google Cloud TPUs and GPUs for training.
MaxText provides a library of high-performance models to choose from, including Gemma, Llama, DeepSeek, Qwen, and Mistral. For each of these models, MaxText supports pre-training (up to tens of thousands of chips) and scalable post-training, with popular techniques like Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO, a type of Reinforcement Learning).
MaxText achieves high Model FLOPs Utilization (MFU) and tokens/second from a single host to very large clusters, while staying simple and largely "optimization-free" thanks to the power of JAX and the XLA compiler.
MaxText is the launching point for ambitious LLM projects both in research and production. We encourage you to start by experimenting with MaxText out of the box and then fork and modify MaxText to meet your needs.
Check out our Read The Docs site or directly Get Started with your first MaxText run. If you’re interested in diffusion models (Wan 2.1, Flux, etc.), see the MaxDiffusion repository in our AI Hypercomputer GitHub organization.
- [September 5, 2025] MaxText has moved to an `src` layout as part of RESTRUCTURE.md. For existing environments, please run `pip install -e .` from the MaxText root.
- [August 13, 2025] The Qwen3 2507 MoE family of models is now supported: the 235B Thinking and 480B Coder MoEs, as well as the existing dense models (0.6B, 4B, 8B, 14B, and 32B).
- [July 27, 2025] Updated the TFLOPS/s calculation (PR) to account for causal attention, halving the attention FLOPs. Also accounted for the reduced attention FLOPs of sliding-window and chunked attention in PR and PR. These changes impact large-sequence configs, as explained in this doc (a back-of-the-envelope version of this accounting is sketched after this list).
- [July 16, 2025] We will be restructuring the MaxText repository for improved organization and clarity. Please review the proposed structure and provide feedback.
- [July 11, 2025] Multi-Token Prediction (MTP) training is now supported! It adds an auxiliary loss based on predicting multiple future tokens, inspired by the DeepSeek-V3 paper, to improve training efficiency.
- [June 25, 2025] The DeepSeek R1-0528 variant is now supported.
- [April 24, 2025] Llama 4 Maverick models are now supported.
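As a rough illustration of the attention-FLOPs accounting mentioned in the July 27 item above, here is a back-of-the-envelope sketch in plain Python. The helper and its parameters are hypothetical and simplified (it ignores projections, MLPs, and chunked-attention details); it is not MaxText's actual TFLOPS calculation.

```python
def causal_attention_tflops(batch, seq_len, num_layers, d_model, window=None):
    """Back-of-the-envelope attention FLOPs (hypothetical helper, not MaxText code).

    Counts only the QK^T and attention-weighted-value matmuls, which cost
    roughly 4 * (number of query-key pairs) * d_model FLOPs per layer
    (2 FLOPs per multiply-add, two matmuls).
    """
    if window is None or window >= seq_len:
        # Full causal attention: query i attends to i + 1 keys, so the total
        # number of (query, key) pairs is ~seq_len^2 / 2 -- the "halving".
        qk_pairs = seq_len * (seq_len + 1) // 2
    else:
        # Sliding-window attention: each query sees at most `window` keys,
        # which reduces the quadratic term even further.
        qk_pairs = window * (window + 1) // 2 + (seq_len - window) * window
    return 4 * batch * qk_pairs * d_model * num_layers / 1e12


# The correction matters most for long sequences: at seq_len=8192 the causal
# count is roughly half of the old full-attention estimate.
print(causal_attention_tflops(batch=1, seq_len=8192, num_layers=32, d_model=4096))
```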
MaxText provides a library of models and demonstrates how to perform pre-training or post-training with high performance and scale.
MaxText leverages JAX AI libraries and presents a cohesive and comprehensive demonstration of training at scale by using Flax (neural networks), Tunix (post-training), Orbax (checkpointing), Optax (optimization), and Grain (dataloading).
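To give a flavor of how these libraries compose, here is a minimal, self-contained JAX/Flax/Optax training step. It is purely illustrative (a toy model with hypothetical names), not MaxText's actual training loop, which adds Orbax checkpointing, Grain data loading, sharding, and much more.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax

class TinyLM(nn.Module):  # toy stand-in for a real decoder stack
    vocab: int = 256
    features: int = 128

    @nn.compact
    def __call__(self, tokens):
        x = nn.Embed(self.vocab, self.features)(tokens)
        x = nn.Dense(self.features)(nn.relu(x))
        return nn.Dense(self.vocab)(x)  # next-token logits

model = TinyLM()
params = model.init(jax.random.PRNGKey(0), jnp.zeros((1, 16), jnp.int32))
tx = optax.adamw(1e-3)
opt_state = tx.init(params)

@jax.jit
def train_step(params, opt_state, tokens):
    def loss_fn(p):
        logits = model.apply(p, tokens[:, :-1])
        labels = tokens[:, 1:]
        return optax.softmax_cross_entropy_with_integer_labels(logits, labels).mean()
    loss, grads = jax.value_and_grad(loss_fn)(params)
    updates, opt_state = tx.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss
```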
In addition to pure text-based LLMs, we also support multi-modal training with Gemma 3 and Llama 4 VLMs.
If you’re building models from scratch, MaxText can serve as a reference implementation for experimentation, ideation, and inspiration: just fork and modify MaxText to train your model, whether it’s a small dense model like Llama 8B or a large MoE like DeepSeek-V3. Experiment with configs and model design to build the most efficient model on TPU or GPU.
MaxText provides opinionated implementations for how to achieve optimal performance across a wide variety of dimensions like sharding, quantization, and checkpointing.
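For a concrete sense of what sharding means at the JAX level, here is a minimal sketch using `jax.sharding`. It illustrates the underlying machinery MaxText builds on; it is not MaxText's actual sharding configuration, and the mesh axis names are just examples.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange the available devices into a 2D ("data", "model") mesh.
devices = np.array(jax.devices()).reshape(-1, 1)
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard activations along "data" and a weight matrix along "model".
x = jax.device_put(jnp.ones((8, 1024)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((1024, 4096)), NamedSharding(mesh, P(None, "model")))

# XLA/GSPMD propagates shardings through jitted computations.
y = jax.jit(lambda a, b: a @ b)(x, w)
print(y.sharding)
```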
If you are post-training a model, whether it is proprietary or open source, MaxText provides a scalable framework using Tunix. For RL (like GRPO), we leverage vLLM for sampling and Pathways (soon) for multi-host.
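As a sketch of GRPO's core idea, the snippet below computes group-relative advantages: each sampled completion's reward is normalized against the other completions drawn for the same prompt, which removes the need for a learned critic. This shows only the central calculation with hypothetical names; it is not the Tunix/MaxText implementation.

```python
import jax.numpy as jnp

def group_relative_advantages(rewards, eps=1e-6):
    """rewards: [num_prompts, group_size] rewards for sampled completions."""
    mean = rewards.mean(axis=-1, keepdims=True)
    std = rewards.std(axis=-1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts with 4 sampled completions each.
rewards = jnp.array([[1.0, 0.0, 0.5, 0.5],
                     [0.2, 0.9, 0.4, 0.1]])
print(group_relative_advantages(rewards))
```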
Our goal is to provide a variety of models (dimension “a”) and techniques (dimension “b”), so you can easily explore (a) * (b) combinations and efficiently train the perfect model for your use case.
Check out our getting started guides.
MaxText aims to provide you with the best OSS models, whether as a reference implementation, or to post-train and then serve with vLLM.
Supported JAX models in MaxText
- Google
- Gemma 3 (4B, 12B, 27B)
- Gemma 2 (2B, 9B, 27B)
- Gemma 1 (2B, 7B)
- Alibaba
- Qwen 3 MoE 2507 (235B, 480B)
- Qwen 3 MoE (30B, 235B)
- Qwen 3 Dense (0.6B, 1.7B, 4B, 8B, 14B, 32B)
- DeepSeek
- DeepSeek-V2 (16B, 236B)
- DeepSeek-V3 0528 (671B)
- Meta
- Llama 4 Scout (109B) & Maverick (400B)
- Llama 3.3 70B, 3.1 (8B, 70B, 405B), 3.0 (8B, 70B, 405B)
- Llama 2 (7B, 13B, 70B)
- OpenAI
- GPT3 (52k, 6B, 22B, 175B)
- Mistral
- Mixtral (8x7B, 8x22B)
- Mistral (7B)
- Diffusion Models
- See MaxDiffusion (Wan 2.1, Flux, SDXL, etc.)
Please join our Discord Channel, and if you have feedback, file a feature request, documentation request, or bug report here.