This is a draft roadmap for DeepSpeed Q2 2026. Feedback is welcome — please leave comments on this issue or join the #2026q2-roadmap channel on the DeepSpeed Slack.
New features and enhancements
AutoEP support
AutoEP enables Expert Parallelism (EP) for major Mixture-of-Experts (MoE) models out of the box, eliminating the need for users to write model-specific parallelization code. By automatically distributing expert layers across devices, AutoEP allows users to scale MoE training with minimal configuration changes.
A prototype implementation has been validated on 8xH100, achieving roughly a 5x throughput improvement over ZeRO-3 baselines. We will build on this work to bring AutoEP to production readiness in Q2; a sketch of the expert placement it automates follows the list below.
- Convergence validation: Verify that training convergence matches non-EP baselines across the latest MoE model architectures
- Model coverage: Add support for additional MoE architectures (e.g., Qwen-MoE)
- ZeRO-3 support: Extend AutoEP to work with ZeRO Stage 3
- AutoTP integration: Combine AutoEP with AutoTP for hybrid expert/tensor parallelism
- Benchmarking: Publish throughput, memory, and scaling efficiency numbers across model sizes and GPU counts
- Universal Checkpoint support: Enable saving and resuming from Universal Checkpoints with AutoEP
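For context, the sketch below shows the kind of expert placement that expert parallelism implies and that AutoEP is meant to set up automatically. It is illustrative only: the class and helper names are invented for this example, it is not the AutoEP API, and the all-to-all dispatch/combine step is omitted.

```python
# Illustrative sketch of expert-parallel placement (not the AutoEP API).
# Each rank in the EP group materializes only its own slice of the experts;
# token hidden states routed to remote experts are exchanged with all-to-all
# before/after the expert MLPs (dispatch/combine omitted for brevity).
import torch.distributed as dist
from torch import nn


def local_expert_ids(num_experts: int, ep_size: int, ep_rank: int) -> list:
    """Contiguous block of expert ids owned by this EP rank."""
    per_rank = num_experts // ep_size
    return list(range(ep_rank * per_rank, (ep_rank + 1) * per_rank))


class ShardedExperts(nn.Module):
    """Holds only this rank's shard of an MoE layer's experts."""

    def __init__(self, hidden: int, num_experts: int, ep_group=None):
        super().__init__()
        ep_size = dist.get_world_size(group=ep_group)
        ep_rank = dist.get_rank(group=ep_group)
        assert num_experts % ep_size == 0, "experts must divide evenly across EP ranks"
        self.expert_ids = local_expert_ids(num_experts, ep_size, ep_rank)
        self.experts = nn.ModuleDict({
            str(eid): nn.Sequential(
                nn.Linear(hidden, 4 * hidden),
                nn.GELU(),
                nn.Linear(4 * hidden, hidden),
            )
            for eid in self.expert_ids
        })
```

The point of AutoEP is that this placement, the dispatch/combine collectives, and the interaction with ZeRO sharding are derived from the model definition rather than hand-written for each MoE architecture.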
AutoTP extension
AutoTP was significantly revamped in Q1 (PR #7806), introducing a flexible, configuration-driven API for custom layer partitioning patterns. In Q2, we will extend this foundation to support a broader range of models and scales.
- HuggingFace `tp_plan` support: Leverage the `base_model_tp_plan` metadata provided by HuggingFace Transformers models to automatically derive partitioning configurations, enabling out-of-the-box TP for any model that ships with a `tp_plan` (see the sketch after this list)
- Combination with AutoEP: Support parallel folding for hybrid expert/tensor parallelism
- Universal Checkpoint support: Enable saving and resuming from Universal Checkpoints with AutoTP
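As a rough illustration of the `tp_plan` item above, the snippet below reads the partitioning metadata that recent Hugging Face Transformers configs expose. The attribute name and the "colwise"/"rowwise" style strings follow recent transformers releases; how AutoTP will consume this metadata is exactly what the work item is about, so the grouping here is only a sketch.

```python
# Hedged sketch: inspecting the tp_plan metadata shipped with recent
# Hugging Face Transformers configs (attribute name and value format taken
# from recent transformers releases; guarded with getattr in case a model
# does not provide a plan).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B")
tp_plan = getattr(config, "base_model_tp_plan", None) or {}

# tp_plan maps glob-like module patterns to partition styles, e.g.
#   {"layers.*.self_attn.q_proj": "colwise",
#    "layers.*.mlp.down_proj": "rowwise", ...}
column_parallel = sorted(p for p, s in tp_plan.items() if "colwise" in s)
row_parallel = sorted(p for p, s in tp_plan.items() if "rowwise" in s)

print("column-parallel modules:", column_parallel)
print("row-parallel modules:", row_parallel)
```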
AutoSP Integration
AutoSP (ICLR 2026) is a compiler-based approach that automatically applies sequence parallelism via DeepSpeed Ulysses, removing the need for manual partitioning of sequence dimensions.
- Initial integration: The initial PR ([WIP] Merging AutoSP into DeepSpeed #7860) is ready
- Model coverage: Improve coverage for major model families (e.g., Qwen, Llama)
- Multimodal model support: Multimodal models involve significantly longer sequence lengths, making sequence parallelism critical for training efficiency (blog post). However, existing frameworks such as Megatron-LM do not support sequence parallelism for ViT encoders, and implementing it manually requires substantial engineering effort. AutoSP aims to automate this, enabling DeepSpeed Ulysses-based sequence parallelism for multimodal architectures out of the box (a sketch of the manual wiring appears right after this list).
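For reference, this is the manual DeepSpeed-Ulysses wiring that AutoSP aims to insert automatically. `DistributedAttention` is the existing Ulysses wrapper in `deepspeed.sequence.layer`; the sequence-parallel process group is assumed to be created elsewhere, and plain PyTorch SDPA stands in for whatever attention kernel the model uses.

```python
# Sketch of manual DeepSpeed-Ulysses integration (what AutoSP automates).
import torch
from deepspeed.sequence.layer import DistributedAttention


def local_attention(query, key, value):
    # Per-rank attention on the local sequence shard; any attention kernel works.
    return torch.nn.functional.scaled_dot_product_attention(query, key, value)


def build_ulysses_attention(seq_parallel_group):
    # DistributedAttention wraps local_attention with the all-to-alls that swap
    # sequence and head dimensions, so each rank only holds its slice of a
    # potentially very long (e.g. multimodal) sequence.
    return DistributedAttention(local_attention, seq_parallel_group)
```

AutoSP's compiler pass is meant to perform this wrapping (and the corresponding partitioning of inputs along the sequence dimension) automatically, which is exactly the step that is missing today for ViT encoders in multimodal models.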
Compiler Integration Enhancement (Optional)
- "DTensor mode" for less graph break and stable graph tracing
- DeepCompile enhancement
- Support multi-stage optimization passes for PyTorch v2.9+
- Compiler pass enhancement
- AutoTP support
- AutoEP support
- AMD support
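To make the "DTensor mode" item concrete, here is a minimal sketch using only the public PyTorch DTensor API (`torch.distributed.tensor`, available in recent PyTorch releases; run under torchrun). How DeepSpeed will expose such a mode is an open design question for this work item.

```python
# Minimal DTensor sketch: expressing a sharded parameter as a DTensor so the
# sharding is visible to torch.compile tracing, instead of living in
# Python-side hooks that cause graph breaks.
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

dist.init_process_group()                       # launched via torchrun
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
mesh = init_device_mesh("cuda", (dist.get_world_size(),))

# Shard the parameter along dim 0 across the mesh; ops on the DTensor carry
# placement metadata that the compiler can trace and optimize.
weight = torch.randn(4096, 4096, device="cuda")
sharded = distribute_tensor(weight, mesh, [Shard(0)])
print(sharded.placements, sharded.to_local().shape)
```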
New Accelerator Support (Q2)
- Planning (scope, target accelerators)
RL-training-specific optimization for DeepSpeed-Inference
- System design, prototyping, and benchmarking
Stability (Q2)
- Performance regression tests
- Enable nightly full tests on:
  - CUDA
  - AMD
  - Intel XPU
  - Intel Gaudi
  - NPU