feat: Add HybridEP support for MoE expert parallelism #1942
Conversation
- Update DeepEP dependency to hybrid-ep branch for HybridEP support
  - automodel, vllm, mcore dependency groups updated
- Add HybridEP configuration options in _apply_moe_config():
  - moe_flex_dispatcher_backend: Flex dispatcher backend (e.g., 'hybridep')
  - moe_hybridep_num_sms: Number of SMs for HybridEP operations

Usage in config:

policy.megatron_cfg.moe_token_dispatcher_type=flex
policy.megatron_cfg.moe_flex_dispatcher_backend=hybridep
policy.megatron_cfg.moe_hybridep_num_sms=32

See: https://github.com/deepseek-ai/DeepEP/tree/hybrid-ep
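For illustration, here is a minimal sketch (hypothetical names and values, not code from this repo) of how these overrides could be carried in a plain Python dict and forwarded with the same presence checks `_apply_moe_config()` uses, so configs that omit the new keys are left unchanged:

```python
# Hypothetical sketch: the HybridEP keys under policy.megatron_cfg, applied
# only when present so existing configs keep their current behavior.
from types import SimpleNamespace

megatron_cfg = {
    "moe_token_dispatcher_type": "flex",        # HybridEP rides on the flex dispatcher
    "moe_flex_dispatcher_backend": "hybridep",  # select the HybridEP backend
    "moe_hybridep_num_sms": 32,                 # number of SMs for HybridEP operations
}

model_cfg = SimpleNamespace()  # stand-in for the Megatron model config object
if "moe_flex_dispatcher_backend" in megatron_cfg:
    model_cfg.moe_flex_dispatcher_backend = megatron_cfg["moe_flex_dispatcher_backend"]
if "moe_hybridep_num_sms" in megatron_cfg:
    model_cfg.moe_hybridep_num_sms = megatron_cfg["moe_hybridep_num_sms"]

print(vars(model_cfg))
```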
📝 Walkthrough

The changes add two conditional configuration hooks to the MoE setup function for optional dispatcher and HybridEP parameters, and update the DeepEP dependency reference from a specific commit to the hybrid-ep branch across multiple dependency groups in the project configuration.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 warnings
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pyproject.toml (1)
324-327: ⚠️ Potential issue | 🟠 Major

Stale dependency-metadata version for deep_ep.

The dependency is pinned to the hybrid-ep branch (which dynamically generates its version from the current commit hash via git rev-parse --short HEAD), but the dependency-metadata version is statically set to v1.2.1+bfded34. This means the metadata version will become stale whenever the branch advances, potentially causing uv resolver failures. Either:

- Update to pin to a specific commit hash instead of a branch, or
- Update the metadata version to match the current HEAD of hybrid-ep and regenerate it whenever the dependency updates
🤖 Fix all issues with AI agents
In `@nemo_rl/models/megatron/setup.py`:
- Around line 405-412: The new runtime keys moe_flex_dispatcher_backend and
moe_hybridep_num_sms are missing from the MegatronConfig TypedDict and from
example configs; add both to the MegatronConfig definition in
nemo_rl/models/policy/__init__.py as NotRequired entries (use the exact symbol
name MegatronConfig) with short docstrings: "Backend type for MoE flex
dispatcher (HybridEP)" for moe_flex_dispatcher_backend and "Number of SMs for
HybridEP" for moe_hybridep_num_sms, and then update at least one exemplar YAML
in examples/configs (e.g., a megatron MoE config) to include these keys with
sensible defaults (recommended defaults) so they are documented and visible to
users.
🧹 Nitpick comments (1)
pyproject.toml (1)
70-72: Branch ref instead of pinned commit reduces build reproducibility.

All three dependency groups now point to @hybrid-ep (a moving branch) instead of a fixed commit hash. This means builds are not reproducible: a force-push or new commit on that branch silently changes what gets installed. Consider pinning to a specific commit on the hybrid-ep branch once it stabilizes.
```python
# HybridEP settings for MoE expert parallelism
# See: https://github.com/deepseek-ai/DeepEP/tree/hybrid-ep
if "moe_flex_dispatcher_backend" in config["megatron_cfg"]:
    model_cfg.moe_flex_dispatcher_backend = config["megatron_cfg"][
        "moe_flex_dispatcher_backend"
    ]
if "moe_hybridep_num_sms" in config["megatron_cfg"]:
    model_cfg.moe_hybridep_num_sms = config["megatron_cfg"]["moe_hybridep_num_sms"]
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:

```bash
# First, let's understand the file structure and find the TypedDict definition
find . -type f -name "*.py" -path "*/nemo_rl/*" | head -20
```

Length of output: 805

🏁 Script executed:

```bash
# Search for megatron_cfg TypedDict definition and the new keys
rg -n "moe_flex_dispatcher_backend|moe_hybridep_num_sms" --type=py
```

Length of output: 565

🏁 Script executed:

```bash
# Check for exemplar YAMLs in examples/configs
fd -e yaml -p "examples/configs" | head -20
```

Length of output: 856

🏁 Script executed:

```bash
# Search for megatron_cfg TypedDict definition
rg -n "class.*megatron.*cfg|MegatronCfg" --type=py -i | head -30
```

Length of output: 40

🏁 Script executed:

```bash
# Search for TypedDict definitions in megatron module or related config files
rg -n "TypedDict|class.*Cfg" --type=py nemo_rl/models/megatron/ | head -40
```

Length of output: 240

🏁 Script executed:

```bash
# Check the setup.py file around the lines in question for context
sed -n '380,420p' nemo_rl/models/megatron/setup.py
```

Length of output: 1825

🏁 Script executed:

```bash
# Check one of the megatron exemplar YAMLs to see if megatron_cfg is documented
cat examples/configs/grpo_math_70B_megatron.yaml | head -100
```

Length of output: 1826

🏁 Script executed:

```bash
# Search for where megatron_cfg is defined as a TypedDict
rg -n "megatron_cfg.*:" --type=py | head -20
```

Length of output: 1855

🏁 Script executed:

```bash
# Look at the config.py file to find MegatronConfig definition
cat nemo_rl/models/megatron/config.py
```

Length of output: 3082

🏁 Script executed:

```bash
# Also check the policy/__init__.py to see the full PolicyConfig TypedDict
sed -n '1,300p' nemo_rl/models/policy/__init__.py | grep -A 20 "megatron_cfg"
```

Length of output: 828

🏁 Script executed:

```bash
# Search for MegatronConfig class/TypedDict definition
rg -n "class MegatronConfig|MegatronConfig.*=.*TypedDict" --type=py
```

Length of output: 207

🏁 Script executed:

```bash
# Check the entire policy/__init__.py file
wc -l nemo_rl/models/policy/__init__.py
```

Length of output: 95

🏁 Script executed:

```bash
# Get the full PolicyConfig definition including imports
head -300 nemo_rl/models/policy/__init__.py
```

Length of output: 10006
Add moe_flex_dispatcher_backend and moe_hybridep_num_sms to MegatronConfig TypedDict and exemplar YAMLs.
The two new optional keys are correctly handled at runtime with `in` checks, but per coding guidelines, new config keys must be documented in the TypedDict subclass and reflected in exemplar YAMLs.
Add both keys to MegatronConfig in nemo_rl/models/policy/__init__.py using NotRequired, with docstrings explaining their purpose and valid values (e.g., "Backend type for MoE flex dispatcher (HybridEP)" and "Number of SMs for HybridEP"). Update at least one exemplar YAML under examples/configs/ (e.g., a megatron MoE config) to include these keys with their recommended defaults.
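For concreteness, a sketch of what that could look like. The field names come from this PR, but the surrounding MegatronConfig definition in nemo_rl/models/policy/__init__.py is assumed rather than copied, so treat the class body as illustrative only:

```python
# Illustrative only: declaring the two new keys as optional entries of the
# (assumed) MegatronConfig TypedDict. NotRequired comes from typing_extensions
# (or from typing on Python 3.11+).
from typing_extensions import NotRequired, TypedDict


class MegatronConfig(TypedDict):
    # ... existing fields elided ...
    moe_flex_dispatcher_backend: NotRequired[str]
    """Backend type for the MoE flex dispatcher (e.g. 'hybridep')."""

    moe_hybridep_num_sms: NotRequired[int]
    """Number of SMs for HybridEP operations."""
```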
🤖 Prompt for AI Agents
In `@nemo_rl/models/megatron/setup.py` around lines 405 - 412, The new runtime
keys moe_flex_dispatcher_backend and moe_hybridep_num_sms are missing from the
MegatronConfig TypedDict and from example configs; add both to the
MegatronConfig definition in nemo_rl/models/policy/__init__.py as NotRequired
entries (use the exact symbol name MegatronConfig) with short docstrings:
"Backend type for MoE flex dispatcher (HybridEP)" for
moe_flex_dispatcher_backend and "Number of SMs for HybridEP" for
moe_hybridep_num_sms, and then update at least one exemplar YAML in
examples/configs (e.g., a megatron MoE config) to include these keys with
sensible defaults (recommended defaults) so they are documented and visible to
users.
…odels

Add performance recipes and test scripts for HybridEP and CUDA Graph optimizations:

- grpo-qwen3-30ba3b-4n4g-hybridep: HybridEP with flex dispatcher for Qwen3-30B-A3B
- grpo-qwen3-30ba3b-4n4g-hybridep-cudagraph: HybridEP + CUDA Graph (attn, moe_router)
- grpo-qwen3-235b-16n4g-hybridep: HybridEP for Qwen3-235B-A22B
- grpo-qwen3-235b-16n4g-hybridep-cudagraph: HybridEP + CUDA Graph for Qwen3-235B-A22B

Key configurations (see the sketch below):

- moe_token_dispatcher_type: flex
- moe_flex_dispatcher_backend: hybridep
- moe_hybridep_num_sms: 32
- cuda_graph_impl: transformer_engine
- cuda_graph_scope: [attn, moe_router]
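A rough sketch of how those recipe knobs compose; the key names and values are taken from the list above, while the dict wrapper and the dotted override prefix are assumptions rather than the actual recipe YAML:

```python
# Hypothetical grouping of the HybridEP + CUDA Graph recipe settings; the real
# recipes live under examples/configs and may differ in structure.
hybridep_cudagraph_overrides = {
    "moe_token_dispatcher_type": "flex",
    "moe_flex_dispatcher_backend": "hybridep",
    "moe_hybridep_num_sms": 32,
    "cuda_graph_impl": "transformer_engine",
    "cuda_graph_scope": ["attn", "moe_router"],
}

# Flatten into dotted override strings; in a real CLI invocation the list value
# would be written as [attn,moe_router].
for key, value in hybridep_cudagraph_overrides.items():
    print(f"policy.megatron_cfg.{key}={value}")
```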
…on handling

- grpo-qwen3-30ba3b-4n4g-hybridep.yaml: sequence_packing.enabled=false (compat note)
- ray.sub: CUDA_HOME/PATH for nvcc, attach shell single-quote fix for uv
- common.env: exit_if_max_steps_reached handles missing metrics.json
- test scripts: metrics.json existence check before jq

Co-authored-by: Cursor <cursoragent@cursor.com>
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@bfded34800dfec415b71503f8205181de90b2480", | ||
| # HybridEP branch for MoE expert parallelism | ||
| # See: https://github.com/deepseek-ai/DeepEP/tree/hybrid-ep | ||
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@hybrid-ep", |
There was a problem hiding this comment.
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@hybrid-ep", | |
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@bfded34800dfec415b71503f8205181de90b2480 ; platform_machine == 'x86_64'", | |
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@hybrid-ep ; platform_machine == 'aarch64'", |
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@bfded34800dfec415b71503f8205181de90b2480", | ||
| # HybridEP branch for MoE expert parallelism | ||
| # See: https://github.com/deepseek-ai/DeepEP/tree/hybrid-ep | ||
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@hybrid-ep", |
There was a problem hiding this comment.
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@hybrid-ep", | |
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@bfded34800dfec415b71503f8205181de90b2480 ; platform_machine == 'x86_64'", | |
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@hybrid-ep ; platform_machine == 'aarch64'", |
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@bfded34800dfec415b71503f8205181de90b2480", | ||
| # HybridEP branch for MoE expert parallelism | ||
| # See: https://github.com/deepseek-ai/DeepEP/tree/hybrid-ep | ||
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@hybrid-ep", |
There was a problem hiding this comment.
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@hybrid-ep", | |
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@bfded34800dfec415b71503f8205181de90b2480 ; platform_machine == 'x86_64'", | |
| "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@hybrid-ep ; platform_machine == 'aarch64'", |
Usage in config:
policy.megatron_cfg.moe_token_dispatcher_type=flex
policy.megatron_cfg.moe_flex_dispatcher_backend=hybridep
policy.megatron_cfg.moe_hybridep_num_sms=32
See: https://github.com/deepseek-ai/DeepEP/tree/hybrid-ep
What does this PR do?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
New Features
Dependencies