@vyomakesh0728 vyomakesh0728 commented Jan 22, 2026

What does this PR do?

This PR adds full Atropos environment support to VERL's GRPO training pipeline, addressing issue #1782 from Nous Research.

What's included:

  • GRPO training that handles token-level advantages from Atropos environments when provided
  • VERL now spins up vLLM inference servers and registers them with the Atropos API
  • Policy weight updates are managed by VERL throughout training
  • Single launcher orchestrates the entire pipeline (Atropos API + vLLM + training)
  • Tested end-to-end on GSM8K with solid improvements in reward and accuracy
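To illustrate the first bullet, here is a minimal sketch of how an optional `token_level_advantages` argument could short-circuit the usual group normalization in `compute_grpo_outcome_advantage`. The signature is simplified for illustration and does not mirror verl's actual function.

```python
import torch

def compute_grpo_outcome_advantage(token_level_rewards, response_mask,
                                   index, token_level_advantages=None,
                                   epsilon=1e-6):
    """GRPO outcome advantages with an optional external override.

    If an environment (e.g. Atropos) already supplies per-token
    advantages, use them directly instead of recomputing them from
    group-normalized outcome rewards.
    """
    if token_level_advantages is not None:
        # Trust the environment-provided advantages; only mask padding.
        adv = token_level_advantages * response_mask
        return adv, adv

    # Standard GRPO: normalize sequence-level rewards within each
    # prompt group, identified by `index`.
    scores = token_level_rewards.sum(dim=-1)  # (bsz,)
    adv = torch.empty_like(scores)
    for group in set(index):
        mask = torch.tensor([i == group for i in index])
        group_scores = scores[mask]
        adv[mask] = (group_scores - group_scores.mean()) / \
                    (group_scores.std() + epsilon)

    # Broadcast the sequence-level advantage to every response token.
    adv = adv.unsqueeze(-1) * response_mask
    return adv, adv
```

The override path keeps the trainer's downstream PPO loss unchanged: it consumes a `(batch, seq_len)` advantage tensor either way.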

Core verl changes:

  • Integration documentation: Complete API reference and usage examples
  • GRPO advantage override support: (verl/trainer/ppo/core_algos.py) - Added token_level_advantages parameter to compute_grpo_outcome_advantage() for Atropos-provided token-level advantages
  • Multi-turn GRPO handling: (verl/trainer/ppo/ray_trainer.py) - Support for token-level advantages in compute_advantage() and multi-turn conversation masking
  • vLLM max_model_len configuration: (verl/workers/rollout/vllm_rollout/vllm_async_server.py) - Respect explicit overrides for KV cache memory control
  • Config template: (verl/trainer/config/atropos_grpo_small.yaml) - Reference GRPO configuration for Atropos integration
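To make the reference configuration concrete, a fragment along these lines could express the GRPO estimator choice and the explicit `max_model_len` override described above. The `atropos` section and its key names are illustrative guesses, not the actual contents of `atropos_grpo_small.yaml`.

```yaml
# Illustrative fragment only — key names beyond standard verl GRPO
# settings are assumptions about what atropos_grpo_small.yaml contains.
algorithm:
  adv_estimator: grpo            # GRPO advantage estimation

actor_rollout_ref:
  rollout:
    name: vllm
    max_model_len: 4096          # explicit override, respected for KV-cache sizing

atropos:
  api_url: http://localhost:8000  # Atropos API server (hypothetical key)
```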

Main files under recipe:

  • atropos/atropos_integration.py - Atropos API client and advantage handling
  • atropos/grpo_atropos_trainer.py - GRPO trainer with token-level advantage support
  • atropos/launch_atropos_verl_services.py - Service orchestration
  • Complete docs and example configs

Atropos recipe PR Link

Test

Tested end-to-end with atropos_grpo_small.yaml on the GSM8K dataset on a single A100, for 372 steps:

W&B run: qwen2_5_3b_grpo_atropos_verl_small

```shell
uv run recipe/atropos/launch_atropos_verl_services.py \
  --config recipe/atropos/config/atropos_grpo_small.yaml
```
  • Training shows steady improvements in reward (val-aux/openai/gsm8k/reward/mean@1) and accuracy (val-core/openai/gsm8k/acc/mean@1) over 372 steps
  • All stability metrics (KL divergence, entropy, gradient norm) remained bounded throughout training
  • Full W&B run available at wandb

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Closes #1782

@vyomakesh0728 changed the title to "[feat] Atropos integration with GRPO (#1782)" on Jan 22, 2026
@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request integrates Atropos with VERL's GRPO training pipeline, introducing support for token-level advantages from Atropos environments. This is a significant feature addition that allows for more flexible advantage calculation. The changes are well-contained, primarily affecting the PPO trainer logic to handle these external advantages and multi-turn conversations. The inclusion of documentation for the new API and integration examples is a great addition for maintainability and usability. I've identified one critical issue with a dependency version in pyproject.toml that needs to be addressed to ensure correct installation.


Merging this pull request may close: Atropos integration ($2500 bounty) #1782
