force online difficulty filtering by faresobeid · Pull Request #1885 · PrimeIntellect-ai/prime-rl

faresobeid · 2026-02-25T15:01:49Z

Online difficulty filtering (ODF) is now always on; metrics are still computed on all rollouts, so they're unchanged.
It basically never makes sense to have ODF off. The main effect of this change is that when lots of rollouts have zero advantage, steps will take longer because the batch isn't being filled, but that seems acceptable in this case.
This also saves a lot of wasted trainer compute, which can go toward more efficient training if the config is adjusted properly, or the idle trainer time can be used for other work.
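
To get a rough feel for how often a whole group gets filtered, here is a back-of-the-envelope sketch. It rests on assumptions the PR does not state: rewards are binary, rollouts of the same example are independent with a fixed pass rate, and the advantage is the reward minus the group mean. The function names are hypothetical and not part of prime-rl.

```python
import math


def prob_all_same(pass_rate: float, rollouts_per_example: int) -> float:
    """Probability that every rollout in a group gets the same binary reward.

    Under the assumed group-mean baseline, such a group has zero advantage
    everywhere and would be dropped by the filter.
    """
    p, n = pass_rate, rollouts_per_example
    return p**n + (1.0 - p) ** n


def expected_rollouts_to_fill(batch_size: int, pass_rate: float, rollouts_per_example: int) -> float:
    """Approximate number of rollouts to generate so that `batch_size` survive filtering."""
    q = prob_all_same(pass_rate, rollouts_per_example)
    if q >= 1.0:
        # Every group is filtered, so the batch can never be filled.
        return math.inf
    return batch_size / (1.0 - q)


print(prob_all_same(0.5, 4))  # 0.125
print(expected_rollouts_to_fill(512, 0.5, 4))
print(expected_rollouts_to_fill(512, 0.5, 1))  # inf
```

Under these assumptions a single rollout per example always yields zero advantage, so nothing would survive the filter, while four rollouts at a 50% pass rate lose about one group in eight.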

Note

Medium Risk
Changes RL training data selection by filtering out zero-advantage rollouts (except when skip_verification), which can alter effective batch sizes and training dynamics. Also removes a config option and changes a default (rollouts_per_example), which may impact existing experiment behavior.

Overview
Removes orchestrator.buffer.online_difficulty_filtering and updates configs/tests accordingly; the buffer now always stores all incoming rollouts.

Adds advantage-based rollout filtering in the orchestrator: rollouts with ~zero advantage are dropped from training samples by default, while buffer.skip_verification=true bypasses this to keep distillation runs trainable. New difficulty_filter/* metrics are logged and orchestrator.rollouts_per_example default increases from 1 to 4.
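
The filtering described above can be sketched roughly as follows. This is an illustration only, not the code from `src/prime_rl/orchestrator/orchestrator.py`: the `Rollout` class, the group-mean advantage, the tolerance, and the specific metric keys under `difficulty_filter/` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    # Hypothetical container; the real rollout type in prime-rl may differ.
    example_id: int
    reward: float
    advantage: float = 0.0


def assign_group_advantages(rollouts: list[Rollout]) -> list[Rollout]:
    """Assumed baseline: advantage = reward minus the mean reward of the same example."""
    groups: dict[int, list[Rollout]] = {}
    for r in rollouts:
        groups.setdefault(r.example_id, []).append(r)
    for group in groups.values():
        mean = sum(r.reward for r in group) / len(group)
        for r in group:
            r.advantage = r.reward - mean
    return rollouts


def filter_zero_advantage(rollouts, skip_verification=False, tol=1e-8):
    """Drop rollouts with ~zero advantage unless skip_verification is set."""
    if skip_verification:
        # Bypass the filter so runs without verified rewards stay trainable.
        kept = list(rollouts)
    else:
        kept = [r for r in rollouts if abs(r.advantage) > tol]
    total = len(rollouts)
    metrics = {
        # Illustrative metric names, not the ones logged by the PR.
        "difficulty_filter/total": total,
        "difficulty_filter/kept": len(kept),
        "difficulty_filter/dropped_fraction": (total - len(kept)) / total if total else 0.0,
    }
    return kept, metrics


rollouts = assign_group_advantages(
    [Rollout(0, 1.0), Rollout(0, 0.0), Rollout(0, 1.0), Rollout(0, 0.0)]
    + [Rollout(1, 1.0) for _ in range(4)]
)
kept, metrics = filter_zero_advantage(rollouts)
kept_all, _ = filter_zero_advantage(rollouts, skip_verification=True)
```

In this toy input, example 0 has mixed rewards and keeps all four rollouts, while example 1 is solved every time and is dropped entirely; with `skip_verification=True` all eight rollouts are kept.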

Written by Cursor Bugbot for commit 93d4f24. This will update automatically on new commits.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

src/prime_rl/orchestrator/orchestrator.py

src/prime_rl/configs/orchestrator.py

Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>

force online difficulty filtering

2074c34

cursor bot reviewed Feb 25, 2026

src/prime_rl/orchestrator/orchestrator.py (resolved)

src/prime_rl/configs/orchestrator.py (resolved)

cursoragent and others added 2 commits February 26, 2026 03:40

fix skip_verification zero-advantage filtering

140a0ca

Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>

docs(changelog): document orchestrator config field changes

93d4f24

Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>

faresobeid wants to merge 3 commits into main from force-odf_
