force online difficulty filtering #1885

Open
faresobeid wants to merge 3 commits into main from force-odf_


@faresobeid faresobeid commented Feb 25, 2026

Online difficulty filtering (ODF) is now always on, but metrics are still computed on all rollouts, so they're unchanged.
It basically never makes sense to have ODF off. The main effect of this change is that if lots of rollouts have zero advantage, steps will take longer because the batch isn't being filled, but that seems acceptable for this case.
This also saves a lot of wasted trainer compute, which can go toward more efficient training if the config is adjusted properly, or the idle trainer time can be used for other work.
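
To illustrate why some rollouts carry no training signal, here is a minimal sketch. It assumes a group-mean baseline (advantage = reward minus the mean reward of the rollouts for the same prompt), which this PR does not confirm. The rewards, `EPS`, and `group_advantages` are hypothetical and not taken from the codebase.

```python
from statistics import mean

EPS = 1e-6  # hypothetical tolerance for treating an advantage as zero


def group_advantages(rewards):
    """Assumed baseline: reward minus the mean reward of the prompt's group."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Hypothetical rewards: three prompts, four rollouts each.
groups = {
    "too_easy": [1.0, 1.0, 1.0, 1.0],
    "too_hard": [0.0, 0.0, 0.0, 0.0],
    "useful": [1.0, 0.0, 1.0, 0.0],
}

# Metrics are computed over every rollout, whether or not it is trained on.
all_rewards = [r for rewards in groups.values() for r in rewards]
mean_reward_all = mean(all_rewards)

# Only groups with at least one non-zero advantage yield training samples.
trainable = {}
for name, rewards in groups.items():
    advantages = group_advantages(rewards)
    if any(abs(a) > EPS for a in advantages):
        trainable[name] = advantages

print(mean_reward_all)   # 0.5
print(list(trainable))   # ['useful']
```

In this toy case only one of three prompts contributes samples, which is the situation where a step would wait longer for the batch to fill.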


Note

Medium Risk
Changes RL training data selection by filtering out zero-advantage rollouts (except when buffer.skip_verification is set), which can alter effective batch sizes and training dynamics. Also removes a config option and changes a default (rollouts_per_example), which may impact existing experiment behavior.

Overview
Removes orchestrator.buffer.online_difficulty_filtering and updates configs/tests accordingly; the buffer now always stores all incoming rollouts.

Adds advantage-based rollout filtering in the orchestrator: rollouts with ~zero advantage are dropped from training samples by default, while buffer.skip_verification=true bypasses this to keep distillation runs trainable. New difficulty_filter/* metrics are logged and orchestrator.rollouts_per_example default increases from 1 to 4.

Written by Cursor Bugbot for commit 93d4f24. This will update automatically on new commits.
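
The filtering behavior described in the overview above could look roughly like the following sketch. It is not the PR's implementation: `Rollout`, `select_training_rollouts`, the `eps` threshold, and the specific metric keys under the `difficulty_filter/` prefix are all hypothetical names chosen for illustration. Only the `skip_verification` bypass and the metric prefix come from the description.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    example_id: int
    advantage: float


def select_training_rollouts(rollouts, skip_verification=False, eps=1e-6):
    """Drop ~zero-advantage rollouts unless skip_verification is set.

    Returns the kept rollouts and a dict of (hypothetical) filter metrics.
    """
    total = len(rollouts)
    if skip_verification:
        # Bypass: keep everything so distillation-style runs stay trainable.
        kept = list(rollouts)
    else:
        kept = [r for r in rollouts if abs(r.advantage) > eps]

    dropped = total - len(kept)
    metrics = {
        "difficulty_filter/total": total,
        "difficulty_filter/dropped": dropped,
        "difficulty_filter/dropped_frac": dropped / total if total else 0.0,
    }
    return kept, metrics


rollouts = [
    Rollout(0, 0.0),
    Rollout(0, 0.0),
    Rollout(1, 0.5),
    Rollout(1, -0.5),
]

kept, metrics = select_training_rollouts(rollouts)
kept_all, metrics_all = select_training_rollouts(rollouts, skip_verification=True)
```

With the default settings, half of these rollouts are dropped. With `skip_verification=True`, all four are kept and the dropped count is zero.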

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursoragent and others added 2 commits February 26, 2026 03:40
Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>
Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>

2 participants