Conversation
mikasenghaas
left a comment
nice, yea let's do some testing on this to verify that we don't have any async race conditions anywhere, but directionally it looks good.
i think mid-term we want to move away from the verifiers env group for training envs and make our abstractions "multi-env" aware by default, e.g. something like having a buffer and scheduler per env governed by a "scheduling" component on top, because i think we will want more and more fine-grained control over how each env behaves (e.g. here, whether or not to use group scheduling), and it's always awkward to code this in an abstraction that handles multiple cases where you need conditionals everywhere
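The per-env idea above could be sketched roughly like this. All names here (EnvState, MultiEnvScheduler, submit) are hypothetical illustrations, not existing prime-rl abstractions: each env owns its own buffer and scheduling knobs, and a top-level component routes work, so per-env behaviour becomes configuration instead of conditionals.

```python
from dataclasses import dataclass, field

@dataclass
class EnvState:
    # Hypothetical per-env state: each env owns its buffer and its own
    # scheduling knobs (e.g. whether to use group scheduling).
    name: str
    use_group_scheduling: bool = False
    buffer: list = field(default_factory=list)

class MultiEnvScheduler:
    # Hypothetical top-level "scheduling" component governing per-env
    # buffers/schedulers, so env-specific behaviour is data, not branches.
    def __init__(self, envs: list[EnvState]):
        self.envs = {env.name: env for env in envs}

    def submit(self, env_name: str, rollout) -> None:
        self.envs[env_name].buffer.append(rollout)

    def group_scheduled_envs(self) -> list[str]:
        return [n for n, e in self.envs.items() if e.use_group_scheduling]
```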
Force-pushed from b33268a to f788c40
Cursor Bugbot has reviewed your changes and found 1 potential issue.
let's wait for @mikasenghaas review before merging
Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>
raise ValueError(
    "max_inflight_rollouts conflicts with oversampling_factor * batch_size"
)

Suggested change:

raise ValueError("max_inflight_rollouts conflicts with oversampling_factor * batch_size")
imo, could just deprecate oversampling_factor, no?
env_names=train_env_names,
map_kwargs=dict(writer_batch_size=1),  # set defensively to not error on map operations on large datasets
)
verification_enabled = not config.buffer.skip_verification
wait, i never knew about this arg, what is it for?
)
verification_enabled = not config.buffer.skip_verification

def task_uses_group_scoring(task_name: str) -> bool:
would prefer to not have this logic in orch, could maybe put into vf_utils?
self.group_examples: dict[int, dict] = {}
self.group_rollouts_to_schedule: dict[int, int] = {}
self.completed_group_rollouts: dict[int, list[vf.RolloutOutput]] = defaultdict(list)
wonder if we need all of these data structures or if some can be merged, e.g. group_rollouts_to_schedule[group_id] seems redundant with group_size - len(completed_group_rollouts[group_id])
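A sketch of the suggested merge, reusing the attribute names from the diff above. Note the derived count only matches the stored one if in-flight (scheduled but unfinished) rollouts are tracked separately and subtracted by the caller, which may be why the extra dict exists:

```python
from collections import defaultdict

class GroupTracker:
    # Sketch only: derive the remaining-rollout count instead of storing
    # it in a second dict that must be kept in sync.
    def __init__(self, group_size: int):
        self.group_size = group_size
        self.group_examples: dict[int, dict] = {}
        self.completed_group_rollouts: dict[int, list] = defaultdict(list)

    def rollouts_to_schedule(self, group_id: int) -> int:
        # Replaces group_rollouts_to_schedule[group_id]; assumes in-flight
        # rollouts are accounted for elsewhere.
        return self.group_size - len(self.completed_group_rollouts[group_id])

    def is_complete(self, group_id: int) -> bool:
        return len(self.completed_group_rollouts[group_id]) >= self.group_size
```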
    off_policy_steps=0, client_config=client_config, group_id=group_id
)

def _inflight_rollout_count(self) -> int:
can make this public imo, also nice to decorate these with @property
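For illustration, the suggested public @property form might look like this (the attribute and property names are hypothetical renames, not the actual code):

```python
class Scheduler:
    def __init__(self) -> None:
        self._inflight_rollouts: dict[int, object] = {}

    @property
    def inflight_rollout_count(self) -> int:
        # Public read-only property instead of a private
        # _inflight_rollout_count() method.
        return len(self._inflight_rollouts)
```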
return scheduler

def test_inflight_sample_count_includes_pending_group_rollouts():
not sure these tests are super useful as is haha
Re-write the scheduler to do rollouts at the rollout level instead of the group level. This way any long tails within groups are handled: when a rollout within a group finishes, capacity can move on to another group.
Currently, if any env needs group scoring, we fall back to the previous behaviour of group rollouts, although ideally verifiers gets closer here and we can do scoring within prime-rl.
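A minimal asyncio sketch of the rollout-level scheduling described above, with placeholder names throughout (run_rollout here is a stand-in coroutine, not the vf_utils wrapper): the in-flight set is kept filled per rollout, and a group is emitted only once all of its rollouts complete.

```python
import asyncio
from collections import defaultdict

async def run_rollout(group_id: int, idx: int) -> tuple[int, int]:
    # Stand-in for a real rollout; a slow rollout in one group no longer
    # blocks capacity from being used by other groups.
    await asyncio.sleep(0.001 * idx)
    return group_id, idx

async def schedule(num_groups: int, rollouts_per_example: int, max_inflight: int) -> list[int]:
    pending = [(g, i) for g in range(num_groups) for i in range(rollouts_per_example)]
    completed: dict[int, list[int]] = defaultdict(list)
    emitted: list[int] = []
    inflight: set[asyncio.Task] = set()
    while pending or inflight:
        # Keep max_inflight filled at the rollout level, not the group level.
        while pending and len(inflight) < max_inflight:
            g, i = pending.pop()
            inflight.add(asyncio.create_task(run_rollout(g, i)))
        done, inflight = await asyncio.wait(inflight, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            g, idx = task.result()
            completed[g].append(idx)
            # A group is emitted only once all of its rollouts complete.
            if len(completed[g]) == rollouts_per_example:
                emitted.append(g)
    return emitted
```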
Note
High Risk
Refactors the core training scheduler from group-level to per-rollout scheduling and changes when/where scoring happens for group-based rubrics, which can affect training throughput and reward correctness.
Overview
Reworks training rollout scheduling to operate at the individual rollout level rather than run_group, keeping max_inflight_rollouts filled while independently tracking per-example groups and only emitting a group once all rollouts_per_example complete.
Adds deferred group scoring for environments whose rubrics require group-level reward functions: the orchestrator disables per-rollout scoring (score_rollouts=False) for those tasks and the scheduler scores the completed group via rubric.score_group after aggregation (with warnings for externally-managed env servers).
Introduces a vf_utils.run_rollout() wrapper and updates scheduler metrics to distinguish in-flight rollouts vs. total pending samples, with unit tests covering the new metric behavior.
Written by Cursor Bugbot for commit e9ad626. This will update automatically on new commits.
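The deferred group-scoring split can be illustrated with a small sketch. The function and parameter names are hypothetical stand-ins for the orchestrator/rubric wiring, chosen to mirror the two paths described above:

```python
from typing import Callable, Sequence

def score_when_group_complete(
    rollouts: Sequence,
    uses_group_scoring: bool,
    score_rollout: Callable,
    score_group: Callable,
) -> list:
    # Hypothetical sketch: for tasks whose rubric needs the whole group,
    # per-rollout scoring is skipped (analogous to score_rollouts=False)
    # and the aggregated group is scored once, analogous to calling
    # rubric.score_group after all rollouts complete.
    if uses_group_scoring:
        return score_group(rollouts)
    return [score_rollout(r) for r in rollouts]
```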