@guillemgt

What does this PR do?

Adds a sampling_kwargs field to RolloutConfig and SamplingConfig, allowing users to pass arbitrary sampling parameters at runtime without modifying hardcoded configurations. This enables experimental flexibility for testing custom sampling strategies.

Checklist Before Starting

Test

This feature was tested with the stop parameter to verify that custom sampling kwargs are applied. A second run without sampling_kwargs confirmed backward compatibility: the code runs normally when the field is not specified.

Manual testing can be done by:

  1. Setting actor_rollout_ref.rollout.sampling_kwargs in config YAML (e.g., with stop: ["\n"])
  2. Running a training job and verifying custom params are applied
  3. Running without sampling_kwargs to verify backward compatibility
  4. Checking validation batches use val_kwargs.sampling_kwargs when set

API and Usage Example

# In your config YAML or via CLI overrides:
actor_rollout_ref:
  rollout:
    sampling_kwargs:
      stop: ["\n"]
      frequency_penalty: 0.5
    val_kwargs:
      sampling_kwargs:
        stop: ["\n", "User:"]

Design & Code Changes

High-level design:

  • Adds optional sampling_kwargs: Optional[dict[str, Any]] to both RolloutConfig and SamplingConfig dataclasses
  • In agent loop workers, custom kwargs are merged with the default sampling params using the dict merge operator (|)
  • Uses OmegaConf.to_container() to properly convert OmegaConf DictConfig objects to regular dicts before merging
  • Validation batches check config.val_kwargs.sampling_kwargs first, falling back to config.sampling_kwargs
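
The merge and fallback behavior described above can be sketched roughly as follows. This is a minimal illustration and not the code from the agent loop files: the helper names to_plain_dict and build_sampling_params, the validate flag, and the default values are hypothetical, and the sketch assumes custom kwargs sit on the right-hand side of | so that they override defaults on key collisions. The OmegaConf import is guarded so the sketch also runs on plain dicts.

```python
from typing import Any, Mapping, Optional

try:
    from omegaconf import OmegaConf
except ImportError:  # the sketch still works on plain dicts without OmegaConf
    OmegaConf = None


def to_plain_dict(kwargs: Optional[Mapping[str, Any]]) -> dict:
    """Return a regular dict, converting an OmegaConf container if needed."""
    if kwargs is None:
        return {}
    if OmegaConf is not None and OmegaConf.is_config(kwargs):
        # DictConfig -> dict (nested ListConfig values become lists)
        return OmegaConf.to_container(kwargs, resolve=True)
    return dict(kwargs)


def build_sampling_params(
    defaults: Mapping[str, Any],
    rollout_kwargs: Optional[Mapping[str, Any]],
    val_kwargs: Optional[Mapping[str, Any]] = None,
    validate: bool = False,
) -> dict:
    """Merge custom sampling kwargs into the defaults (hypothetical helper).

    For validation batches, val_kwargs is used when it is set; otherwise
    the rollout-level kwargs are the fallback.
    """
    custom = rollout_kwargs
    if validate and val_kwargs is not None:
        custom = val_kwargs
    # Assumption: the right-hand operand wins on key collisions.
    return dict(defaults) | to_plain_dict(custom)


# Placeholder defaults, not the project's actual values.
defaults = {"temperature": 1.0, "top_p": 1.0}
train_params = build_sampling_params(defaults, {"stop": ["\n"], "frequency_penalty": 0.5})
val_params = build_sampling_params(
    defaults, {"stop": ["\n"]}, {"stop": ["\n", "User:"]}, validate=True
)
```

The | operator on dicts requires Python 3.9 or later and returns a new dict, so the defaults are left unmodified between calls.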

Specific changes:

  1. verl/workers/config/rollout.py: Add sampling_kwargs fields to dataclasses
  2. verl/trainer/config/rollout/rollout.yaml: Document the new fields with comments
  3. Agent loop files: Apply custom kwargs with proper OmegaConf handling
  4. Generated YAML configs updated automatically by pre-commit hook
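
For orientation, the dataclass change in item 1 might look roughly like the sketch below. These are simplified stand-ins, not the definitions in verl/workers/config/rollout.py: the temperature and top_p fields and their defaults are placeholders, the real classes carry many more fields, and it is an assumption here that val_kwargs is typed as SamplingConfig.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SamplingConfig:
    # Placeholder fields; the real class has its own set of options.
    temperature: float = 1.0
    top_p: float = 1.0
    # New optional field: extra sampling parameters, None when unset.
    sampling_kwargs: Optional[dict[str, Any]] = None


@dataclass
class RolloutConfig:
    # Placeholder field standing in for the existing rollout options.
    temperature: float = 1.0
    # Assumption: validation settings are held as a SamplingConfig.
    val_kwargs: SamplingConfig = field(default_factory=SamplingConfig)
    # New optional field, defaulting to None for backward compatibility.
    sampling_kwargs: Optional[dict[str, Any]] = None


# Existing configs that never mention sampling_kwargs keep working.
legacy = RolloutConfig()

# New configs can set rollout-level and validation-level kwargs separately.
custom = RolloutConfig(
    sampling_kwargs={"stop": ["\n"], "frequency_penalty": 0.5},
    val_kwargs=SamplingConfig(sampling_kwargs={"stop": ["\n", "User:"]}),
)
```

Defaulting to None instead of an empty dict avoids a mutable default on the dataclass and lets callers distinguish "not set" from "set to nothing".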

Checklist Before Submitting

  • Read the Contribute Guide
  • Apply pre-commit checks: pre-commit run --all-files
  • Add documentation (YAML comments included)
  • Tests: This change is difficult to unit test without full rollout infrastructure. Manual testing is described above.
  • Request CI in Slack ci-request channel

Adds sampling_kwargs field to RolloutConfig and SamplingConfig, allowing
users to pass arbitrary sampling parameters at runtime without modifying
hardcoded configurations. This enables experimental flexibility for
testing custom sampling strategies.

Changes:
- Add sampling_kwargs: Optional[dict[str, Any]] to config dataclasses
- Apply custom kwargs in agent loop workers with proper OmegaConf handling
- Update rollout.yaml documentation
- Backward compatible (defaults to None)