@psyloy psyloy commented Jan 23, 2026

What does this PR do?

Forwards max_tokens/max_new_tokens from the rollout config to the vLLM/sglang backends.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to forward max_tokens and max_new_tokens from the rollout configuration to the vLLM/sglang backends. I've identified a few critical issues that prevent this from working as intended. Specifically, there's a bug in how sampling parameters are updated in verl/experimental/agent_loop/agent_loop.py, an incorrect type hint in verl/workers/config/rollout.py that would cause a runtime error, and the new logic is missing entirely from verl/experimental/fully_async_policy/agent_loop/agent_loop.py. I have provided detailed comments and suggestions to address these problems.

for param_name in ["max_tokens", "max_new_tokens"]:
    param_value = getattr(config, param_name, None)
    if param_value is not None:
        sampling_params[param_value] = param_value

critical

There's a bug in how the sampling parameters are being updated. You're using the parameter's value (param_value) as the dictionary key, but it should be the parameter's name (param_name). This will cause a TypeError if the value is not hashable, or will add an incorrect key to the sampling_params dictionary, preventing the setting from being applied.

Suggested change
- sampling_params[param_value] = param_value
+ sampling_params[param_name] = param_value
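
For illustration, a minimal sketch of what the current loop produces for an integer limit. The `SimpleNamespace` config and the value 512 are assumptions standing in for the real rollout config, not verl code:

```python
from types import SimpleNamespace

# Hypothetical stand-in for the rollout config (not verl's real class).
config = SimpleNamespace(max_tokens=512, max_new_tokens=None)

buggy = {}
fixed = {}
for param_name in ["max_tokens", "max_new_tokens"]:
    param_value = getattr(config, param_name, None)
    if param_value is not None:
        buggy[param_value] = param_value  # value used as key: {512: 512}
        fixed[param_name] = param_value   # name used as key: {"max_tokens": 512}

print(buggy)  # {512: 512}
print(fixed)  # {'max_tokens': 512}
```

With the value as the key, no `max_tokens` entry exists in the resulting dictionary, so the limit is never forwarded under the expected name.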

Comment on lines 133 to 134
max_tokens: Optional[list] = None
max_new_tokens: Optional[list] = None

critical

The type hints for max_tokens and max_new_tokens are incorrectly defined as Optional[list]. These parameters should be integers representing the maximum number of tokens to generate. A list is not a meaningful token limit, and with the current loop, which uses the value as the key of the sampling_params dictionary, a list value would raise a TypeError at runtime because lists are not hashable.

Suggested change
- max_tokens: Optional[list] = None
- max_new_tokens: Optional[list] = None
+ max_tokens: Optional[int] = None
+ max_new_tokens: Optional[int] = None
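
A short sketch of where the TypeError actually comes from. The `RolloutLimits` dataclass is a hypothetical stand-in for the config class in verl/workers/config/rollout.py. Note that plain Python dataclasses do not enforce annotations at runtime, so the hint documents the intended type rather than validating it:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RolloutLimits:  # hypothetical stand-in, not verl's real config class
    max_tokens: Optional[int] = None
    max_new_tokens: Optional[int] = None


limits = RolloutLimits(max_tokens=256)

# An int is valid both as a dictionary value and as a key.
sampling_params = {"max_tokens": limits.max_tokens}


def list_as_key_raises() -> bool:
    """Return True if using a list as a dict key raises TypeError."""
    target = {}
    try:
        target[[256]] = [256]  # lists are unhashable
    except TypeError:
        return True
    return False
```

Storing a list as a dictionary value would not fail on its own; the failure appears only when the list is used as a key.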

    repetition_penalty=1.0,
    logprobs=config.calculate_log_probs,
)

high

This file is missing the logic to forward max_tokens and max_new_tokens from the rollout configuration to sampling_params. This was the main goal of the PR and its omission here will lead to inconsistent behavior between the two agent loop implementations. Please add the forwarding logic here as you did in verl/experimental/agent_loop/agent_loop.py.

Suggested change
+ # configure max generation tokens for vllm/sglang
+ for param_name in ["max_tokens", "max_new_tokens"]:
+     param_value = getattr(config, param_name, None)
+     if param_value is not None:
+         sampling_params[param_name] = param_value
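
One possible way to keep the two agent loop implementations from drifting apart is to put the forwarding in a single helper that both call. This is only a sketch: the name `apply_generation_limits` and the example config are hypothetical and not part of verl.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Names of the generation-length settings to forward when they are set.
GENERATION_LIMIT_KEYS = ("max_tokens", "max_new_tokens")


def apply_generation_limits(config: Any, sampling_params: Dict[str, Any]) -> Dict[str, Any]:
    """Copy any configured generation limits into sampling_params (in place)."""
    for name in GENERATION_LIMIT_KEYS:
        value = getattr(config, name, None)
        if value is not None:
            sampling_params[name] = value
    return sampling_params


# Usage with a stand-in config that only sets max_new_tokens.
params = apply_generation_limits(
    SimpleNamespace(max_new_tokens=1024),
    {"temperature": 1.0, "repetition_penalty": 1.0},
)
```

Unset or missing attributes are skipped, so existing entries in `sampling_params` are left untouched unless the config overrides them.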

@psyloy psyloy closed this Jan 23, 2026
@psyloy psyloy changed the title from "[rollout, vllm, sglang] fix: forward max_tokens/max_new_tokens from rollout config to vLLM/sglang backends" to "[rollou] fix: forward max_tokens from rollout config to vLLM backends" Jan 23, 2026