📝 Walkthrough

This PR adds documentation and configuration for vLLM speculative decoding support in NeMo-RL, updates the vLLM dependency to support newer versions (0.12+), adjusts custom vLLM build defaults, and comments out a broken sglang build step in the Dockerfile.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (inconclusive)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@docs/repro-spec-decode-build.md`:
- Around line 26-28: Fenced code blocks in the commit message section (the
triple-backtick blocks around "946862e7 docs: add release runs to front page
readme for 0.5 (`#1879`)") are missing language identifiers; update each of the
affected fenced blocks (the ones around the plain output examples near the
commit header and the longer output block further down) by appending an
appropriate language tag such as ```log or ```text immediately after the opening
backticks so they satisfy MD040 linting.
In `@examples/configs/recipes/llm/grpo-qwen2.5-7b-spec-decode-1n8g.yaml`:
- Around line 1-43: The filename and internal identifiers claim "qwen2.5-7b" but
the policy.model_name (Qwen/Qwen3-4B) and speculative draft model
(vllm_kwargs.speculative_config.model = Qwen/Qwen3-0.6B) indicate Qwen3; rename
the recipe and update internal references to be consistent (e.g., change
filename to grpo-qwen3-4b-spec-decode-1n8g.yaml and update checkpoint_dir,
logger.log_dir, wandb.name, and any other strings that contain "qwen2.5-7b" to
"qwen3-4b"), ensuring the name follows the pattern
<algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>.
In `@pyproject.toml`:
- Line 79: The dependency declaration "vllm>=0.12" lacks an upper bound; change
it to include a conservative upper limit (e.g. "vllm>=0.12,<0.15") in
pyproject.toml to avoid pulling breaking minor releases, then regenerate your
lockfile (run your project’s lock command such as poetry lock or pip-compile)
and run tests/CI to ensure compatibility; update any dependency metadata that
references the old unconstrained spec if present.
In `@tools/build-custom-vllm.sh`:
- Around line 25-28: Update the stale comment to mention the actual PR/commit
(vLLM PR `#24322` targeting 0.14.x) and remove the misleading "v0.10" / git
merge-base text; then make the fallback wheel URL robust by constructing it from
the VLLM_WHEEL_COMMIT variable instead of embedding a hardcoded version string
and add a guard: if a user supplies $3 (overrides VLLM_WHEEL_COMMIT) but does
not set VLLM_PRECOMPILED_WHEEL_LOCATION, print a clear error and exit (or
require they set VLLM_PRECOMPILED_WHEEL_LOCATION) so the commit and wheel URL
cannot get out of sync; reference VLLM_WHEEL_COMMIT and
VLLM_PRECOMPILED_WHEEL_LOCATION in the change.
🧹 Nitpick comments (2)
docker/Dockerfile (1)
117-118: Consider adding a TODO/issue reference for re-enabling the sglang sync.

Commenting out the sglang step is a reasonable workaround, but without a tracking issue it risks being forgotten indefinitely. The CMake install block at lines 48-58 (labeled "for sglang build") also becomes dead weight in the image while this is disabled.
docs/repro-spec-decode-build.md (1)
59-84: This doc describes patches already applied in this PR; clarify the intended audience.

The "Local Modifications" section documents `sed` commands to patch `build-custom-vllm.sh` and the Dockerfile, but these changes are already committed in this PR. This could confuse readers who check out this branch and then try to apply the patches again. Consider adding a note that these modifications are already included in the branch/release, and the `sed` commands are only needed when starting from the upstream v0.5.0 tag.
````text
```
946862e7 docs: add release runs to front page readme for 0.5 (#1879)
```
````
Add language identifiers to fenced code blocks.
Lines 26, 42, and 245 have fenced code blocks without language specifiers. Use `text` or `log` for plain output blocks to satisfy linting (MD040).
Example fix for line 245:

````diff
-```
+```log
 Initializing a V1 LLM engine (v0.14.0rc2.dev156+g4a5299c93.d20260210)
````

Also applies to: 42-44, 245-251
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 26-26: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@docs/repro-spec-decode-build.md` around lines 26 - 28, Fenced code blocks in
the commit message section (the triple-backtick blocks around "946862e7 docs:
add release runs to front page readme for 0.5 (`#1879`)") are missing language
identifiers; update each of the affected fenced blocks (the ones around the
plain output examples near the commit header and the longer output block further
down) by appending an appropriate language tag such as ```log or ```text
immediately after the opening backticks so they satisfy MD040 linting.
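To sanity-check the fix, one can re-run the same linter the pipeline uses (markdownlint-cli2, per the tool output above). A minimal sketch, assuming a Node.js environment with npx available:

```bash
# Hedged sketch: re-run markdownlint-cli2 on the affected doc after tagging the fences.
# Assumes npx/Node.js is available in the environment.
npx markdownlint-cli2 "docs/repro-spec-decode-build.md"
```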
```yaml
defaults: ../../grpo_math_1B.yaml
grpo:
  max_num_steps: 200
  num_prompts_per_step: 32
  num_generations_per_prompt: 16
checkpointing:
  checkpoint_dir: results/grpo-qwen2.5-7b-spec-decode-1n8g
policy:
  model_name: Qwen/Qwen3-4B
  tokenizer:
    name: Qwen/Qwen3-4B
  train_global_batch_size: 512
  train_micro_batch_size: 1
  logprob_batch_size: 1
  max_total_sequence_length: 2048
  dynamic_batching:
    enabled: true
  sequence_packing:
    enabled: false
  make_sequence_length_divisible_by: 1
  generation:
    max_new_tokens: 1024
    vllm_cfg:
      tensor_parallel_size: 1
      gpu_memory_utilization: 0.85
      max_model_len: 2048
      vllm_kwargs:
        speculative_config:
          method: draft_model
          model: Qwen/Qwen3-0.6B
          num_speculative_tokens: 5
          draft_tensor_parallel_size: 1
data:
  max_input_seq_length: 1024
logger:
  log_dir: logs/grpo-qwen2.5-7b-spec-decode-1n8g
  wandb_enabled: false
  tensorboard_enabled: true
  wandb:
    project: nemo-rl
    name: grpo-qwen2.5-7b-spec-decode-1n8g
cluster:
  gpus_per_node: 8
```
Filename and config contents are inconsistent — the file says "qwen2.5-7b" but uses Qwen3-4B.
The filename `grpo-qwen2.5-7b-spec-decode-1n8g.yaml` and internal paths (`checkpoint_dir`, `log_dir`, wandb name) all reference "qwen2.5-7b", but the actual model is `Qwen/Qwen3-4B` (line 9) with draft model `Qwen/Qwen3-0.6B` (line 30). The repro doc even explains why Qwen3 was chosen over Qwen2.5 (vocab size mismatch).
Rename the file and update internal references to match the actual model, e.g. `grpo-qwen3-4b-spec-decode-1n8g.yaml`. As per coding guidelines, LLM recipes should follow the naming pattern `<algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers]`.
🤖 Prompt for AI Agents
In `@examples/configs/recipes/llm/grpo-qwen2.5-7b-spec-decode-1n8g.yaml` around
lines 1 - 43, The filename and internal identifiers claim "qwen2.5-7b" but the
policy.model_name (Qwen/Qwen3-4B) and speculative draft model
(vllm_kwargs.speculative_config.model = Qwen/Qwen3-0.6B) indicate Qwen3; rename
the recipe and update internal references to be consistent (e.g., change
filename to grpo-qwen3-4b-spec-decode-1n8g.yaml and update checkpoint_dir,
logger.log_dir, wandb.name, and any other strings that contain "qwen2.5-7b" to
"qwen3-4b"), ensuring the name follows the pattern
<algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>.
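A minimal shell sketch of that rename, assuming the repository layout quoted above and GNU sed; illustrative only, not the committed fix:

```bash
# Hedged sketch of the suggested rename (assumes GNU sed and the recipe path above).
cd examples/configs/recipes/llm
git mv grpo-qwen2.5-7b-spec-decode-1n8g.yaml grpo-qwen3-4b-spec-decode-1n8g.yaml
# Rewrites checkpoint_dir, logger.log_dir, and wandb.name in one pass.
sed -i 's/qwen2\.5-7b/qwen3-4b/g' grpo-qwen3-4b-spec-decode-1n8g.yaml
```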
```diff
 # sudo apt-get install libibverbs-dev
 "deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@bfded34800dfec415b71503f8205181de90b2480",
-"vllm==0.11.2",
+"vllm>=0.12",
```
Unbounded upper version for vllm is risky.
`vllm>=0.12` has no upper bound. vLLM frequently introduces breaking API changes between minor releases. While `uv.lock` pins the resolved version, anyone syncing without the lockfile (or when the lock is refreshed) could pick up an incompatible version. Consider adding an upper bound, e.g. `vllm>=0.12,<0.15`.
🤖 Prompt for AI Agents
In `@pyproject.toml` at line 79, The dependency declaration "vllm>=0.12" lacks an
upper bound; change it to include a conservative upper limit (e.g.
"vllm>=0.12,<0.15") in pyproject.toml to avoid pulling breaking minor releases,
then regenerate your lockfile (run your project’s lock command such as poetry
lock or pip-compile) and run tests/CI to ensure compatibility; update any
dependency metadata that references the old unconstrained spec if present.
```diff
 # NOTE: VLLM_USE_PRECOMPILED=1 didn't always seem to work since the wheels were sometimes built against an incompatible torch/cuda combo.
 # This commit was chosen as one close to the v0.10 release: git merge-base --fork-point origin/main tags/v0.10.0
-VLLM_WHEEL_COMMIT=${3:-862f2ef893d9751db0a92bd2d4ae0e3d9677872f} # use full commit hash from the main branch
-export VLLM_PRECOMPILED_WHEEL_LOCATION="https://wheels.vllm.ai/${VLLM_WHEEL_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl"
+VLLM_WHEEL_COMMIT=${3:-4a5299c93ff97c26def537b92562df5ada530fea} # merge commit of vllm PR #24322 (spec decode with draft models)
+export VLLM_PRECOMPILED_WHEEL_LOCATION="${VLLM_PRECOMPILED_WHEEL_LOCATION:-https://wheels.vllm.ai/${VLLM_WHEEL_COMMIT}/vllm-0.14.0rc2.dev156%2Bg4a5299c93-cp38-abi3-manylinux_2_31_x86_64.whl}"
```
Stale comment and fragile wheel URL default.
Two issues:

- Stale comment (line 26): The comment still references "close to the v0.10 release" and `git merge-base`, but the commit is now the merge commit of vLLM PR #24322, targeting 0.14.x. This is misleading.
- Fragile fallback URL (line 28): The default `VLLM_PRECOMPILED_WHEEL_LOCATION` embeds a hardcoded version string (`vllm-0.14.0rc2.dev156%2Bg4a5299c93`) that's specific to the default commit. If a user overrides `VLLM_WHEEL_COMMIT` via `$3` without also setting `VLLM_PRECOMPILED_WHEEL_LOCATION`, the fallback URL will point to the wrong wheel. Consider either documenting this coupling more prominently or erroring out if `$3` is provided without an explicit wheel location.
Suggested comment fix:

```diff
-# NOTE: VLLM_USE_PRECOMPILED=1 didn't always seem to work since the wheels were sometimes built against an incompatible torch/cuda combo.
-# This commit was chosen as one close to the v0.10 release: git merge-base --fork-point origin/main tags/v0.10.0
+# NOTE: VLLM_USE_PRECOMPILED=1 didn't always seem to work since the wheels were sometimes built against an incompatible torch/cuda combo.
+# WARNING: The default wheel URL below is specific to the default VLLM_WHEEL_COMMIT. If overriding $3, also set VLLM_PRECOMPILED_WHEEL_LOCATION.
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 # NOTE: VLLM_USE_PRECOMPILED=1 didn't always seem to work since the wheels were sometimes built against an incompatible torch/cuda combo.
-# This commit was chosen as one close to the v0.10 release: git merge-base --fork-point origin/main tags/v0.10.0
-VLLM_WHEEL_COMMIT=${3:-862f2ef893d9751db0a92bd2d4ae0e3d9677872f} # use full commit hash from the main branch
-export VLLM_PRECOMPILED_WHEEL_LOCATION="https://wheels.vllm.ai/${VLLM_WHEEL_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl"
+# WARNING: The default wheel URL below is specific to the default VLLM_WHEEL_COMMIT. If overriding $3, also set VLLM_PRECOMPILED_WHEEL_LOCATION.
 VLLM_WHEEL_COMMIT=${3:-4a5299c93ff97c26def537b92562df5ada530fea} # merge commit of vllm PR #24322 (spec decode with draft models)
 export VLLM_PRECOMPILED_WHEEL_LOCATION="${VLLM_PRECOMPILED_WHEEL_LOCATION:-https://wheels.vllm.ai/${VLLM_WHEEL_COMMIT}/vllm-0.14.0rc2.dev156%2Bg4a5299c93-cp38-abi3-manylinux_2_31_x86_64.whl}"
```
🤖 Prompt for AI Agents
In `@tools/build-custom-vllm.sh` around lines 25 - 28, Update the stale comment to
mention the actual PR/commit (vLLM PR `#24322` targeting 0.14.x) and remove the
misleading "v0.10" / git merge-base text; then make the fallback wheel URL
robust by constructing it from the VLLM_WHEEL_COMMIT variable instead of
embedding a hardcoded version string and add a guard: if a user supplies $3
(overrides VLLM_WHEEL_COMMIT) but does not set VLLM_PRECOMPILED_WHEEL_LOCATION,
print a clear error and exit (or require they set
VLLM_PRECOMPILED_WHEEL_LOCATION) so the commit and wheel URL cannot get out of
sync; reference VLLM_WHEEL_COMMIT and VLLM_PRECOMPILED_WHEEL_LOCATION in the
change.
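A minimal sketch of such a guard, reusing the script's existing variable names; the default commit is taken from the diff above, and erroring out is one of the two options the prompt allows:

```bash
# Hedged sketch of the guard described above, not the committed script.
DEFAULT_VLLM_WHEEL_COMMIT=4a5299c93ff97c26def537b92562df5ada530fea
VLLM_WHEEL_COMMIT=${3:-$DEFAULT_VLLM_WHEEL_COMMIT}
# If the caller overrides the commit, the default wheel URL (built for the
# default commit) would no longer match, so require an explicit wheel location.
if [[ "$VLLM_WHEEL_COMMIT" != "$DEFAULT_VLLM_WHEEL_COMMIT" && -z "${VLLM_PRECOMPILED_WHEEL_LOCATION:-}" ]]; then
    echo "ERROR: VLLM_WHEEL_COMMIT was overridden; set VLLM_PRECOMPILED_WHEEL_LOCATION to a matching wheel URL." >&2
    exit 1
fi
export VLLM_PRECOMPILED_WHEEL_LOCATION="${VLLM_PRECOMPILED_WHEEL_LOCATION:-https://wheels.vllm.ai/${VLLM_WHEEL_COMMIT}/vllm-0.14.0rc2.dev156%2Bg4a5299c93-cp38-abi3-manylinux_2_31_x86_64.whl}"
```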
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
Usage
```python
# Add a code snippet demonstrating how to use this
```

Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
New Features
Documentation
Chores