-
Notifications
You must be signed in to change notification settings - Fork 797
fix: Update vLLM examples routing policy #601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cda6121 to
a46e17e
Compare
|
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
WalkthroughThe configuration files for example LLM setups were updated. The routing strategy for distributing requests among workers was changed from random or key-value-based to round-robin in both configurations. Additionally, prefix caching was disabled for VllmWorker in one configuration. Changes
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
examples/llm/configs/agg.yaml (1)
25-40: Suggest unifying router placement across configs.In
disagg_router.yaml,routerlives inCommon(plus aRoutersection), whereas inagg.yamlit’s underProcessor/VllmWorkerwith noRoutersection. Standardizing the hierarchy of routing settings across examples will reduce confusion.
🧹 Nitpick comments (1)
examples/llm/configs/disagg_router.yaml (1)
34-46: Suggest adding an explanatory comment for disagg design.Consider annotating why
conditional-disaggis enabled and prefix caching is disabled (e.g., “max-local-prefill-length is very short, so remote prefill is preferred”). This will aid future maintainers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/llm/configs/agg.yaml(1 hunks)examples/llm/configs/disagg_router.yaml(1 hunks)
🔇 Additional comments (4)
examples/llm/configs/disagg_router.yaml (2)
17-21: Approve routing strategy update to round-robin.Switching from
"kv"to"round-robin"in theCommonsection ensures requests are evenly distributed across prefill workers.
34-38: Approve disabling prefix caching for disaggregated workers.Removing
enable-prefix-cachingaligns with the intended use of remote prefill when the local prefill capacity is minimal.examples/llm/configs/agg.yaml (2)
25-28: Verify router at the Processor level.You added
router: round-robinunderProcessor. Confirm that the config loader supports router definitions here and not only inCommonor a dedicatedRoutersection.
29-34: Approve VllmWorker routing change.Changing the
VllmWorker’srouterfromrandomtoround-robinimproves predictability and load balancing for aggregated decode.
|
Examples have changed a lot since this. |
Overview:
Update vLLM examples routing policy for better UX and perf
Details:
Related PR: #597
Where should the reviewer start?
N/A
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
N/A
Summary by CodeRabbit