-
Notifications
You must be signed in to change notification settings - Fork 3.1k
[rollout] fix: add note for running deepseek v31 with vllm 0.12.0 #4953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[rollout] fix: add note for running deepseek v31 with vllm 0.12.0 #4953
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces error handling for network issues during get_encoding() and updates a run script with a configuration fix. The changes are generally good, but I've identified a couple of areas for improvement. The shell script has an inconsistent command-line argument that could lead to an error. Additionally, the Python code contains duplicated error handling logic, which should be refactored to enhance maintainability.
| +actor_rollout_ref.actor.megatron.override_transformer_config.moe_token_dispatcher_type=flex \ | ||
| +actor_rollout_ref.actor.megatron.override_transformer_config.recompute_method=uniform \ | ||
| +actor_rollout_ref.actor.megatron.override_transformer_config.recompute_granularity=full \ | ||
| actor_rollout_ref.actor.megatron.override_transformer_config.attention_backend='fused' \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is missing the + prefix used for other override_transformer_config arguments. This inconsistency could lead to an error if attention_backend is not defined in the base configuration. For consistency and to prevent potential issues, please add the + prefix.
| actor_rollout_ref.actor.megatron.override_transformer_config.attention_backend='fused' \ | |
| +actor_rollout_ref.actor.megatron.override_transformer_config.attention_backend='fused' \ |
| if _VLLM_VERSION == version.parse("0.12.0"): | ||
| from vllm.entrypoints.harmony_utils import get_encoding | ||
|
|
||
| get_encoding() | ||
| try: | ||
| get_encoding() | ||
| except Exception as e: | ||
| logger.warning( | ||
| f"Failed to call get_encoding() for vLLM {_VLLM_VERSION}: {e}. " | ||
| "This operation requires network access. Please ensure you have internet connection " | ||
| "or configure proxy settings if needed." | ||
| ) | ||
| raise | ||
| elif _VLLM_VERSION >= version.parse("0.13.0"): | ||
| from vllm.entrypoints.openai.parser.harmony_utils import get_encoding | ||
|
|
||
| get_encoding() | ||
| try: | ||
| get_encoding() | ||
| except Exception as e: | ||
| logger.warning( | ||
| f"Failed to call get_encoding() for vLLM {_VLLM_VERSION}: {e}. " | ||
| "This operation requires network access. Please ensure you have internet connection " | ||
| "or configure proxy settings if needed." | ||
| ) | ||
| raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The try-except block for handling exceptions from get_encoding() is duplicated across the if and elif branches. This code duplication makes the code harder to maintain, as any future changes would need to be applied in two places. I suggest refactoring this to remove the duplication.
| if _VLLM_VERSION == version.parse("0.12.0"): | |
| from vllm.entrypoints.harmony_utils import get_encoding | |
| get_encoding() | |
| try: | |
| get_encoding() | |
| except Exception as e: | |
| logger.warning( | |
| f"Failed to call get_encoding() for vLLM {_VLLM_VERSION}: {e}. " | |
| "This operation requires network access. Please ensure you have internet connection " | |
| "or configure proxy settings if needed." | |
| ) | |
| raise | |
| elif _VLLM_VERSION >= version.parse("0.13.0"): | |
| from vllm.entrypoints.openai.parser.harmony_utils import get_encoding | |
| get_encoding() | |
| try: | |
| get_encoding() | |
| except Exception as e: | |
| logger.warning( | |
| f"Failed to call get_encoding() for vLLM {_VLLM_VERSION}: {e}. " | |
| "This operation requires network access. Please ensure you have internet connection " | |
| "or configure proxy settings if needed." | |
| ) | |
| raise | |
| get_encoding_fn = None | |
| if _VLLM_VERSION == version.parse("0.12.0"): | |
| from vllm.entrypoints.harmony_utils import get_encoding as get_encoding_fn | |
| elif _VLLM_VERSION >= version.parse("0.13.0"): | |
| from vllm.entrypoints.openai.parser.harmony_utils import get_encoding as get_encoding_fn | |
| if get_encoding_fn: | |
| try: | |
| get_encoding_fn() | |
| except Exception as e: | |
| logger.warning( | |
| f"Failed to call get_encoding() for vLLM {_VLLM_VERSION}: {e}. " | |
| "This operation requires network access. Please ensure you have internet connection " | |
| "or configure proxy settings if needed." | |
| ) | |
| raise |
|
Shall we just let |
Yes, maybe we can remove the "raise" so that get_encoding() failures are non-fatal. For non-gpt-oss models, only a warning will be logged and execution continues. If gpt-oss is actually used, the error will be raised again at that point. |
Shall we introduce an environment variable to enable this? I didn't see ways to enable this at runtime |
fix by running |
What does this PR do?
Add try-except with warning when get_encoding() fails due to network

issues. Users should run get_encoding() in a network-enabled environment
first to cache the vocab file.
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,megatron,veomni,sglang,vllm,rollout,trainer,ci,training_utils,recipe,hardware,deployment,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,cfg,reward,like[megatron, fsdp, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batchingTest
API and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)recipesubmodule, please also update the reference to the submodule commit viagit submodule update --remoteorcd recipe && git pull origin main.