
Conversation

@none0663 none0663 (Contributor) commented Jan 16, 2026

What does this PR do?

  1. Add network error handling for get_encoding()

Add a try-except that logs a warning when get_encoding() fails due to network
issues. Users should run get_encoding() once in a network-enabled environment
to cache the vocab file.

  2. Add actor_rollout_ref.actor.megatron.override_transformer_config.attention_backend='fused' to fix the dsv3 error "No dot product attention backend is available for the provided inputs" (#3707)

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this
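A minimal sketch of the new behavior described above (the wrapper name `prefetch_encoding` is illustrative, not the actual verl code; in the PR, `get_encoding` comes from vLLM's harmony_utils):

```python
import logging

logger = logging.getLogger(__name__)

def prefetch_encoding(get_encoding):
    """Call get_encoding(), warning with actionable guidance on network
    failure, then re-raise so callers still see the original error."""
    try:
        return get_encoding()
    except Exception as e:
        logger.warning(
            "Failed to call get_encoding(): %s. This operation requires "
            "network access; run it once in a network-enabled environment "
            "to cache the vocab file, or configure proxy settings.", e,
        )
        raise
```

The warning gives users a remedy (pre-cache the vocab file) while the re-raise preserves the original failure for debugging.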

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces error handling for network issues during get_encoding() and updates a run script with a configuration fix. The changes are generally good, but I've identified a couple of areas for improvement. The shell script has an inconsistent command-line argument that could lead to an error. Additionally, the Python code contains duplicated error handling logic, which should be refactored to enhance maintainability.

+actor_rollout_ref.actor.megatron.override_transformer_config.moe_token_dispatcher_type=flex \
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_method=uniform \
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_granularity=full \
actor_rollout_ref.actor.megatron.override_transformer_config.attention_backend='fused' \

Severity: high

This line is missing the + prefix used for other override_transformer_config arguments. This inconsistency could lead to an error if attention_backend is not defined in the base configuration. For consistency and to prevent potential issues, please add the + prefix.

Suggested change
actor_rollout_ref.actor.megatron.override_transformer_config.attention_backend='fused' \
+actor_rollout_ref.actor.megatron.override_transformer_config.attention_backend='fused' \

Comment on lines 76 to 99

     if _VLLM_VERSION == version.parse("0.12.0"):
         from vllm.entrypoints.harmony_utils import get_encoding

-        get_encoding()
+        try:
+            get_encoding()
+        except Exception as e:
+            logger.warning(
+                f"Failed to call get_encoding() for vLLM {_VLLM_VERSION}: {e}. "
+                "This operation requires network access. Please ensure you have internet connection "
+                "or configure proxy settings if needed."
+            )
+            raise
     elif _VLLM_VERSION >= version.parse("0.13.0"):
         from vllm.entrypoints.openai.parser.harmony_utils import get_encoding

-        get_encoding()
+        try:
+            get_encoding()
+        except Exception as e:
+            logger.warning(
+                f"Failed to call get_encoding() for vLLM {_VLLM_VERSION}: {e}. "
+                "This operation requires network access. Please ensure you have internet connection "
+                "or configure proxy settings if needed."
+            )
+            raise

Severity: high

The try-except block for handling exceptions from get_encoding() is duplicated across the if and elif branches. This code duplication makes the code harder to maintain, as any future changes would need to be applied in two places. I suggest refactoring this to remove the duplication.

Suggested change

-if _VLLM_VERSION == version.parse("0.12.0"):
-    from vllm.entrypoints.harmony_utils import get_encoding
-
-    try:
-        get_encoding()
-    except Exception as e:
-        logger.warning(
-            f"Failed to call get_encoding() for vLLM {_VLLM_VERSION}: {e}. "
-            "This operation requires network access. Please ensure you have internet connection "
-            "or configure proxy settings if needed."
-        )
-        raise
-elif _VLLM_VERSION >= version.parse("0.13.0"):
-    from vllm.entrypoints.openai.parser.harmony_utils import get_encoding
-
-    try:
-        get_encoding()
-    except Exception as e:
-        logger.warning(
-            f"Failed to call get_encoding() for vLLM {_VLLM_VERSION}: {e}. "
-            "This operation requires network access. Please ensure you have internet connection "
-            "or configure proxy settings if needed."
-        )
-        raise
+get_encoding_fn = None
+if _VLLM_VERSION == version.parse("0.12.0"):
+    from vllm.entrypoints.harmony_utils import get_encoding as get_encoding_fn
+elif _VLLM_VERSION >= version.parse("0.13.0"):
+    from vllm.entrypoints.openai.parser.harmony_utils import get_encoding as get_encoding_fn
+if get_encoding_fn:
+    try:
+        get_encoding_fn()
+    except Exception as e:
+        logger.warning(
+            f"Failed to call get_encoding() for vLLM {_VLLM_VERSION}: {e}. "
+            "This operation requires network access. Please ensure you have internet connection "
+            "or configure proxy settings if needed."
+        )
+        raise

@vermouth1992 (Collaborator)

Shall we just let get_encoding fail if we are not using gpt-oss?

@none0663 (Contributor, Author)

> Shall we just let get_encoding fail if we are not using gpt-oss?

Yes, maybe we can remove the raise so that get_encoding() failures are non-fatal: for non-gpt-oss models, only a warning is logged and execution continues. If gpt-oss is actually used, the error will surface again at the point where the encoding is needed.
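With the raise removed, the guard degrades to a warning and execution continues; a minimal sketch of this non-fatal variant (the helper name is hypothetical):

```python
import logging

logger = logging.getLogger(__name__)

def prefetch_encoding_nonfatal(get_encoding):
    """Log a warning on failure and continue; gpt-oss code paths that
    actually need the encoding will raise again when they call it."""
    try:
        return get_encoding()
    except Exception as e:
        logger.warning(
            "get_encoding() failed (%s); continuing without a cached vocab.", e
        )
        return None
```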

@vermouth1992 (Collaborator)

> > Shall we just let get_encoding fail if we are not using gpt-oss?
>
> Yes, maybe we can remove the "raise" so that get_encoding() failures are non-fatal. For non-gpt-oss models, only a warning will be logged and execution continues. If gpt-oss is actually used, the error will be raised again at that point.

Shall we introduce an environment variable to enable this? I didn't see ways to enable this at runtime

@none0663 none0663 changed the title [rollout]: add note for running deepseek v31 with vllm 0.12.0 [rollout] fix: add note for running deepseek v31 with vllm 0.12.0 Jan 19, 2026
@none0663 none0663 (Contributor, Author) commented Jan 19, 2026

> Shall we introduce an environment variable to enable this? I didn't see ways to enable this at runtime

Fixed by running get_encoding() only when the model type is gpt_oss, matching the upstream vLLM code: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/responses/serving.py#L246
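The gating described above can be sketched as follows (helper name and signature are illustrative; the actual check lives in verl's vLLM rollout code):

```python
def maybe_prefetch_encoding(model_type, get_encoding):
    """Only prefetch the harmony encoding for gpt_oss models; for every
    other model type, skip the network call entirely."""
    if model_type == "gpt_oss":
        return get_encoding()
    return None
```

This avoids both the spurious failures for non-gpt-oss models and the need for a separate environment variable to toggle the behavior.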

@none0663 none0663 requested a review from tardis-key as a code owner January 26, 2026 03:24