[model] feat: add NPU GRPO training scripts for Qwen2.5-32B/Qwen3-30B (Megatron/vLLM backends) #4984

psyloy · 2026-01-19T12:35:52Z

What does this PR do?

This PR adds the following for the Ascend NPU platform:

Add GRPO training scripts for the Qwen2.5-32B model using the Megatron and vLLM backends.
Add GRPO training scripts for the Qwen3-30B model using the Megatron and vLLM backends.

Test

Qwen2.5-32B GRPO training on GSM8K:

Qwen3-30B GRPO training on GSM8K:

API and Usage Example

# Run Qwen2.5-32B GRPO training (Megatron + vLLM backend) on Ascend NPU
bash ./examples/grpo_trainer/run_qwen2_5-32b_grpo_megatron_vllm_npu.sh

# Run Qwen3MoE-30B GRPO training (Megatron + vLLM backend) on Ascend NPU
bash ./examples/grpo_trainer/run_qwen3moe-30b_grpo_megatron_vllm_npu.sh

CLAassistant · 2026-01-19T12:36:02Z

All committers have signed the CLA.

gemini-code-assist

Code Review

This pull request introduces new GRPO training scripts for Qwen models on Ascend NPUs, along with corresponding documentation. My review focuses on correctness issues within the newly added shell scripts and the documentation file. I have identified several critical syntax errors in the shell scripts that would cause them to fail, likely due to copy-paste mistakes (e.g., + prefixes and the use of an undefined variable). The documentation's code examples contain similar errors. I have provided specific suggestions to rectify these issues.

gemini-code-assist · 2026-01-19T12:37:35Z

examples/grpo_trainer/run_qwen2_5-32b_grpo_megatron_vllm_npu.sh

+    +actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_offload_fraction=${optimizer_offload_fraction}
+    +actor_rollout_ref.actor.optim.override_optimizer_config.use_precision_aware_optimizer=True
+    +actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_cpu_offload=True


These lines have a + prefix, which is invalid syntax for a bash array assignment. This appears to be a copy-paste error from a diff and will cause the script to fail. Please remove the + prefixes.

Suggested change

Before:
+actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_offload_fraction=${optimizer_offload_fraction}
+actor_rollout_ref.actor.optim.override_optimizer_config.use_precision_aware_optimizer=True
+actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_cpu_offload=True

After:
actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_offload_fraction=${optimizer_offload_fraction}
actor_rollout_ref.actor.optim.override_optimizer_config.use_precision_aware_optimizer=True
actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_cpu_offload=True

gemini-code-assist · 2026-01-19T12:37:35Z

examples/grpo_trainer/run_qwen2_5-32b_grpo_megatron_vllm_npu.sh

+    actor_rollout_ref.actor.megatron.grad_offload=${all_offload}
+    actor_rollout_ref.actor.megatron.dist_checkpointing_path=${MCORE_MODEL_PATH}
+    actor_rollout_ref.actor.megatron.use_dist_checkpointing=False
+    +actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True


This line has a + prefix, which is invalid syntax for a bash array assignment. This appears to be a copy-paste error from a diff and will cause the script to fail. Please remove the + prefix.

Suggested change

Before:
+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True

After:
actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True

gemini-code-assist · 2026-01-19T12:37:35Z

examples/grpo_trainer/run_qwen2_5-32b_grpo_megatron_vllm_npu.sh

+    "${ACTOR_CONFIG[@]}" \
+    "${REF_CONFIG[@]}" \
+    "${ROLLOUT_CONFIG[@]}" \
+    "${REWARD_MODEL_CONFIG[@]}" \


The REWARD_MODEL_CONFIG variable is used here but is not defined anywhere in the script. Since set -u is active, this will cause the script to fail with an 'unbound variable' error. This parameter does not seem to be required according to the provided documentation. Please remove this line.
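A minimal sketch of how the per-component override arrays combine into a single argument list once `REWARD_MODEL_CONFIG` is dropped. The array names `ACTOR_CONFIG`, `REF_CONFIG`, and `ROLLOUT_CONFIG` come from the diff; their contents here are illustrative only (the `REF_CONFIG` entry is a hypothetical placeholder, not a real config key), and the script prints the arguments instead of launching training. It runs under `set -euo pipefail`, in line with the review's note that the scripts use `set -u`.

```shell
#!/usr/bin/env bash
# Sketch: compose Hydra-style overrides from per-component arrays.
# Array names follow the PR diff; their contents are illustrative only.
set -euo pipefail

ACTOR_CONFIG=(
    actor_rollout_ref.actor.megatron.use_dist_checkpointing=False
)

# Hypothetical placeholder entry, not a real config key.
REF_CONFIG=(
    ref.placeholder_key=value
)

ROLLOUT_CONFIG=(
    actor_rollout_ref.rollout.top_p=1.0
    actor_rollout_ref.rollout.top_k=-1
)

# Only arrays that are actually defined are expanded;
# REWARD_MODEL_CONFIG is omitted, as the review suggests.
ARGS=(
    "${ACTOR_CONFIG[@]}"
    "${REF_CONFIG[@]}"
    "${ROLLOUT_CONFIG[@]}"
)

# Print one override per line instead of launching the trainer.
printf '%s\n' "${ARGS[@]}"
```

Quoting each `"${ARRAY[@]}"` expansion keeps every override as exactly one argument, so values produced by `${...}` substitutions are not re-split by the shell.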

gemini-code-assist · 2026-01-19T12:37:35Z

examples/grpo_trainer/run_qwen3moe-30b_grpo_megatron_vllm_npu.sh

+    +actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_offload_fraction=${optimizer_offload_fraction}
+    +actor_rollout_ref.actor.optim.override_optimizer_config.use_precision_aware_optimizer=True
+    +actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_cpu_offload=True


These lines have a + prefix, which is invalid syntax for a bash array assignment. This appears to be a copy-paste error from a diff and will cause the script to fail. Please remove the + prefixes.

Suggested change

Before:
+actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_offload_fraction=${optimizer_offload_fraction}
+actor_rollout_ref.actor.optim.override_optimizer_config.use_precision_aware_optimizer=True
+actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_cpu_offload=True

After:
actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_offload_fraction=${optimizer_offload_fraction}
actor_rollout_ref.actor.optim.override_optimizer_config.use_precision_aware_optimizer=True
actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_cpu_offload=True

gemini-code-assist · 2026-01-19T12:37:35Z

examples/grpo_trainer/run_qwen3moe-30b_grpo_megatron_vllm_npu.sh

+    +actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True
+    +actor_rollout_ref.actor.megatron.override_transformer_config.recompute_method=uniform
+    +actor_rollout_ref.actor.megatron.override_transformer_config.recompute_granularity=full
+    +actor_rollout_ref.actor.megatron.override_transformer_config.recompute_num_layers=1


These lines have a + prefix, which is invalid syntax for a bash array assignment. This appears to be a copy-paste error from a diff and will cause the script to fail. Please remove the + prefixes.

Suggested change

Before:
+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_method=uniform
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_granularity=full
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_num_layers=1

After:
actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True
actor_rollout_ref.actor.megatron.override_transformer_config.recompute_method=uniform
actor_rollout_ref.actor.megatron.override_transformer_config.recompute_granularity=full
actor_rollout_ref.actor.megatron.override_transformer_config.recompute_num_layers=1

gemini-code-assist · 2026-01-19T12:37:35Z

examples/grpo_trainer/run_qwen3moe-30b_grpo_megatron_vllm_npu.sh

+    "${ACTOR_CONFIG[@]}" \
+    "${REF_CONFIG[@]}" \
+    "${ROLLOUT_CONFIG[@]}" \
+    "${REWARD_MODEL_CONFIG[@]}" \


The REWARD_MODEL_CONFIG variable is used here but is not defined anywhere in the script. Since set -u is active, this will cause the script to fail with an 'unbound variable' error. This parameter does not seem to be required according to the provided documentation. Please remove this line.

gemini-code-assist · 2026-01-19T12:37:36Z

docs/ascend_tutorial/ascend_training_configuration_guide.md

+    +actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True
+    +actor_rollout_ref.actor.megatron.override_transformer_config.recompute_method=uniform
+    +actor_rollout_ref.actor.megatron.override_transformer_config.recompute_granularity=full
+    +actor_rollout_ref.actor.megatron.override_transformer_config.recompute_num_layers=1


These lines in the shell code block have a + prefix, which appears to be a copy-paste artifact from a diff. This makes the example code syntactically incorrect. Please remove the + prefixes to ensure the example is valid.

Suggested change

Before:
+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_method=uniform
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_granularity=full
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_num_layers=1

After:
actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True
actor_rollout_ref.actor.megatron.override_transformer_config.recompute_method=uniform
actor_rollout_ref.actor.megatron.override_transformer_config.recompute_granularity=full
actor_rollout_ref.actor.megatron.override_transformer_config.recompute_num_layers=1

gemini-code-assist · 2026-01-19T12:37:36Z

docs/ascend_tutorial/ascend_training_configuration_guide.md

+    actor_rollout_ref.rollout.top_p=${top_p}
+    actor_rollout_ref.rollout.top_k=${top_k}
+    actor_rollout_ref.rollout.temperature=${temperature}


The variables ${top_p}, ${top_k}, and ${temperature} are used here but are not defined in the script example. This will cause an error if the code is executed. Please define these variables or replace them with example values, such as the ones used in the other training scripts (1.0, -1, 1.0 respectively).

Suggested change

Before:
actor_rollout_ref.rollout.top_p=${top_p}
actor_rollout_ref.rollout.top_k=${top_k}
actor_rollout_ref.rollout.temperature=${temperature}

After:
actor_rollout_ref.rollout.top_p=1.0
actor_rollout_ref.rollout.top_k=-1
actor_rollout_ref.rollout.temperature=1.0

gemini-code-assist · 2026-01-19T12:37:36Z

docs/ascend_tutorial/ascend_training_configuration_guide.md

+    actor_rollout_ref.rollout.val_kwargs.top_p=${top_p}
+    actor_rollout_ref.rollout.val_kwargs.top_k=${top_k}
+    actor_rollout_ref.rollout.val_kwargs.temperature=${temperature}


The variables ${top_p}, ${top_k}, and ${temperature} are used here but are not defined in the script example. This will cause an error if the code is executed. Please define these variables or replace them with example values, such as the ones used in the other training scripts (1.0, -1, 1.0 respectively).

Suggested change

Before:
actor_rollout_ref.rollout.val_kwargs.top_p=${top_p}
actor_rollout_ref.rollout.val_kwargs.top_k=${top_k}
actor_rollout_ref.rollout.val_kwargs.temperature=${temperature}

After:
actor_rollout_ref.rollout.val_kwargs.top_p=1.0
actor_rollout_ref.rollout.val_kwargs.top_k=-1
actor_rollout_ref.rollout.val_kwargs.temperature=1.0
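The review offers two fixes for the undefined sampling variables: define them, or inline the example values (1.0, -1, 1.0). A minimal sketch of the first option, assuming those same example values: the variables are defined once and reused to emit matching overrides for both training rollout and validation (`val_kwargs`). The `SAMPLING_CONFIG` array name is hypothetical and not taken from the PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Example sampling values cited in the review (top_p=1.0, top_k=-1, temperature=1.0).
# Defining them up front keeps ${top_p} etc. from being unbound under set -u.
top_p=1.0
top_k=-1
temperature=1.0

# Hypothetical array name: emit the same sampling overrides for
# training rollout and for validation (val_kwargs).
SAMPLING_CONFIG=()
for prefix in actor_rollout_ref.rollout actor_rollout_ref.rollout.val_kwargs; do
    SAMPLING_CONFIG+=(
        "${prefix}.top_p=${top_p}"
        "${prefix}.top_k=${top_k}"
        "${prefix}.temperature=${temperature}"
    )
done

# Print one override per line instead of launching the trainer.
printf '%s\n' "${SAMPLING_CONFIG[@]}"
```

Keeping a single definition means the rollout and validation sampling settings cannot silently drift apart; if validation should sample differently, separate variables would be the simpler choice.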


psyloy requested review from FightingZhen, PeterSH6, ji-huazhong and vermouth1992 as code owners January 19, 2026 12:35

gemini-code-assist bot reviewed Jan 19, 2026


psyloy force-pushed the main branch 2 times, most recently from 7752678 to 692efda, on January 26, 2026 08:58

psyloy requested a review from tardis-key as a code owner January 26, 2026 08:58

tardis-key reviewed Jan 26, 2026


docs/ascend_tutorial/ascend_training_configuration_guide.md (outdated, resolved)

psyloy force-pushed the main branch from 692efda to 00ac8ab on January 27, 2026 03:00

add run_qwen2_5-32b_grpo_megatron_vllm_npu.sh and run_qwen3moe-30b_grpo_megatron_vllm_npu.sh (commit cf2a5de)

psyloy force-pushed the main branch from 00ac8ab to cf2a5de on January 27, 2026 03:56

psyloy changed the title from "[model,doc] feat: add NPU GRPO training scripts for Qwen2.5-32B/Qwen3-30B (Megatron/vLLM backends)" to "[model] feat: add NPU GRPO training scripts for Qwen2.5-32B/Qwen3-30B (Megatron/vLLM backends)" on Jan 28, 2026
