Fix GLM4-MoE HF export and RL vLLM reload compatibility#1886

Open
mansimov wants to merge 1 commit into PrimeIntellect-ai:main from mansimov:fix/glm4-moe-vllm-reload-compat

Conversation


@mansimov mansimov commented Feb 25, 2026

Summary

This PR fixes a set of GLM4-MoE compatibility issues found while running the small-scale MoE flow (mini_moe -> SFT warmup -> RL):

  1. scripts/mini_moe.py verify path no longer unexpectedly requires accelerate
  2. GLM4-MoE PrimeRL -> HF export now writes fused routed-expert keys expected by current transformers
  3. GLM4-MoE config now tolerates serialized head_dim (needed for vLLM/transformers config compatibility)
  4. RL filesystem broadcasts convert GLM4-MoE fused expert weights back to legacy per-expert keys so vLLM live /update_weights reload works

Root Cause

There were two key format mismatches:

  • Checkpoint export mismatch: PrimeRL GLM4-MoE TT->HF conversion emitted legacy per-expert keys, but current transformers GLM4-MoE expects fused
    keys (gate_up_proj, down_proj)
  • vLLM live reload mismatch: after fixing checkpoint export to fused keys, vLLM initial load succeeded, but vLLM live reload (/update_weights) still
    failed on fused GLM4-MoE expert keys during RL filesystem weight broadcasts
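To make the key-format mismatch concrete, here is a minimal sketch of how legacy per-expert keys fold into the fused keys current transformers expects. The fused names (`gate_up_proj`, `down_proj`) come from this PR; the legacy per-expert key pattern and the `fused_key_for` helper are illustrative assumptions, not the actual PrimeRL code:

```python
import re

# Assumed legacy layout: one tensor per expert per projection.
# Fused layout: one stacked tensor per projection, shared by all experts.
LEGACY_KEY = re.compile(
    r"model\.layers\.(\d+)\.mlp\.experts\.(\d+)\.(gate_proj|up_proj|down_proj)\.weight"
)

def fused_key_for(legacy_key: str) -> str:
    """Return the fused key a legacy per-expert key folds into (illustrative)."""
    m = LEGACY_KEY.fullmatch(legacy_key)
    if m is None:
        return legacy_key  # non-expert keys pass through unchanged
    layer, _expert, proj = m.groups()
    # gate_proj and up_proj are merged into a single gate_up_proj tensor
    fused = "gate_up_proj" if proj in ("gate_proj", "up_proj") else "down_proj"
    return f"model.layers.{layer}.mlp.experts.{fused}"

print(fused_key_for("model.layers.0.mlp.experts.3.gate_proj.weight"))
# model.layers.0.mlp.experts.gate_up_proj
```

Many per-expert keys collapse into one fused key per layer and projection, which is why the conversion has to run in both directions (export vs. broadcast).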

Changes

scripts/mini_moe.py

  • Load models on CPU in verify(...), then move them to CUDA for parity checks
  • This avoids transformers from_pretrained() requiring accelerate, which is triggered when loading happens inside a torch.device("cuda") context

src/prime_rl/trainer/models/glm4_moe/converting_glm4_moe.py

  • Update GLM4-MoE convert_tt_to_hf to export fused routed-expert tensors:
      • model.layers.<i>.mlp.experts.gate_up_proj
      • model.layers.<i>.mlp.experts.down_proj
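A minimal sketch of the fusion step, using nested lists in place of real tensors. The actual conversion in converting_glm4_moe.py presumably stacks torch tensors; the `fuse_experts` helper, the shapes, and the assumption that gate rows precede up rows in gate_up_proj are all illustrative:

```python
def fuse_experts(per_expert_gate, per_expert_up, per_expert_down):
    """Stack per-expert projection weights into fused tensors (sketch).

    per_expert_*: lists indexed by expert id; each entry is a 2-D weight
    (a nested list standing in for a torch.Tensor).
    """
    # Concatenate gate and up along the output (row) dimension per expert,
    # then stack across experts (assumed ordering: gate first, then up).
    gate_up = [gate + up for gate, up in zip(per_expert_gate, per_expert_up)]
    return {
        "gate_up_proj": gate_up,             # [num_experts, 2 * ffn, hidden]
        "down_proj": list(per_expert_down),  # [num_experts, hidden, ffn]
    }

# Toy data: 2 experts, 1x2 gate/up weights, 2x1 down weights.
fused = fuse_experts(
    per_expert_gate=[[[1.0, 2.0]], [[3.0, 4.0]]],
    per_expert_up=[[[5.0, 6.0]], [[7.0, 8.0]]],
    per_expert_down=[[[9.0], [10.0]], [[11.0], [12.0]]],
)
```

After this step, the exported checkpoint carries one gate_up_proj and one down_proj tensor per layer instead of one tensor per expert.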

src/prime_rl/trainer/models/glm4_moe/configuration_glm4_moe.py

  • Make local config compatible with serialized head_dim values in checkpoint config.json
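A hedged sketch of what tolerating a serialized head_dim could look like. The real Glm4MoeConfig change may differ; the class name and the fallback derivation from hidden_size / num_attention_heads are assumptions:

```python
class Glm4MoeConfigSketch:
    """Illustrative config: accept head_dim from config.json, else derive it."""

    def __init__(self, hidden_size=4096, num_attention_heads=32, head_dim=None):
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        # Older checkpoints omit head_dim; newer config.json files serialize it.
        self._head_dim = head_dim

    @property
    def head_dim(self):
        # Prefer the serialized value; otherwise fall back to the derived one.
        if self._head_dim is not None:
            return self._head_dim
        return self.hidden_size // self.num_attention_heads

cfg_derived = Glm4MoeConfigSketch()               # head_dim computed: 4096 // 32
cfg_serialized = Glm4MoeConfigSketch(head_dim=96)  # serialized value wins
```

Making head_dim a property means a config.json written by one transformers/vLLM version loads cleanly under another, whether or not the field was serialized.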

src/prime_rl/trainer/rl/broadcast/filesystem.py

  • Add a broadcast-only GLM4-MoE fused->per-expert conversion before saving filesystem broadcast checkpoints
  • Keeps:
      • outputs/weights/step_* HF-compatible (fused)
      • outputs/run_default/broadcasts/step_* vLLM-reload-compatible (per-expert)
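The broadcast-only reverse conversion can be sketched as the inverse of the export step, again with nested lists standing in for tensors. The `split_experts` helper, the legacy key pattern, and the even gate/up split point in gate_up_proj are assumptions for illustration:

```python
def split_experts(fused, layer_idx):
    """Expand fused expert tensors back into legacy per-expert keys (sketch).

    fused: {"gate_up_proj": [...], "down_proj": [...]} as produced at export.
    Returns a dict keyed by the per-expert names vLLM's /update_weights
    reload expects (key pattern assumed from the legacy layout).
    """
    out = {}
    prefix = f"model.layers.{layer_idx}.mlp.experts"
    for expert, (gate_up, down) in enumerate(
        zip(fused["gate_up_proj"], fused["down_proj"])
    ):
        half = len(gate_up) // 2  # gate rows first, then up rows (assumed)
        out[f"{prefix}.{expert}.gate_proj.weight"] = gate_up[:half]
        out[f"{prefix}.{expert}.up_proj.weight"] = gate_up[half:]
        out[f"{prefix}.{expert}.down_proj.weight"] = down
    return out

# Toy data: 1 expert, 2x1 gate_up (one gate row + one up row), 1x1 down.
per_expert = split_experts(
    {"gate_up_proj": [[[1.0], [2.0]]], "down_proj": [[[3.0]]]},
    layer_idx=0,
)
```

Running this only on the broadcast path keeps the HF-facing weights fused while the vLLM-facing broadcast checkpoints stay in the per-expert layout the live reload understands.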

Validation

Roundtrip verification

  • uv run python scripts/mini_moe.py --arch glm4_moe --output-dir outputs/weights/step_200 --verify-only
  • Result: max logit diffs for HF vs PrimeRL and for HF -> PrimeRL -> HF are small; verification passes

RL run with live updates

  • Mini GLM4-MoE RL run progresses across many steps (including live weight broadcasts / vLLM reloads)
  • Verified that reward is logged by the orchestrator and that weight updates continue without collective_rpc update_weights_from_path failures

Environment (validated on)

  • torch==2.10.0+cu128
  • transformers==5.2.0.dev0
  • vllm==0.16.0
  • Python 3.12.6
  • NVIDIA A100-SXM4-40GB
  • NVIDIA driver 570.86.15 (CUDA 12.8)

Note

Medium Risk
Touches model weight conversion and checkpoint/broadcast serialization for GLM4-MoE, so a mistake could break loading or live reload in Transformers/vLLM. Scope is localized and adds targeted compatibility shims rather than broad refactors.

Overview
Fixes GLM4-MoE weight format mismatches across HF export, roundtrip verification, and RL vLLM live reload.

GLM4-MoE TT→HF conversion now exports routed experts in the new fused Transformers format (mlp.experts.gate_up_proj + mlp.experts.down_proj) instead of per-expert keys, and Glm4MoeConfig now tolerates serialized head_dim via a property.

For RL filesystem broadcasts, GLM4-MoE checkpoints are converted back from fused to per-expert tensors right before saving, so vLLM /update_weights reload continues to work. Additionally, scripts/mini_moe.py verification avoids torch.device("cuda") during from_pretrained() (preventing an unexpected accelerate requirement) by loading on CPU and then moving models and inputs to CUDA.

Written by Cursor Bugbot for commit b96a8a8.
