Fix GLM4-MoE HF export and RL vLLM reload compatibility#1886

Open
mansimov wants to merge 1 commit into PrimeIntellect-ai:main from mansimov:fix/glm4-moe-vllm-reload-compat

Conversation


@mansimov mansimov commented Feb 25, 2026

Summary

This PR fixes a set of GLM4-MoE compatibility issues found while running the small-scale MoE flow (mini_moe -> SFT warmup -> RL):

  1. scripts/mini_moe.py verify path no longer unexpectedly requires accelerate
  2. GLM4-MoE PrimeRL -> HF export now writes fused routed-expert keys expected by current transformers
  3. GLM4-MoE config now tolerates serialized head_dim (needed for vLLM/transformers config compatibility)
  4. RL filesystem broadcasts convert GLM4-MoE fused expert weights back to legacy per-expert keys so vLLM live /update_weights reload works

Root Cause

There were two key format mismatches:

  • Checkpoint export mismatch: PrimeRL GLM4-MoE TT->HF conversion emitted legacy per-expert keys, but current transformers GLM4-MoE expects fused
    keys (gate_up_proj, down_proj)
  • vLLM live reload mismatch: after fixing checkpoint export to fused keys, vLLM initial load succeeded, but vLLM live reload (/update_weights) still
    failed on fused GLM4-MoE expert keys during RL filesystem weight broadcasts
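To make the key-format mismatch concrete, here is a minimal sketch of how legacy per-expert keys fold into the fused keys current transformers expects. The fused names (`gate_up_proj`, `down_proj`) come from this PR; the legacy per-expert key pattern and the `fused_key_for` helper are illustrative assumptions, not the actual PrimeRL code:

```python
import re

# Assumed legacy layout: one tensor per expert per projection.
# Fused layout: one stacked tensor per projection, shared by all experts.
LEGACY_KEY = re.compile(
    r"model\.layers\.(\d+)\.mlp\.experts\.(\d+)\.(gate_proj|up_proj|down_proj)\.weight"
)

def fused_key_for(legacy_key: str) -> str:
    """Return the fused key a legacy per-expert key folds into (illustrative)."""
    m = LEGACY_KEY.fullmatch(legacy_key)
    if m is None:
        return legacy_key  # non-expert keys pass through unchanged
    layer, _expert, proj = m.groups()
    # gate_proj and up_proj are merged into a single gate_up_proj tensor
    fused = "gate_up_proj" if proj in ("gate_proj", "up_proj") else "down_proj"
    return f"model.layers.{layer}.mlp.experts.{fused}"

print(fused_key_for("model.layers.0.mlp.experts.3.gate_proj.weight"))
# model.layers.0.mlp.experts.gate_up_proj
```

Many per-expert keys collapse into one fused key per layer and projection, which is why the conversion has to run in both directions (export vs. broadcast).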

Changes

scripts/mini_moe.py

  • Load models on CPU in verify(...), then move them to CUDA for parity checks
  • This avoids transformers from_pretrained() requiring accelerate, which is triggered when loading happens inside a torch.device("cuda") context

src/prime_rl/trainer/models/glm4_moe/converting_glm4_moe.py

  • Update GLM4-MoE convert_tt_to_hf to export fused routed-expert tensors:
      • model.layers.<i>.mlp.experts.gate_up_proj
      • model.layers.<i>.mlp.experts.down_proj
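A minimal sketch of the fusion step, using nested lists in place of real tensors. The actual conversion in converting_glm4_moe.py presumably stacks torch tensors; the `fuse_experts` helper, the shapes, and the assumption that gate rows precede up rows in gate_up_proj are all illustrative:

```python
def fuse_experts(per_expert_gate, per_expert_up, per_expert_down):
    """Stack per-expert projection weights into fused tensors (sketch).

    per_expert_*: lists indexed by expert id; each entry is a 2-D weight
    (a nested list standing in for a torch.Tensor).
    """
    # Concatenate gate and up along the output (row) dimension per expert,
    # then stack across experts (assumed ordering: gate first, then up).
    gate_up = [gate + up for gate, up in zip(per_expert_gate, per_expert_up)]
    return {
        "gate_up_proj": gate_up,             # [num_experts, 2 * ffn, hidden]
        "down_proj": list(per_expert_down),  # [num_experts, hidden, ffn]
    }

# Toy data: 2 experts, 1x2 gate/up weights, 2x1 down weights.
fused = fuse_experts(
    per_expert_gate=[[[1.0, 2.0]], [[3.0, 4.0]]],
    per_expert_up=[[[5.0, 6.0]], [[7.0, 8.0]]],
    per_expert_down=[[[9.0], [10.0]], [[11.0], [12.0]]],
)
```

After this step, the exported checkpoint carries one gate_up_proj and one down_proj tensor per layer instead of one tensor per expert.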

src/prime_rl/trainer/models/glm4_moe/configuration_glm4_moe.py

  • Make local config compatible with serialized head_dim values in checkpoint config.json
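A hedged sketch of what tolerating a serialized head_dim could look like. The real Glm4MoeConfig change may differ; the class name and the fallback derivation from hidden_size / num_attention_heads are assumptions:

```python
class Glm4MoeConfigSketch:
    """Illustrative config: accept head_dim from config.json, else derive it."""

    def __init__(self, hidden_size=4096, num_attention_heads=32, head_dim=None):
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        # Older checkpoints omit head_dim; newer config.json files serialize it.
        self._head_dim = head_dim

    @property
    def head_dim(self):
        # Prefer the serialized value; otherwise fall back to the derived one.
        if self._head_dim is not None:
            return self._head_dim
        return self.hidden_size // self.num_attention_heads

cfg_derived = Glm4MoeConfigSketch()               # head_dim computed: 4096 // 32
cfg_serialized = Glm4MoeConfigSketch(head_dim=96)  # serialized value wins
```

Making head_dim a property means a config.json written by one transformers/vLLM version loads cleanly under another, whether or not the field was serialized.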

src/prime_rl/trainer/rl/broadcast/filesystem.py

  • Add a broadcast-only GLM4-MoE fused->per-expert conversion before saving filesystem broadcast checkpoints
  • Keeps:
      • outputs/weights/step_* HF-compatible (fused)
      • outputs/run_default/broadcasts/step_* vLLM-reload-compatible (per-expert)
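The broadcast-only reverse conversion can be sketched as the inverse of the export step, again with nested lists standing in for tensors. The `split_experts` helper, the legacy key pattern, and the even gate/up split point in gate_up_proj are assumptions for illustration:

```python
def split_experts(fused, layer_idx):
    """Expand fused expert tensors back into legacy per-expert keys (sketch).

    fused: {"gate_up_proj": [...], "down_proj": [...]} as produced at export.
    Returns a dict keyed by the per-expert names vLLM's /update_weights
    reload expects (key pattern assumed from the legacy layout).
    """
    out = {}
    prefix = f"model.layers.{layer_idx}.mlp.experts"
    for expert, (gate_up, down) in enumerate(
        zip(fused["gate_up_proj"], fused["down_proj"])
    ):
        half = len(gate_up) // 2  # gate rows first, then up rows (assumed)
        out[f"{prefix}.{expert}.gate_proj.weight"] = gate_up[:half]
        out[f"{prefix}.{expert}.up_proj.weight"] = gate_up[half:]
        out[f"{prefix}.{expert}.down_proj.weight"] = down
    return out

# Toy data: 1 expert, 2x1 gate_up (one gate row + one up row), 1x1 down.
per_expert = split_experts(
    {"gate_up_proj": [[[1.0], [2.0]]], "down_proj": [[[3.0]]]},
    layer_idx=0,
)
```

Running this only on the broadcast path keeps the HF-facing weights fused while the vLLM-facing broadcast checkpoints stay in the per-expert layout the live reload understands.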

Validation

Roundtrip verification

  • uv run python scripts/mini_moe.py --arch glm4_moe --output-dir outputs/weights/step_200 --verify-only
  • Result: max logit diffs for HF vs PrimeRL and for HF -> PrimeRL -> HF are small; verification passes

RL run with live updates

  • Mini GLM4-MoE RL run progresses across many steps (including live weight broadcasts / vLLM reloads)
  • Verified that reward is logged by the orchestrator and that weight updates continue without collective_rpc update_weights_from_path failures

Environment (validated on)

  • torch==2.10.0+cu128
  • transformers==5.2.0.dev0
  • vllm==0.16.0
  • Python 3.12.6
  • NVIDIA A100-SXM4-40GB
  • NVIDIA driver 570.86.15 (CUDA 12.8)

Note

Medium Risk
Touches model weight conversion and checkpoint/broadcast serialization for GLM4-MoE, so a mistake could break loading or live reload in Transformers/vLLM. Scope is localized and adds targeted compatibility shims rather than broad refactors.

Overview
Fixes GLM4-MoE weight format mismatches across HF export, roundtrip verification, and RL vLLM live reload.

GLM4-MoE TT→HF conversion now exports routed experts in the new fused Transformers format (mlp.experts.gate_up_proj + mlp.experts.down_proj) instead of per-expert keys, and Glm4MoeConfig now tolerates serialized head_dim via a property.

For RL filesystem broadcasts, GLM4-MoE checkpoints are converted back from fused to per-expert tensors right before saving, so vLLM /update_weights reload continues to work. Additionally, scripts/mini_moe.py verification avoids torch.device("cuda") during from_pretrained() (preventing an unexpected accelerate requirement) by loading on CPU and then moving models and inputs to CUDA.

Written by Cursor Bugbot for commit b96a8a8.
