@tAnGjIa520 tAnGjIa520 commented Dec 1, 2025

This PR introduces a batch-optimized AlphaZero MCTS implementation in C++, achieving a 2x
speedup over the standard sequential version.

Batch MCTS Inference: The core improvement is the get_next_actions_batch() function (lines
207-415 in mcts_alphazero.cpp), which processes multiple game states simultaneously. Instead of running MCTS
simulations sequentially for each environment, we now:

  1. Batch Root Expansion: Initialize multiple root nodes and expand them with a single batched neural network call
    (policy_value_func_batch), reducing GPU overhead
  2. Parallel Simulation Phase: Run simulations for all environments simultaneously (lines 280-369), collecting the leaf
    nodes that need expansion
  3. Batch Leaf Expansion: Group all leaf nodes from unfinished games and perform batched inference via
    _batch_expand_leaf_nodes() (lines 162-205), minimizing individual network calls
  4. Legal Action Caching: Cache legal actions at the environment level to avoid repeated Python calls during child
    selection, significantly reducing Python-C++ interface overhead
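The collect-then-expand loop in steps 2-3 above can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: `Node`, `select_path`, `evaluate_batch`, and `run_batch_mcts` are hypothetical names, and `evaluate_batch` is a uniform-policy stub standing in for the single batched network call that `policy_value_func_batch` would perform over all collected leaves.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

// Hypothetical minimal MCTS node; the PR's real node type will differ.
struct Node {
    double prior = 1.0, value_sum = 0.0;
    int visits = 0;
    std::vector<std::unique_ptr<Node>> children;
    bool expanded() const { return !children.empty(); }
};

// Descend from the root to a leaf, picking the child with the highest
// PUCT-style score; return the whole path for backpropagation.
std::vector<Node*> select_path(Node* node) {
    std::vector<Node*> path{node};
    while (node->expanded()) {
        Node* best = nullptr;
        double best_score = -1e18;
        for (auto& c : node->children) {
            double q = c->visits ? c->value_sum / c->visits : 0.0;
            double u = c->prior * std::sqrt((double)node->visits) / (1.0 + c->visits);
            if (q + u > best_score) { best_score = q + u; best = c.get(); }
        }
        node = best;
        path.push_back(node);
    }
    return path;
}

// Stand-in for one batched network call: expand every collected leaf with a
// uniform policy and return a dummy value per leaf.
std::vector<double> evaluate_batch(const std::vector<Node*>& leaves, int n_actions) {
    std::vector<double> values;
    for (Node* leaf : leaves) {
        for (int a = 0; a < n_actions; ++a) {
            auto child = std::make_unique<Node>();
            child->prior = 1.0 / n_actions;
            leaf->children.push_back(std::move(child));
        }
        values.push_back(0.5);  // dummy leaf value
    }
    return values;
}

// One batched simulation round: collect one leaf per environment, expand all
// leaves with a single evaluate_batch call, then backpropagate along each path.
void run_batch_mcts(std::vector<Node>& roots, int n_sims, int n_actions) {
    for (int s = 0; s < n_sims; ++s) {
        std::vector<std::vector<Node*>> paths;
        std::vector<Node*> leaves;
        for (auto& root : roots) {
            paths.push_back(select_path(&root));
            leaves.push_back(paths.back().back());
        }
        std::vector<double> values = evaluate_batch(leaves, n_actions);  // one batched call
        for (size_t i = 0; i < paths.size(); ++i)
            for (Node* n : paths[i]) {
                n->visits += 1;
                n->value_sum += values[i];
            }
    }
}
```

Step 4 (legal-action caching) would additionally memoize each environment's legal moves so that the selection loop never crosses back into Python during child scoring.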


@puyuan1996 puyuan1996 added the enhancement New feature or request label Dec 4, 2025
  Document C++-Python env interaction as bottleneck and suggest C++ implementation.
  - Convert all Chinese comments to English
  - Add parameter documentation for batch MCTS calls
  - Follow Google Style documentation standards
@puyuan1996 puyuan1996 changed the title feature(tj): add batch alpha-zero feature(tj): optimize AlphaZero with batch inference support Dec 6, 2025
   - Document C++-Python interaction bottleneck in current architecture
   - Note env_cpp_ification as future optimization direction
   - Mark get_next_action as non-batch version with reference to batch alternative