feature(mycve): add chinese chess env and related demo #442

mycve · 2025-11-23T10:32:04Z

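An attempt at a first version of Chinese chess (Xiangqi) demo support:

Uses a unified red-side perspective, 4 stacked history frames, and an 8100-action space (left unoptimized to stay simple and readable; illegal actions are simply masked out)
Supports interactive human-vs-AI play for evaluation
Adds support for strong engine opponents (via the UCI protocol. Note: this feature has not been verified to run end to end)

Because the rules of Chinese chess are fairly complex, an external library currently handles them. On top of that, the env added: ~~exceeding 500 steps counts as a loss for red (except for the natural-limit draw after 60 rounds without a capture), and a position repeated over 4 moves counts as a loss for the side to move~~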

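Observation Space:
A dictionary with the following keys:
- observation: shape (N, 10, 9), float32.
  - N = 14 * stack_obs_num + 1 = 14 * 4 + 1 = 57
  - The first 56 channels are 4 stacked history frames; each frame contains 14 feature planes (7 piece types x 2 colors)
  - The last channel is the current-player color plane (all 1s for red, who moves first; all 0s for black, who moves second)
  - Uses a canonical view: the board is always observed from the current player's perspective (own pieces at the bottom, in the first 7 planes)
- action_mask: shape (8100,), int8. Legal-action mask; 1 means legal, 0 means illegal
- board: shape (10, 9), int8. A visual representation of the board, used for debugging or rendering
- to_play: shape (1,), int32. Whose turn it is (-1: finished/unknown, 0: black, 1: red)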
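
The channel arithmetic above can be sketched as follows. This is a minimal illustration, not the env's actual code: the helper names (`num_channels`, `plane_index`, `color_plane_index`) and the ordering of frames and piece types within the stack are assumptions; only the counts (7 piece types, 2 colors, 4 frames, own pieces in the first 7 planes) come from the description.

```python
PIECE_TYPES = 7                            # piece kinds per side
COLORS = 2                                 # own pieces / opponent pieces
PLANES_PER_FRAME = PIECE_TYPES * COLORS    # 14 planes per history frame
STACK_OBS_NUM = 4                          # number of stacked history frames
ROWS, COLS = 10, 9                         # Xiangqi board size


def num_channels(stack_obs_num: int = STACK_OBS_NUM) -> int:
    """Total channels: stacked piece planes plus one color plane."""
    return PLANES_PER_FRAME * stack_obs_num + 1


def plane_index(frame: int, is_own_piece: bool, piece_type: int) -> int:
    """Channel of one piece plane (frame and piece-type ordering are assumed)."""
    if not 0 <= frame < STACK_OBS_NUM:
        raise ValueError("frame out of range")
    if not 0 <= piece_type < PIECE_TYPES:
        raise ValueError("piece_type out of range")
    # In the canonical view, own pieces occupy the first 7 planes of a frame.
    color_offset = 0 if is_own_piece else PIECE_TYPES
    return frame * PLANES_PER_FRAME + color_offset + piece_type


def color_plane_index(stack_obs_num: int = STACK_OBS_NUM) -> int:
    """The color plane is the last channel."""
    return PLANES_PER_FRAME * stack_obs_num


observation_shape = (num_channels(), ROWS, COLS)
```

With the defaults, `observation_shape` is `(57, 10, 9)` and the color plane sits at channel 56.
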
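Action Space:
- Discrete(8100). An action is a move index (from_square * 90 + to_square)
- The board has 90 squares (0-89), so the action space covers every possible origin-destination pair (90 * 90 = 8100)
- The number of legal actions is far smaller than 8100 (typically a few dozen to a little over a hundred)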
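
A minimal sketch of this indexing and the corresponding mask is shown below. The function names are hypothetical, and the row-major square numbering in `square_to_row_col` is an assumption; the description only states that squares are numbered 0-89 and that the index is `from_square * 90 + to_square`.

```python
NUM_SQUARES = 90
BOARD_COLS = 9
ACTION_SPACE_SIZE = NUM_SQUARES * NUM_SQUARES  # 8100


def encode_action(from_square: int, to_square: int) -> int:
    """Map an (origin, destination) pair to a flat action index."""
    if not (0 <= from_square < NUM_SQUARES and 0 <= to_square < NUM_SQUARES):
        raise ValueError("square index must be in [0, 89]")
    return from_square * NUM_SQUARES + to_square


def decode_action(action: int) -> tuple:
    """Inverse of encode_action: returns (from_square, to_square)."""
    if not 0 <= action < ACTION_SPACE_SIZE:
        raise ValueError("action index must be in [0, 8099]")
    return divmod(action, NUM_SQUARES)


def square_to_row_col(square: int) -> tuple:
    """Assumed row-major layout: square = row * 9 + col."""
    return divmod(square, BOARD_COLS)


def build_action_mask(legal_moves) -> list:
    """Return a 0/1 list of length 8100 with 1 at every legal move index."""
    mask = [0] * ACTION_SPACE_SIZE
    for from_square, to_square in legal_moves:
        mask[encode_action(from_square, to_square)] = 1
    return mask
```

Most of the 8100 indices never correspond to a legal move, which is the cost of keeping the encoding this simple; the mask hides them from the policy.
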
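Reward Space:
- Box(-1, 1, (1,), float32).
- +1: the current player wins (checkmate)
- -1: the current player loses (is checkmated, or violates the perpetual-check rule)
- 0: draw (perpetual idle-move repetition, the natural move limit, no legal move, etc.) or the game is not over yet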
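
The reward table can be written as a small lookup. This is only a sketch of the mapping described above: the `Result` enum and `reward_and_done` are hypothetical names, and how the env actually detects each outcome is not shown.

```python
from enum import Enum


class Result(Enum):
    """Hypothetical outcome labels, seen from the current player's side."""
    ONGOING = "ongoing"
    WIN_BY_CHECKMATE = "win_by_checkmate"
    LOSS_BY_CHECKMATE = "loss_by_checkmate"
    LOSS_BY_PERPETUAL_CHECK = "loss_by_perpetual_check"
    DRAW = "draw"


# Rewards follow the table in the description: +1 win, -1 loss, 0 otherwise.
_REWARDS = {
    Result.ONGOING: 0.0,
    Result.WIN_BY_CHECKMATE: 1.0,
    Result.LOSS_BY_CHECKMATE: -1.0,
    Result.LOSS_BY_PERPETUAL_CHECK: -1.0,
    Result.DRAW: 0.0,
}


def reward_and_done(result: Result) -> tuple:
    """Return (reward, done) for the current player."""
    return _REWARDS[result], result is not Result.ONGOING
```

Note that a reward of 0 alone does not distinguish a draw from an unfinished game, so the `done` flag has to be read alongside it.
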

puyuan1996

Thank you very much. We are looking forward to your contribution!

puyuan1996 · 2025-11-25T07:08:58Z

zoo/board_games/chinesechess/envs/cchess_env.py

+    - ``self_play_mode``: self-play mode, used for AlphaZero/MuZero data generation
+    - ``play_with_bot_mode``: play against the built-in bot
+    - ``eval_mode``: evaluation mode
+"""


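Hello, could you write a test entry point for cchess human-vs-bot play, modeled on https://github.com/opendilab/LightZero/blob/main/zoo/board_games/tictactoe/envs/test_tictactoe_env.py?

Are the bots of different difficulty implemented by calling Pikafish? If so, what was the reasoning behind choosing it?

The bot is driven by Pikafish. Difficulty is determined by search depth: a deeper search is stronger but also slower. The depth can be set very high (in general, a professional player corresponds to a depth of only about 10-15 plies). A search time limit can be set instead, but for stable performance or for evaluation, depth is usually the better choice. As for why Pikafish: it is currently the strongest open-source engine. The current bot (supported by the cchess library) can plug into any engine that speaks the UCI protocol.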
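
To illustrate the depth-versus-time choice, here is a minimal sketch of the UCI text commands involved. The helper functions are hypothetical and are not the cchess library's API; only the command strings (`position`, `go depth`, `go movetime`, `bestmove`) are standard UCI. The move string `h2e2` is just an example in coordinate notation.

```python
def position_command(fen=None, moves=()):
    """Build a UCI 'position' command from an optional FEN and a move list."""
    command = "position startpos" if fen is None else f"position fen {fen}"
    if moves:
        command += " moves " + " ".join(moves)
    return command


def go_command(depth=None, movetime_ms=None):
    """Build a UCI 'go' command limited by depth or by time, not both."""
    if (depth is None) == (movetime_ms is None):
        raise ValueError("specify exactly one of depth or movetime_ms")
    if depth is not None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        return f"go depth {depth}"
    if movetime_ms < 1:
        raise ValueError("movetime_ms must be at least 1")
    return f"go movetime {movetime_ms}"


def parse_bestmove(line):
    """Extract the move from a 'bestmove <move> [ponder <move>]' reply."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "bestmove":
        return parts[1]
    return None
```

A fixed depth gives the same search effort on any hardware, which is why it suits evaluation; a time limit makes strength depend on the machine's speed.
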

puyuan1996 · 2025-11-25T07:12:41Z

zoo/board_games/chinesechess/envs/cchess/engine.py

@@ -0,0 +1,3131 @@
+from __future__ import annotations
+


What reference codebase is this bot engine based on, and how does it work? Could you add an overview at the top of the file?

puyuan1996 · 2025-11-25T07:13:54Z

zoo/board_games/chinesechess/config/cchess_muzero_sp_mode_config.py

+from easydict import EasyDict
+
+# ==============================================================
+# Most frequently modified configuration parameters


Does this config currently run successfully in its multi-GPU version? Could you also add zoo/board_games/chinesechess/config/cchess_muzero_bot_mode_config.py to make initial exploration easier? Self-play is more complex.
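
One way to picture the requested bot-mode config is as a copy of the self-play config with the battle mode switched. The sketch below uses plain dicts instead of `EasyDict`, and every key name (`env_id`, `battle_mode`, `bot_depth`, `num_simulations`) is an assumption made for illustration; only the mode names come from the env docstring in this PR.

```python
import copy

# Hypothetical stand-in for the self-play config; all keys are illustrative.
sp_mode_config = {
    "env": {
        "env_id": "cchess",
        "battle_mode": "self_play_mode",
    },
    "policy": {
        "num_simulations": 100,
    },
}


def make_bot_mode_config(base, bot_depth=10):
    """Derive a bot-mode config without mutating the self-play config."""
    config = copy.deepcopy(base)
    config["env"]["battle_mode"] = "play_with_bot_mode"
    config["env"]["bot_depth"] = bot_depth  # hypothetical key: engine search depth
    return config


bot_mode_config = make_bot_mode_config(sp_mode_config)
```

Keeping the two configs otherwise identical makes it easier to tell whether a training difference comes from the opponent or from some other setting.
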

puyuan1996 · 2025-11-25T07:16:36Z

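> An attempt at a first version of Chinese chess demo support: unified red-side perspective, 4 stacked history frames, an 8100-action space (unoptimized), interactive human-vs-AI play for evaluation, and support for strong engine opponents (via the UCI protocol. Note: this feature has not been verified to run end to end)
>
> Because the rules of Chinese chess are fairly complex, an external library currently handles them. On top of that, the env added: exceeding 500 steps counts as a loss for red (except for the natural-limit draw after 60 rounds without a capture), and a position repeated over 4 moves counts as a loss for the side to move

Could you add a detailed definition of the current MDP's s, a, r, and done to the PR description and to the overview at the top of the env? For example: how the state is currently represented, why there are 8100 actions, and how game termination and the final reward are determined.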

mycve · 2025-11-25T18:44:45Z

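> > An attempt at a first version of Chinese chess demo support: unified red-side perspective, 4 stacked history frames, an 8100-action space (unoptimized), interactive human-vs-AI play for evaluation, and support for strong engine opponents (via the UCI protocol. Note: this feature has not been verified to run end to end)
> > Because the rules of Chinese chess are fairly complex, an external library currently handles them. On top of that, the env added: exceeding 500 steps counts as a loss for red (except for the natural-limit draw after 60 rounds without a capture), and a position repeated over 4 moves counts as a loss for the side to move
>
> Could you add a detailed definition of the current MDP's s, a, r, and done to the PR description and to the overview at the top of the env? For example: how the state is currently represented, why there are 8100 actions, and how game termination and the final reward are determined.

OK. The reward design still needs more thought. The two earlier rules (red loses after more than 500 steps; a repeated position is a loss) were meant to strengthen the learning signal, but they do not seem to work well. I am running experiments and will reconsider them.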

tAnGjIa520 · 2025-12-04T06:17:42Z

Hello, do you have MuZero performance results from after your most recent commit? Either sp mode or bot mode is fine; for bot mode, results against Pikafish would be best.

mycve · 2025-12-04T07:26:48Z

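> Hello, do you have MuZero performance results from after your most recent commit? Either sp mode or bot mode is fine; for bot mode, results against Pikafish would be best.

I am still training, and it is very slow (since my last commit I have only changed a few minor runtime parameters). On my 8x 5090 machine, with average utilization at only about 20%, data generation and training have been running continuously for 5 days with the number of MCTS simulations set to 100, and the step count has passed 9 million.
In the first hour or two, games ended in wins and losses. Over the following days, they have mostly been stuck ending in draws, caused by repeated positions (4 rounds) and the maximum step limit (120).

Today, at around 9 million steps, wins and losses per episode became somewhat more frequent. (The value loss has gone up.)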
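
The two draw conditions mentioned here can be sketched as a small tracker. This is an illustration only, not the env's implementation: the class name is hypothetical, and it assumes that "4" means the same position occurring four times and that 120 is a cap on the total number of steps in an episode.

```python
from collections import Counter


class DrawTracker:
    """Hypothetical helper that flags draws by repetition or by step limit."""

    def __init__(self, max_steps=120, repetition_limit=4):
        self.max_steps = max_steps
        self.repetition_limit = repetition_limit
        self.steps = 0
        self.seen = Counter()

    def record(self, position_key):
        """Register the position reached after a move.

        Returns "repetition", "max_steps", or None if the game continues.
        position_key is any hashable board encoding, for example a FEN string.
        """
        self.steps += 1
        self.seen[position_key] += 1
        if self.seen[position_key] >= self.repetition_limit:
            return "repetition"
        if self.steps >= self.max_steps:
            return "max_steps"
        return None
```

If most episodes end through one of these two branches, the value target is 0 almost everywhere, which is consistent with the weak win/loss signal reported above.
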

tAnGjIa520 · 2025-12-04T07:38:03Z

Thanks. Could you share screenshots of the collector_step and evaluator_step curves from the experiment, similar to the ones below?

mycve · 2025-12-04T07:46:29Z

> Thanks. Could you share screenshots of the collector_step and evaluator_step curves from the experiment, similar to the ones below?

tAnGjIa520 · 2025-12-05T10:33:39Z

Hello, I will review the sp mode adaptation. I suggest you train in bot mode first (depth can be set to 10).

mycve · 2025-12-05T10:51:04Z

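> Hello, I will review the sp mode adaptation. I suggest you train in bot mode first (depth can be set to 10).

OK. The code probably has some problems here and there, so a review from you would be very welcome.
I also hope the framework logic can be further optimized to solve the GPU utilization problem. As far as I can tell, the cause is not a failure to assemble large batches for inference.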

add chinesechess demo

50955c7

puyuan1996 changed the title ~~add chinesechess demo~~ feature(mycve): add chinese chess env and related demo Nov 25, 2025

puyuan1996 added environment New or improved environment enhancement New feature or request labels Nov 25, 2025

puyuan1996 requested changes Nov 25, 2025

test added 3 commits November 26, 2025 02:59

Simplify the reward design

2e713c9

add pytest demo,human_vs_bot,bot_vs_bot

934ba19

Add the missing action-space flip; add a script that tests the flip in the env

3037dc9

Merge branch 'opendilab:main' into main

246f444
