feature(mycve): add chinese chess env and related demo #442
base: main
puyuan1996
left a comment
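Thank you very much. We are looking forward to your contribution.
| - ``self_play_mode``: self-play mode, used to generate data for AlphaZero/MuZero | ||
| - ``play_with_bot_mode``: play against the built-in bot | ||
| - ``eval_mode``: evaluation mode | ||
| """ |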
Are the bots of different difficulty levels implemented by calling Pikafish? If so, what was the reasoning behind choosing it?
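The bot is driven by Pikafish. Difficulty is determined by search depth: a deeper search is stronger but also slower. You can set a very large depth (in general, professional-player strength corresponds to a depth of roughly 10-15), or you can set a search time instead, but for stable performance or for evaluation, depth is usually the better choice. As for choosing Pikafish: it is currently the strongest open-source engine. The current bot (backed by the cchess library) can be connected to any engine that supports the UCI protocol.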
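The depth-versus-time choice described above maps onto the standard UCI `go depth` and `go movetime` commands. The following is a minimal sketch of how a wrapper might build that command; the function name `build_uci_go_command` and its parameters are hypothetical and are not taken from the PR or from the cchess library.

```python
from typing import Optional


def build_uci_go_command(depth: Optional[int] = None,
                         movetime_ms: Optional[int] = None) -> str:
    """Build a UCI 'go' command limited either by depth or by time.

    Hypothetical helper: exactly one of the two limits must be given.
    """
    if (depth is None) == (movetime_ms is None):
        raise ValueError("specify exactly one of depth or movetime_ms")
    if depth is not None:
        if depth < 1:
            raise ValueError("depth must be >= 1")
        # Fixed depth: reproducible strength, which suits evaluation.
        return f"go depth {depth}"
    if movetime_ms < 1:
        raise ValueError("movetime_ms must be >= 1")
    # Fixed time: strength varies with the hardware the engine runs on.
    return f"go movetime {movetime_ms}"


if __name__ == "__main__":
    print(build_uci_go_command(depth=10))
    print(build_uci_go_command(movetime_ms=500))
```

A fixed depth keeps the opponent's strength independent of machine load, which is presumably why it is preferred here for evaluation.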
| @@ -0,0 +1,3131 @@ | |||
| from __future__ import annotations | |||
What reference codebase and underlying principles is this bot engine based on? Could you add an overview at the top of the file?
| from easydict import EasyDict | ||
| # ============================================================== | ||
| # Most frequently modified configuration parameters |
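Does the multi-GPU version of this config currently run end to end? It would help to add zoo/board_games/chinesechess/config/cchess_muzero_bot_mode_config.py to make initial exploration easier, since self-play will be more complex.
In the PR description and in the overview at the top of the env, please add detailed definitions of s, a, r, and done in the current MDP: for example, how the state is represented, why there are 8100 actions, and how game termination and the final reward are determined.
OK. The reward design still needs some thought. The two earlier rules (Red loses after more than 500 steps, and a repeated position counts as a loss) were meant to strengthen the learning signal, but they do not seem to work well. I am running experiments and will reconsider them.
Hello, do you have MuZero performance results from after the most recent commit? Either sp mode or bot mode is fine; for bot mode, results against Pikafish would be best.
Hello, I will review the sp mode adaptation. I suggest you train in bot mode first (depth can be set to 10).
OK. The code probably has some issues, so a review from you would be very helpful.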








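An attempt to add first-version demo support for Chinese chess:
Because the rules of Chinese chess are fairly complex, an external library is currently used to handle them. On top of that, the env adds:
- Exceeding 500 steps counts as a loss for Red (except for the natural-limit draw after 60 rounds without a capture); a position repeated across 4 moves counts as a loss for the current player.
Observation Space: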
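A dictionary with the following keys:
- observation: shape (N, 10, 9), float32.
  - N = 14 * stack_obs_num + 1 = 14 * 4 + 1 = 57
  - The first 56 channels are 4 stacked frames of history; each frame contains 14 feature planes (7 piece types x 2 colors)
  - The last channel is the current-player color plane (all 1s for Red/first player, all 0s for Black/second player)
  - Canonical view: the board is always observed from the current player's perspective (own pieces at the bottom / in the first 7 planes)
- action_mask: shape (8100,), int8. Legal-action mask; 1 means legal, 0 means illegal
- board: shape (10, 9), int8. Board representation for visualization, used for debugging or rendering
- to_play: shape (1,), int32. Whose turn it is (-1: finished/unknown, 0: Black, 1: Red)
Action Space: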
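- Discrete(8100). An action is the index of a move (from_square * 90 + to_square)
- The board has 90 squares (0-89); the action space covers every possible origin-destination pair (90 * 90 = 8100)
- The number of legal actions is far smaller than 8100 (typically a few dozen to a little over a hundred)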
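The channel count and the action indexing above can be sketched as follows. This is an illustration of the arithmetic only, not the env's actual code: the helper names are hypothetical, and a plain Python list stands in for the int8 array used by the env.

```python
from typing import Iterable, List, Tuple

NUM_SQUARES = 90                 # 10 rows x 9 columns
NUM_ACTIONS = NUM_SQUARES ** 2   # 90 * 90 = 8100
PLANES_PER_FRAME = 14            # 7 piece types x 2 colors


def num_obs_channels(stack_obs_num: int = 4) -> int:
    """N = 14 * stack_obs_num + 1 (the extra plane encodes the player color)."""
    return PLANES_PER_FRAME * stack_obs_num + 1


def encode_action(from_square: int, to_square: int) -> int:
    """Map a (from, to) square pair to an index in [0, 8100)."""
    if not (0 <= from_square < NUM_SQUARES and 0 <= to_square < NUM_SQUARES):
        raise ValueError("squares must be in the range 0-89")
    return from_square * NUM_SQUARES + to_square


def decode_action(action: int) -> Tuple[int, int]:
    """Inverse of encode_action: recover (from_square, to_square)."""
    if not 0 <= action < NUM_ACTIONS:
        raise ValueError("action must be in the range 0-8099")
    return divmod(action, NUM_SQUARES)


def build_action_mask(legal_moves: Iterable[Tuple[int, int]]) -> List[int]:
    """Return a length-8100 mask with 1 at every legal move index."""
    mask = [0] * NUM_ACTIONS
    for from_square, to_square in legal_moves:
        mask[encode_action(from_square, to_square)] = 1
    return mask


if __name__ == "__main__":
    print(num_obs_channels())            # channel count for 4 stacked frames
    print(decode_action(encode_action(12, 34)))
```

Indexing every origin-destination pair wastes most of the 8100 slots on moves that are never legal, but it keeps encoding and decoding a single multiplication or division, with the mask filtering out the rest.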
Reward Space:
- Box(-1, 1, (1,), float32).
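- +1: the current player wins (checkmate)
- -1: the current player loses (checkmated, or a perpetual-check violation)
- 0: draw (perpetual idle-move repetition, natural move limit, no piece able to move, etc.) or the game is not yet over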
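The reward rule above can be written as a small function from the current player's perspective. This is a sketch under assumptions: the `winner` argument and the function name are hypothetical, and player identifiers reuse the to_play encoding (0: Black, 1: Red).

```python
from typing import Optional

BLACK, RED = 0, 1  # same encoding as the to_play field


def terminal_reward(done: bool, winner: Optional[int], current_player: int) -> float:
    """Reward in [-1, 1] from the current player's point of view.

    winner is RED, BLACK, or None for a draw (hypothetical argument).
    """
    if not done or winner is None:
        # Game still running, or it ended in a draw.
        return 0.0
    return 1.0 if winner == current_player else -1.0


if __name__ == "__main__":
    print(terminal_reward(True, RED, RED))
    print(terminal_reward(True, RED, BLACK))
    print(terminal_reward(False, None, RED))
```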