Supervised Learning
Err323 put together a CCRL dataset, already converted into V3 training format. The blog post can be found here.
After reading this article, you should be able to take a collection of PGNs and train a net using lczero-training.
Thanks to TomekJ for some additional Windows flavor.
A newer, faster tool for converting pgn to training data can be found here.
[Note: this article assumes you are using Linux. Performing the same on Windows is possible, but as I don't use Windows, documenting the details will have to be left to someone else.]
You will need the following software:
- pgn-extract - the supervised learning pgn parser is very brittle. As a starting point, I would run your pgn file through pgn-extract with pgn-extract -7 -C < input.pgn > output.pgn. See here for details.
- The "supervise" branch of my fork of lczero. Yes, lczero is the old engine for the nets, but it also has the supervised training code in it, which I fixed and reenabled. This should also be merged into the master branch of the original repo. I don't control that, so can only be confident that my fork has the right code.
- The master branch of lczero-training. There's a fair bit of fiddling with setup here. You'll need CUDA-9.0 for tensorflow, et al, which is different than the CUDA-9.2, et al you got for lc0. I'll eventually add a section on configuration of this beast.
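If you have many pgn files, the clean-up command above can be driven from a script. What follows is a minimal Python sketch, assuming pgn-extract is on your PATH. The function names and the directory layout are hypothetical; only the -7 -C flags come from this article.

```python
import subprocess
from pathlib import Path


def clean_command(exe="pgn-extract"):
    """Return the argv for the clean-up run (flags as used in this article)."""
    return [exe, "-7", "-C"]


def clean_pgn(src, dst, exe="pgn-extract"):
    """Equivalent of: pgn-extract -7 -C < src > dst"""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # check=True raises CalledProcessError if pgn-extract exits non-zero
        subprocess.run(clean_command(exe), stdin=fin, stdout=fout, check=True)


def clean_all(in_dir, out_dir):
    """Clean every *.pgn in in_dir into out_dir (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.pgn")):
        clean_pgn(src, out / src.name)
```

Writing the cleaned files to a separate directory keeps the originals around in case a later step chokes and you need to go back to them.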
The high level supervised learning process runs as follows:
- Make sure the individual pgn files you will be converting to training data have fewer than 500k games in them. The training software expects files -- called "chunks" -- with one game per chunk. So training data directories will be created with a potentially large number of files, which can become unwieldy. You can use pgn-extract to break a pgn file into equal sized files with N games. See documentation.
- Clean up the pgn files with pgn-extract -7 -C < input.pgn > output.pgn. Change the filenames to reflect your naming scheme.
- Run lczero to generate the training data. Note that lczero requires a weights file for this step. The weights file is loaded but ignored. This is an artifact of the all-in-one nature of lczero.
./lczero -w weights_useless.txt.gz --supervise my_pgn_file.pgn
- Clean up the mess and edit your pgn file when it dumps core because of some minor pgn issue.
- Finally you get a clean run. You should have a directory called supervise-my_pgn_file with files of the form training.XXXXX.gz where the X's are digits (there could be 1 or a dozen digits, depending on how many games you had). There should be one file for each game in your pgn.
- If you've converted several pgn's, put all the various "supervise" directories in a common subdirectory. This will make it easier to process them in the training step.
- Determine how many "chunks" you have in the subdir by running find subdir -type f | wc -l. Let's assume we have 901265 chunks.
- In your lczero-training directory, change directory to the tf subdir. There should be a "configs" subdirectory. Let's copy the example config file and make it work for us.
%YAML 1.2
---
name: 'my-first-net-64x6'              # ideally no spaces
gpu: 0                                 # gpu id to process on

dataset:
  num_chunks: 901265                   # newest nof chunks to parse
  train_ratio: 0.90                    # trainingset ratio
  # For separated test and train data.
  #input_train: '/path/to/chunks/*/draw/' # supports glob
  #input_test: '/path/to/chunks/*/draw/'  # supports glob
  # For a one-shot run with all data in one directory.
  input: '/subdir/supervise-*/'

training:
  batch_size: 2048                     # training batch
  test_steps: 2000                     # eval test set values after this many steps
  train_avg_report_steps: 200          # training reports its average values after this many steps.
  total_steps: 140000                  # terminate after these steps
  # checkpoint_steps: 10000            # optional frequency for checkpointing before finish
  shuffle_size: 524288                 # size of the shuffle buffer
  lr_values:                           # list of learning rates
    - 0.02
    - 0.002
    - 0.0005
  lr_boundaries:                       # list of boundaries
    - 100000
    - 130000
  policy_loss_weight: 1.0              # weight of policy loss
  value_loss_weight: 1.0               # weight of value loss
  path: '/path/to/store/networks'      # network storage dir

model:
  filters: 64
  residual_blocks: 6
...
You may have to fiddle with the learning rates or the batch size depending on your GPU. The directory specifics are also up to you.
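To see what the lr_values and lr_boundaries lists in the config mean in practice, here is a small Python sketch that reads them as a piecewise-constant schedule: the first rate applies until the first boundary, the second until the second boundary, and so on. This is my reading of the config rather than code lifted from lczero-training. In particular, whether a boundary step itself already uses the next rate is an assumption, and lr_at is a hypothetical helper name.

```python
import bisect

LR_VALUES = [0.02, 0.002, 0.0005]   # lr_values from the config above
LR_BOUNDARIES = [100000, 130000]    # lr_boundaries from the config above


def lr_at(step, values=LR_VALUES, boundaries=LR_BOUNDARIES):
    """Learning rate in effect at a given training step.

    Assumption: the step equal to a boundary already uses the next rate.
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("need exactly one more lr value than boundaries")
    # bisect_right counts how many boundaries are <= step
    return values[bisect.bisect_right(boundaries, step)]


for step in (0, 99999, 100000, 129999, 130000, 139999):
    print(step, lr_at(step))
```

Under this reading, a trial run with total_steps below 100000 never leaves the first learning rate, so if you shorten the run you probably want to scale the boundaries down as well.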
- Run the training. This will take a long time. Maybe you can reduce the number of steps in the config file to check things out at first. From the tf dir, run the following.
./train.py --cfg configs/my-first-net.yaml --output my-first-net.txt
- After churning through a number of steps, it should barf out a network weights file, which you can then use with lc0.
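Since the conversion step should leave one chunk file per game, a quick sanity check before training is to compare the two counts. The Python sketch below does that. It assumes each game in the cleaned pgn starts with an [Event tag line, which should hold after pgn-extract -7 since that tag is part of the seven tag roster. The function names are hypothetical, and count_chunks is just a Python stand-in for find subdir -type f | wc -l.

```python
from pathlib import Path


def count_games(pgn_path):
    """Count games by counting lines that start an [Event ...] tag."""
    with open(pgn_path, encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if line.startswith("[Event "))


def count_chunks(directory):
    """Count regular files under directory, recursively."""
    return sum(1 for p in Path(directory).rglob("*") if p.is_file())


def check_conversion(pgn_path, supervise_dir):
    """Return (games, chunks, counts_match) for one converted pgn."""
    games = count_games(pgn_path)
    chunks = count_chunks(supervise_dir)
    return games, chunks, games == chunks
```

For the example in this article you would call check_conversion("my_pgn_file.pgn", "supervise-my_pgn_file"). Running count_chunks on the common subdirectory gives the number to put in num_chunks.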
Simple, no?
My new (old) blog is at lczero.libertymedia.io