Rescoring Training Data

dkappe edited this page Mar 3, 2020 · 10 revisions

Rescorer

The rescorer binary is a special build of lc0. Its source lives on the rescore_tb branch of Tilps' fork of lc0, which you can clone with:

git clone -b rescore_tb --recurse-submodules https://github.com/Tilps/lc0 rescorer

Setup for compilation

I have a virtualenv for compiling lc0. Here are the requirements you can feed to pip. For more info on compiling lc0, check the lc0 GitHub repo.

meson==0.52.1
ninja==1.9.0.post1
pkg-resources==0.0.0
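
The setup described above could be scripted roughly as follows. This is a minimal sketch, not the author's exact procedure: it assumes python3 with the standard venv module (instead of the virtualenv tool), and write_requirements, setup_venv, requirements.txt and lc0-venv are hypothetical names. pkg-resources==0.0.0 is left out here because that line is usually an artifact of pip freeze on Debian/Ubuntu rather than an installable package, so pip may refuse it.

```shell
# Sketch: build environment for lc0. All names below are illustrative.
write_requirements() {
    # $1: path of the requirements file to create
    cat > "$1" <<'EOF'
meson==0.52.1
ninja==1.9.0.post1
EOF
}

setup_venv() {
    # $1: virtualenv directory, $2: requirements file
    python3 -m venv "$1" &&
    "$1/bin/pip" install -r "$2"
}

# Usage (needs network access):
#   write_requirements requirements.txt
#   setup_venv lc0-venv requirements.txt
#   . lc0-venv/bin/activate
```

The functions are only defined, not called, so nothing is downloaded until you run the usage lines yourself.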

Building

I'm using Ubuntu, but Windows should be vaguely similar. If everything is configured properly, you can just go to the source directory and type

./build.sh

After much downloading of protobuf and compiling, the rescorer binary (a build of lc0) should be in build/release, ready to be copied to a directory of your choice. I create a bin directory. An ls -l should show the following:

-rwxr-xr-x 1 dkappe dkappe 1912720 Feb  6 23:15 rescorer
drwxr-xr-x 3 dkappe dkappe    4096 Feb  6 23:17 rescorer@exe
drwxr-xr-x 4 dkappe dkappe    4096 Feb  6 23:17 subprojects
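
The copy step is not spelled out above. A small sketch of it, assuming the build output is named rescorer as in the listing; install_rescorer is a hypothetical helper:

```shell
# Sketch: copy the freshly built binary into a bin directory.
# Assumes the build output is named "rescorer", as in the listing above.
install_rescorer() {
    # $1: build output directory (e.g. build/release), $2: destination (e.g. bin)
    mkdir -p "$2" &&
    cp "$1/rescorer" "$2/" &&
    chmod +x "$2/rescorer"
}

# Usage, from the source directory:
#   install_rescorer build/release bin
```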

The Data

I have a directory called t40 to which I have downloaded some training files:

drwxr-xr-x 2 dkappe dkappe 536576 Feb  7 00:00 training-run1-20190726-1017
drwxr-xr-x 2 dkappe dkappe 270336 Jul 26  2019 training-run1-20190726-1117

Both of those directories have files of the form training.30837054.gz that contain the training records for one game.

Rescoring with Syzygy

A little help can be had with ./bin/rescorer rescore --help

Some important snippets:

       --syzygy-paths=STRING
               List of Syzygy tablebase directories

       --gaviotatb-paths=STRING
               List of Gaviota tablebase directories

       --input=STRING
               Directory with gzipped files in need of rescoring.

       --output=STRING
               Directory to write rescored files.

IMPORTANT: RUNNING RESCORER WILL ERASE YOUR ORIGINAL TRAINING FILES.
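
Because of that, it may be worth copying the input directory before the first run. A minimal sketch, assuming a POSIX shell and enough disk space for a second copy; backup_training_dir is a hypothetical helper, not part of rescorer:

```shell
# Sketch: keep a copy of the training files before rescorer touches them.
backup_training_dir() {
    # $1: training data directory, $2: backup directory (must not exist yet)
    if [ -e "$2" ]; then
        echo "backup target $2 already exists" >&2
        return 1
    fi
    # -R copies recursively, -p preserves timestamps and permissions
    cp -Rp "$1" "$2"
}

# Usage:
#   backup_training_dir t40/training-run1-20190726-1017 backup-20190726-1017
```

Refusing to overwrite an existing target keeps a second run from clobbering the only good copy.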

So let's run rescorer. (Make sure to remove the LICENSE file from t60 data dirs first, otherwise you will get a segfault.)

mkdir out
./bin/rescorer rescore --threads=4 --syzygy-paths=/home/dkappe/chess/egtb --input=/home/dkappe/deep3/data/t40/training-run1-20190726-1017/ --output=out

and the output:

       _
|   _ | |
|_ |_ |_| v0.23.0-dev+git.74a5d42 built Feb  6 2020
Found 510WDL, 0 DTM and 510 DTZ tablebase files.
Thread: 0 starting
Thread: 1 starting
Thread: 2 starting
Thread: 3 starting
Games processed: 13223
Positions processed: 1636979
Rescores performed: 28136
Cumulative outcome change: 28136
Secondary rescores performed: 3188
Secondary rescores performed used dtz: 824
Number of policy values boosted by dtz or dtm 0
Number of policy values boosted by dtm 0
Orig policy_sum dist of boost candidate:
 -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan
Boosted policy_sum dist of boost candidate:
 -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan
Original L: 3119 D: 5719 W: 4385
After L: 3082 D: 5802 W: 4339

You should now have the same number of .gz files in the output directory as the input directory originally held, all rescored. Enjoy.
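
The file count can be checked with a short sketch like the one below. Since the input files are erased by the run, the input count has to be taken beforehand. count_gz and check_rescored are hypothetical helpers, and -maxdepth assumes GNU or BSD find:

```shell
# Sketch: compare the number of .gz files before and after rescoring.
count_gz() {
    # $1: directory; prints the number of .gz files directly inside it
    find "$1" -maxdepth 1 -type f -name '*.gz' | wc -l | tr -d ' '
}

check_rescored() {
    # $1: number of input files counted before the run, $2: output directory
    after=$(count_gz "$2")
    if [ "$after" -eq "$1" ]; then
        echo "OK: $after files"
    else
        echo "MISMATCH: expected $1, found $after" >&2
        return 1
    fi
}

# Usage:
#   before=$(count_gz /path/to/input)   # count BEFORE running rescorer
#   ... run rescorer ...
#   check_rescored "$before" out
```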

Notes

  • Initially I couldn't get it to work with t60 data, only t40: rescorer dumped core on t60 files.
  • At first this looked like corrupt data rather than something specific to t60.
  • Doh!! It was the presence of the LICENSE file in t60 data dirs, which rescorer tried to process. Remove it before running.
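
The LICENSE problem noted above can be guarded against by moving anything that is not a .gz file out of the input directory before running rescorer. A minimal sketch, assuming stray files sit directly in the data directory; list_non_gz and move_non_gz are hypothetical helpers, and -maxdepth assumes GNU or BSD find:

```shell
# Sketch: park non-.gz files (such as LICENSE) outside the input directory.
list_non_gz() {
    # $1: directory; prints every regular file in it that does not end in .gz
    find "$1" -maxdepth 1 -type f ! -name '*.gz'
}

move_non_gz() {
    # $1: training data directory, $2: directory to park stray files in
    mkdir -p "$2" || return 1
    find "$1" -maxdepth 1 -type f ! -name '*.gz' -exec mv {} "$2/" \;
}

# Usage:
#   list_non_gz t60/some-training-dir          # see what would be moved
#   move_non_gz t60/some-training-dir stray-files
```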

Scripting

Here is a sample script for converting a directory of unrescored files to rescored files:

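#!/bin/bash
# Usage: pass the name of a directory of unrescored files in the current directory.
# Remember that rescorer erases the original files in the input directory.

mkdir -p "rescored-$1"
/home/dkappe/data/chess/bin/rescorer rescore --syzygy-paths=/home/dkappe/data/chess/egtb --threads=8 --input="$1" --output="rescored-$1"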
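
To process several directories in one go, the same idea can be wrapped in a loop. This is a sketch under assumptions: rescore_all and the RESCORER, SYZYGY and THREADS variables are hypothetical names with placeholder defaults, and only the rescorer flags shown earlier on this page are used.

```shell
# Sketch: rescore several training directories in sequence.
# RESCORER, SYZYGY and THREADS are illustrative; override them as needed.
RESCORER=${RESCORER:-./bin/rescorer}
SYZYGY=${SYZYGY:-/path/to/egtb}
THREADS=${THREADS:-4}

rescore_all() {
    # Each argument is a directory of unrescored .gz files.
    for dir in "$@"; do
        dir=${dir%/}                          # drop a trailing slash
        out="rescored-$(basename "$dir")"     # output lands in the current directory
        mkdir -p "$out" || return 1
        "$RESCORER" rescore --threads="$THREADS" --syzygy-paths="$SYZYGY" \
            --input="$dir" --output="$out" || return 1
    done
}

# Usage:
#   rescore_all t40/training-run1-20190726-1017 t40/training-run1-20190726-1117
```

The loop stops at the first failure so that a crash in one directory is noticed before the next directory's originals are consumed.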