
A Leela NNUE? Night Nurse and Others

dkappe edited this page Aug 21, 2020 · 6 revisions

What if you distilled a leela net to a NNUE?

You’d get Night Nurse. Well, almost. The leela-style net in question is Bad Gyal, which was developed independently without any leela data. It runs on lc0, Allie, Scorpio and a0lite.

But before we get into exactly how Night Nurse was developed, a little background on NNUEs is in order.

What’s an NNUE?

Without getting too far into the technical details of NNUE, the acronym stands for "Efficiently Updatable Neural Network." It's essentially a shallow, fully connected network with a neat trick: between successive positions only a small number of inputs change, so only a limited part of the net needs to be recomputed. As a result, it runs efficiently on a CPU and doesn't require a GPU.
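The "efficiently updatable" part can be sketched in a few lines. This is a toy illustration, not Night Nurse's actual code: the sizes and weights are invented, and real NNUEs use HalfKP-style king-relative features and quantized integer math. The idea is that the first layer's output (the "accumulator") is a sum of weight columns for the active input features, so when a move flips a handful of features on and off, you subtract and add a few columns instead of redoing the whole matrix product.

```python
# Toy sketch of the NNUE incremental-update trick.
# Sizes and weights are invented for illustration; real nets use
# tens of thousands of features and 256+ hidden units in int16.
N_FEATURES = 8
N_HIDDEN = 4

# weights[f] is the first-layer weight column for input feature f.
weights = [[(f * 7 + h) % 5 - 2 for h in range(N_HIDDEN)]
           for f in range(N_FEATURES)]

def full_refresh(active):
    """Recompute the accumulator from scratch: sum of active feature columns."""
    acc = [0] * N_HIDDEN
    for f in active:
        for h in range(N_HIDDEN):
            acc[h] += weights[f][h]
    return acc

def update(acc, removed, added):
    """Incremental update: only touch the features a move changed."""
    acc = acc[:]
    for f in removed:
        for h in range(N_HIDDEN):
            acc[h] -= weights[f][h]
    for f in added:
        for h in range(N_HIDDEN):
            acc[h] += weights[f][h]
    return acc

# A position with features {0, 2, 5} active; a "move" turns 2 off and 6 on.
acc = full_refresh({0, 2, 5})
after = update(acc, removed=[2], added=[6])

# By linearity, the cheap update matches a full recomputation.
assert after == full_refresh({0, 5, 6})
```

Because addition is associative, the incrementally maintained accumulator is bit-identical to a full refresh, which is what makes the trick safe to use inside a search.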

This network takes chess board features and produces an evaluation, which is used in place of a hand-crafted evaluation. Depending on the size of the net, it will run at 50%–80% of the speed of a hand-crafted eval.

Training an NNUE

All the code and know-how we have for NNUE come from Shogi. The first thing you do when you train a network is generate a billion positions' worth of training data using Stockfish's native eval at some shallow depth (folklore says 8). Let's take a look at a text representation of such a training record:

```
fen 8/8/4k3/p7/1bK5/1P6/P7/8 b - - 4 1
move e6e5
score 152
ply 51
result 0
e
```
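The format is easy to work with: key/value lines, with a bare `e` terminating each record. A minimal parser, sketched from the example above (the field names come from the sample record; the binary packing used for actual training is not shown):

```python
# Minimal parser for the plain-text training-record format shown above.
# Field names are taken from the example record; this is a sketch, not
# the actual NNUE tooling.
RECORD = """fen 8/8/4k3/p7/1bK5/1P6/P7/8 b - - 4 1
move e6e5
score 152
ply 51
result 0
e"""

def parse_records(text):
    """Yield one dict per record; a line consisting of 'e' ends a record."""
    records, cur = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == "e":
            records.append(cur)
            cur = {}
        elif line:
            key, _, value = line.partition(" ")
            cur[key] = value
    return records

recs = parse_records(RECORD)
# recs[0]["score"] is "152", recs[0]["move"] is "e6e5", etc.
```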

Initially you train the net (or, rather, the qsearch of the net) on the score target, based on the inputs of the board represented by the FEN. Later, on higher-quality data, you start mixing in some of the game result.
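That score/result blend is commonly expressed as an interpolated training target. A hedged sketch, assuming a sigmoid squashing of the centipawn score and a blending weight `lam` (both the `lam` and `scale` values here are illustrative knobs, not Night Nurse's actual settings):

```python
import math

def target(score_cp, result, lam=1.0, scale=600.0):
    """Blend a shallow-search score and the game result into one target.

    score_cp: engine eval in centipawns, from the side to move's view
    result:   game result from the side to move's view (-1, 0, +1)
    lam:      1.0 = pure score, 0.0 = pure result
    (lam and scale are illustrative assumptions, not the real settings)
    """
    # Squash the centipawn score to a win-probability-like value in (0, 1).
    p_score = 1.0 / (1.0 + math.exp(-score_cp / scale))
    p_result = (result + 1) / 2.0   # map -1/0/+1 to 0/0.5/1
    return lam * p_score + (1.0 - lam) * p_result

early = target(152, 0, lam=1.0)   # early training: score only
later = target(152, 0, lam=0.7)   # later passes: mix in some game result
```

With `lam=1.0` the net learns to imitate the shallow eval; lowering `lam` on higher-quality data nudges it toward predicting actual outcomes.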

Using the tools for other engines

Nothing says you have to use Stockfish to generate this data. The text format above can be converted back into binary training data and used to train an NNUE. I first wrote some Python scripts to generate the text records from an arbitrary UCI engine. I have built NNUEs distilled from Toga II 4.0, Ice and Komodo 14, all at depth 8.
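The core of such a script is just talking the UCI protocol over a pipe. A simplified sketch (not the author's actual scripts; the handshake is abbreviated, mate scores are skipped, and `engine_path` is whatever binary you want to distill):

```python
# Sketch of generating text training records from an arbitrary UCI engine.
# Not the actual scripts; the UCI handshake is abbreviated for brevity.
import re
import subprocess

def parse_score(info_line):
    """Pull a centipawn score out of a UCI 'info' line; mate scores give None."""
    m = re.search(r"score cp (-?\d+)", info_line)
    return int(m.group(1)) if m else None

def record(fen, move, score, ply, result):
    """Emit one record in the text training format described above."""
    return f"fen {fen}\nmove {move}\nscore {score}\nply {ply}\nresult {result}\ne"

def eval_position(engine_path, fen, depth=8):
    """Ask a UCI engine for a shallow eval and best move for one position."""
    p = subprocess.Popen([engine_path], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, text=True)
    p.stdin.write(f"uci\nposition fen {fen}\ngo depth {depth}\n")
    p.stdin.flush()
    score, best = None, None
    for line in p.stdout:
        if line.startswith("info"):
            s = parse_score(line)
            if s is not None:
                score = s
        elif line.startswith("bestmove"):
            best = line.split()[1]
            break
    p.stdin.write("quit\n")
    p.stdin.flush()
    p.wait()
    return score, best

if __name__ == "__main__":
    # Requires an engine binary, e.g.:
    # score, move = eval_position("stockfish", "8/8/4k3/p7/1bK5/1P6/P7/8 b - - 4 1")
    pass
```

Loop this over a big set of positions, write out the records, and convert them back to the binary training format.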

I happened to have a great many PGNs from 800-node test matches I ran while developing the aforementioned Bad Gyal. I wrote another Python script to extract training data from these PGNs and ended up with roughly 180 million positions. That data and the UCI scripts are what I use to generate data for Night Nurse.

Spinning straw into gold

Bad Gyal and Night Nurse are both strong. But where does that strength come from? If you look at Bad Gyal's training data (great unwashed lichess games blended 50/50 with shallow sf10 eval and multipv policy), it's hard to see. It's the power of the 128x10 resnet that turns horrible training data into a strong mcts/nn performer. The training data that Bad Gyal in turn feeds to Night Nurse is already quite good.

Where to get the nets

They can all be found, free of charge, on my Patreon site.