Real Time Blunder Checking

Blunder Checking Leela

Not so long ago, in the era of the 192x15 networks, the people were restless.

"Why does Leela blunder so badly?" they were heard to exclaim.

I decided to answer their question.

The Approach

How to get to the bottom of this blundering? Well, one could take the multipv output from an "adviser" AB engine and make sure that the move Leela picked was not too far from the best move, in terms of the adviser's centipawn scores. I would make both the adviser engine and the centipawn window (where the adviser overrules Leela) configurable.

Having decided on an approach, I wrote a UCI wrapper in Go, called Advice, that runs Leela and the adviser engine simultaneously and decides whether to overrule Leela when it returns its bestmove.
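The overrule decision itself is small enough to sketch. What follows is a minimal illustration of the idea, not the actual Advice source: the names `candidate` and `advise` are hypothetical, and keeping Leela's move when it does not appear in the adviser's multipv list is an assumption on my part. The sample scores come from the first override line in the log shown later in this post.

```go
package main

import "fmt"

// candidate is one line of the adviser's multipv output: a move in UCI
// notation and its score in centipawns from the side to move's view.
// (Hypothetical type, for illustration only.)
type candidate struct {
	move string
	cp   int
}

// advise returns the move to play and whether Leela was overruled.
// cands must be sorted best-first, as multipv output normally is.
// Leela's move is overruled only when it scores more than windowCP
// centipawns below the adviser's best move.
func advise(leelaMove string, cands []candidate, windowCP int) (string, bool) {
	if len(cands) == 0 {
		return leelaMove, false
	}
	best := cands[0]
	for _, c := range cands {
		if c.move == leelaMove {
			if best.cp-c.cp > windowCP {
				return best.move, true
			}
			return leelaMove, false
		}
	}
	// Assumption: with no adviser score for Leela's move, leave it alone.
	return leelaMove, false
}

func main() {
	cands := []candidate{{"c2b2", -898}, {"c2b1", -14305}}
	move, overruled := advise("c2b1", cands, 300)
	fmt.Println(move, overruled) // c2b2 true
}
```

A smaller `windowCP` makes the adviser interfere more often, and a larger one leaves more of Leela's choices alone.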

The Test

The time control (TC) was fairly fast: 0.25 sec per move. This was the TC I had been using for my tracking gauntlet. The opponents were:

Komodo 12
Gull 3
Cheng4
Senpai

At the time, Leela was between Gull and Cheng (stronger than Cheng, weaker than Gull). Komodo was much stronger; Senpai was a bit weaker.

I used a recent net -- ID356 -- for the test. Openings were 10 ply long, randomly selected from ProDeo-3100, and each was played twice with colors reversed. Leela played 200 games against each opponent. The adviser engine was Cheng4, except in one test run, where it was SF9. No endgame tablebases (EGTB) were used.

The Results

[Chart: Advice results]

The opponents are on the bottom axis.
Each colored bar represents a different centipawn window -- 80, 200, 300 -- except the first, where no adviser was used.
The adviser was Cheng4, except in the case of the last run, where SF9 was the adviser.
The chart shows the winrate, not Elo.
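For readers who prefer Elo, a win rate can be turned into an approximate rating difference with the usual logistic Elo formula. This is a rough sketch: the function name `eloDiff` is mine, and it assumes the win rate counts a draw as half a point, which may not be how the chart was computed.

```go
package main

import (
	"fmt"
	"math"
)

// eloDiff converts a score fraction strictly between 0 and 1 into an
// approximate Elo difference under the standard logistic model:
// diff = 400 * log10(score / (1 - score)).
// Assumption: score = (wins + draws/2) / games.
func eloDiff(score float64) float64 {
	return 400 * math.Log10(score/(1-score))
}

func main() {
	for _, s := range []float64{0.40, 0.50, 0.60} {
		fmt.Printf("score %.2f -> %+.0f Elo\n", s, eloDiff(s))
	}
}
```

With only 200 games per opponent, small differences in win rate translate into Elo differences that are well within the noise.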

Some initial observations:

The adviser had a small but noticeable positive effect, with a few exceptions.
An 80 cp window was most positive. The effect of the 200 and 300 cp windows was somewhat less pronounced.
The strength of the adviser engine was important. Cheng4 didn't help nearly as much against the much stronger Komodo 12, and at 80 cp it in fact performed worse than Leela with no adviser. SF9 at 80 cp made the most difference.

The Details

I decided to dig into the details of the advice. Here is a representative sample of advice:

Started game 16 of 20 (komodo12 vs ChengT10175)
4642869 <ChengT10175(0): info string override move c2b1(-14305) with c2b2(-898) at ply 108 (8/2Q5/8/pq1r4/8/2R4P/P1k2PPK/8 b - - 0 54)
4667078 <ChengT10175(0): info string override move d4a4(-14183) with b3c4(-2414) at ply 120 (8/8/8/Q7/3r2P1/1k5P/5P1K/8 b - g3 0 60)
4675279 <ChengT10175(0): info string override move c4c5(-15065) with c4d3(-14256) at ply 124 (8/8/8/3r2P1/Q1k5/7P/5P1K/8 b - - 2 62)

This is from a later run with the centipawn window set to 300. Most of the overruling either slowed down mate or sped it up (if Leela was winning). There were very few cases where a blunder was overruled. This held over many thousands of games, both for the "blunder"-prone 192x15 generation of nets and for the later Test10 nets.
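Sifting through thousands of games by eye is tedious, so here is a rough sketch of how override lines like the ones above could be parsed and sorted. It is not the code I used: the type and function names are hypothetical, the regular expression is inferred from the three sample lines only, and the `thresholdCP` cutoff for calling a position "already decided" is an arbitrary assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// override holds the fields of one "override move" log line.
type override struct {
	leelaMove   string
	leelaCP     int
	adviserMove string
	adviserCP   int
	ply         int
	fen         string
}

// Pattern inferred from the sample log lines; other log formats will not match.
var overrideRE = regexp.MustCompile(
	`override move (\w+)\((-?\d+)\) with (\w+)\((-?\d+)\) at ply (\d+) \((.+)\)\s*$`)

// parseOverride extracts an override from a log line, or reports false.
func parseOverride(line string) (override, bool) {
	m := overrideRE.FindStringSubmatch(line)
	if m == nil {
		return override{}, false
	}
	leelaCP, err1 := strconv.Atoi(m[2])
	adviserCP, err2 := strconv.Atoi(m[4])
	ply, err3 := strconv.Atoi(m[5])
	if err1 != nil || err2 != nil || err3 != nil {
		return override{}, false
	}
	return override{
		leelaMove: m[1], leelaCP: leelaCP,
		adviserMove: m[3], adviserCP: adviserCP,
		ply: ply, fen: m[6],
	}, true
}

// alreadyDecided reports whether even the adviser's preferred move scores at
// least thresholdCP away from equal, i.e. the override did not rescue a
// playable game. thresholdCP is a hypothetical cutoff, not a value from Advice.
func alreadyDecided(o override, thresholdCP int) bool {
	cp := o.adviserCP
	if cp < 0 {
		cp = -cp
	}
	return cp >= thresholdCP
}

func main() {
	lines := []string{
		"4642869 <ChengT10175(0): info string override move c2b1(-14305) with c2b2(-898) at ply 108 (8/2Q5/8/pq1r4/8/2R4P/P1k2PPK/8 b - - 0 54)",
		"4667078 <ChengT10175(0): info string override move d4a4(-14183) with b3c4(-2414) at ply 120 (8/8/8/Q7/3r2P1/1k5P/5P1K/8 b - g3 0 60)",
		"4675279 <ChengT10175(0): info string override move c4c5(-15065) with c4d3(-14256) at ply 124 (8/8/8/3r2P1/Q1k5/7P/5P1K/8 b - - 2 62)",
	}
	for _, l := range lines {
		if o, ok := parseOverride(l); ok {
			fmt.Printf("ply %d: %s(%d) -> %s(%d), already decided: %v\n",
				o.ply, o.leelaMove, o.leelaCP, o.adviserMove, o.adviserCP,
				alreadyDecided(o, 800))
		}
	}
}
```

Overrides that are not flagged as already decided are the ones worth inspecting by hand as possible real blunders.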

Conclusions

Leela really doesn't blunder that often.
Too small a window, and you kill Leela's style. Too big, and you might not catch blunders (see previous point).
How can an AB engine distinguish between a blunder and a patented Leela positional sacrifice?
Without using AB data in the MCTS, having an AB engine blundercheck Leela is of limited use.
You can't just use any old engine to provide advice on tactical blunders. It will try to give advice on any position that moves the needle, not just on ones with material loss. So the strength of the engine is crucial.

My new (old) blog is at lczero.libertymedia.io
