Code for training an AI for Lean 3 using Deep Reinforcement Learning.
It consists of multiple agents: one generates theorems while the others try to solve them. The reward for the theorem generator depends on the performance of the solvers, which is supposed to keep the generated tasks manageable for the solvers. Sadly, the solvers do not manage to solve non-trivial goals, most likely due to a lack of computational power and some inefficiencies.
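To make the generator–solver coupling concrete, here is a minimal sketch of what such a reward could look like. The README only states that the reward depends on solver performance, so the exact shape below (peaking at a 50% solve rate, i.e. "hard but solvable") is an illustrative assumption, not the formula used in this repo.

```python
# Hypothetical generator reward: peaks when the solvers succeed on some
# but not all attempts, pushing generated theorems toward a manageable
# difficulty. The exact shape is an illustrative assumption.
def generator_reward(num_solved: int, num_attempts: int) -> float:
    if num_attempts == 0:
        return 0.0
    solve_rate = num_solved / num_attempts
    # 0 for theorems that are always solved (too easy) or never solved
    # (too hard); maximal for theorems solved about half the time.
    return 1.0 - abs(2.0 * solve_rate - 1.0)

print(generator_reward(2, 4))  # 1.0: solved half the time
print(generator_reward(4, 4))  # 0.0: trivial for the solvers
```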
The theorems are generated by starting with the goal "Prop" in Lean and applying definitions until all goals are closed. There are also a few more actions, like closing the goal by adding the current goal to the hypotheses and applying it. This is the part of the project that works well: it is easy to generate a large number of diverse theorems that are not solved by a fixed set of tactics. See as an example the lemma set I uploaded at https://github.com/todbeibrot/lemma-set.
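As a toy Lean 3 illustration of the underlying idea: the goal is `Prop` itself, and each step applies a definition whose result type is `Prop`, spawning new `Prop` goals until all of them are closed. The particular steps below are my own example, not the generator's actual action set.

```lean
-- Toy illustration (not the actual generator): the goal `⊢ Prop` is
-- refined by applying a definition, which spawns new `Prop` goals;
-- the finished term is the statement of a new "theorem".
example : Prop :=
begin
  apply and,                  -- ⊢ Prop splits into two `Prop` goals
  exact (∀ n : ℕ, n + 0 = n), -- close the first goal
  exact (2 + 2 = 4),          -- close the second goal
end
```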
The solvers can choose from a set of tactics and theorems to make progress on the goal. To get some more training, we also feed them theorems from mathlib and previously solved theorems. The amount of this additional training depends on the performance on the different kinds of theorems. As a first action, the solvers can decide whether they want to solve the negation of the theorem instead.
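A rough sketch of a solver episode under these rules. Here `env`, `policy`, their methods, and the action set are hypothetical stand-ins for illustration, not this repo's actual interfaces.

```python
# Hypothetical solver episode loop; `env`, `policy`, and the action set
# are illustrative stand-ins, not this repo's actual interfaces.
TACTICS = ["intros", "refl", "simp", "linarith"]  # example fixed action set

def run_episode(env, policy, max_steps=20):
    state = env.reset()
    # first action: optionally switch to proving the negation instead
    if policy.choose_negation(state):
        state = env.negate()
    for _ in range(max_steps):
        action = policy.choose(state, TACTICS)  # index into the action set
        state, solved = env.apply_tactic(TACTICS[action])
        if solved:
            return 1.0  # proof complete
    return 0.0          # step budget exhausted
```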
For the network: I used a CNN. There is no particular reason why I chose this architecture over a transformer. For the embedding, I used a word2vec model trained on data from previous runs.
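A minimal sketch of this setup, assuming gensim for word2vec and PyTorch for the CNN. All sizes, layer choices, and the tiny example corpus are illustrative assumptions, not the exact architecture used in this repo.

```python
# Sketch of the embedding + CNN setup described above. All sizes, layer
# choices, and the example corpus are illustrative assumptions.
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# word2vec trained on tokenized tactic states from previous runs
corpus = [["⊢", "∀", "n", ":", "ℕ", ",", "n", "+", "0", "=", "n"]]
w2v = Word2Vec(sentences=corpus, vector_size=128, window=5, min_count=1)

class TacticStateEncoder(nn.Module):
    """1D CNN over word2vec-embedded token sequences."""

    def __init__(self, embedding_dim=128, num_filters=256, num_actions=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(embedding_dim, num_filters, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(num_filters, num_filters, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # pool over the sequence dimension
        )
        self.policy_head = nn.Linear(num_filters, num_actions)
        self.value_head = nn.Linear(num_filters, 1)

    def forward(self, embedded_tokens):
        # embedded_tokens: (batch, seq_len, embedding_dim)
        x = embedded_tokens.transpose(1, 2)  # Conv1d wants (batch, dim, len)
        features = self.conv(x).squeeze(-1)  # (batch, num_filters)
        return self.policy_head(features), self.value_head(features)

# embed one tokenized state and run it through the network
tokens = corpus[0]
embedded = torch.tensor(w2v.wv[tokens]).unsqueeze(0)  # (1, seq_len, 128)
logits, value = TacticStateEncoder()(embedded)
```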
In order to use it, you need an instance of lean-gym (https://github.com/openai/lean-gym). Replace its file "tactic.lean" with the altered file in this repo's "lean-gym" folder and add the other three files in that folder to the target. You also need a list of theorems and definitions in a "data" folder.
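For reference, here is a minimal sketch of driving a lean-gym REPL from Python. The `init_search`/`run_tac` command format follows the openai/lean-gym README; the working directory and theorem name are placeholders.

```python
# Minimal sketch of driving a lean-gym REPL from Python. The command
# format follows the openai/lean-gym README; paths are placeholders.
import json
import subprocess

proc = subprocess.Popen(
    ["lean", "--run", "src/repl.lean"],
    cwd="path/to/lean-gym",  # placeholder: your lean-gym checkout
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def send(cmd):
    proc.stdin.write(json.dumps(cmd) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

# open a proof search for a declaration and apply one tactic
state = send(["init_search", ["nat.add_comm", ""]])
result = send(["run_tac", [state["search_id"], state["tactic_state_id"], "intros"]])
print(result["tactic_state"])
```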
If you are interested in me uploading training graphs, models, generated theorems, the word2vec model, or lists of definitions/theorems to use, leave a message.
For evaluation: I didn't develop a working evaluation on MiniF2F for this version. An earlier version reached ~11%. That's still pretty bad, considering that a lot of theorems in MiniF2F are one-liners. Note that the detection of sorrys shouldn't be turned off outside of training.