Check set.mm LaTeX in CI pipeline #3100
Generate a list of symbols in TeX format, then try to generate a PDF. This will check if every symbol is valid TeX. I don't expect this specific commit to succeed, but hopefully it'll help us see what's left to do. Signed-off-by: David A. Wheeler <[email protected]>
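A minimal sketch of the idea, not the script in this PR: wrap each (token, LaTeX) pair in a table row, emit a complete document, and let `pdflatex` decide whether every symbol is valid. The function names, the `amssymb`/`longtable` preamble, and the list of `\verb` delimiters are assumptions for illustration; the real preamble likely needs more packages.

```python
import subprocess


def verb(token):
    """Wrap a token in \\verb using a delimiter that does not occur in it."""
    for delim in "!|+@/":
        if delim not in token:
            return "\\verb" + delim + token + delim
    raise ValueError(f"no usable \\verb delimiter for {token!r}")


def build_symbol_doc(rows):
    """Return LaTeX source for a two-column table of (token, latex) pairs."""
    lines = [
        r"\documentclass{article}",
        r"\usepackage{amssymb}",    # assumption: the real preamble needs more
        r"\usepackage{longtable}",
        r"\begin{document}",
        r"\begin{longtable}{ll}",
    ]
    for token, latex in rows:
        # One row per symbol: the raw token, then its math-mode rendering.
        lines.append(verb(token) + " & $" + latex + "$ " + r"\\")
    lines += [r"\end{longtable}", r"\end{document}"]
    return "\n".join(lines) + "\n"


def compile_pdf(tex_path):
    """Run pdflatex non-interactively; True if it exits with status 0."""
    result = subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", tex_path],
        check=False,
    )
    return result.returncode == 0
```

Note that some problems only produce a LaTeX warning rather than an error, so a real check would probably also need to inspect the log, not just the exit status.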
Currently this fails on:
Is there an Ubuntu package with phonetic.sty? Maybe adding this would help: texlive-fonts-extra |
@GinoGiotto @benjub @wlammen - eventually I'd like the CI pipeline to call the script & see if the LaTeX works. Then we'll be immediately notified of future problems when they happen. Suggestions on how to "make this work" are welcome. |
This is related to #3099 |
Did you check out those Github actions @david-a-wheeler ? |
Looks like it:
|
you can also
|
Thanks! That's better than trying to download it multiple different ways (with different versions). I think we're homing in on a useful LaTeX test. Once we test every proposed change, we'll be much more likely to notice problems :-). |
The latest revision hasn't finished running, but I can already tell you where it's going to fail:
Can our TeX-ers fix that? |
Hard to tell without seeing the file that caused the warning. If this is because of the latexdef of @tirix's df-rag, then probably |
The problematic generated line is: \verb!raG! & $\L\mathrm{G}$ \\ So yes, we need to change its latex representation at line 436180:
I'll create a separate pull request to do that. Then we may be able to generate all TeX symbols without problems... and keep them that way. |
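For illustration only, here is a rough pre-check that flags text-mode-only commands such as `\L` inside the `$...$` part of a generated table row, before pdflatex ever runs. The command list is a small, non-exhaustive assumption, and `find_text_only_in_math` is a hypothetical helper, not part of metamath or this PR.

```python
import re

# Assumption: a few LaTeX text symbols that trigger
# "Command ... invalid in math mode"; the list is not exhaustive.
TEXT_ONLY = ("L", "l", "O", "o", "AE", "ae", "OE", "oe", "ss")

# Match e.g. \L but not \Leftarrow: the name must not continue with a letter.
_TEXT_ONLY_RE = re.compile(r"\\(?:%s)(?![A-Za-z])" % "|".join(TEXT_ONLY))
# Inline math spans; assumes no literal "$" appears inside the \verb token.
_MATH_RE = re.compile(r"\$(.*?)\$")


def find_text_only_in_math(line):
    """Return the text-only commands found inside $...$ on one table row."""
    hits = []
    for math in _MATH_RE.findall(line):
        hits.extend(m.group(0) for m in _TEXT_ONLY_RE.finditer(math))
    return hits


row = r"\verb!raG! & $\L\mathrm{G}$ \\"
print(find_text_only_in_math(row))
```

A check like this is only a fast hint; the PDF build remains the real test.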
Currently we have only one identified problem in the generated TeX table: `\L` isn't allowed in math mode. Switching to `\\corner` should fix this (per @benjub). This should fix the remaining problem in creating the LaTeX CI pipeline check in #3100. Signed-off-by: David A. Wheeler <[email protected]>
See #3104 which should fix the remaining known problem. |
By the way: why not do this in a separate job of the .yml ? |
We really depend on LaTeX, not just TeX, so make that clear in the comments about verification. Signed-off-by: David A. Wheeler <[email protected]>
We could, but we rebuild metamath-exe from source code. If we did this in a separate job we'd need to rebuild metamath-exe twice. I thought it'd be simpler to do it in a single job. |
Isn't the LaTeX part independent of metamath.c ? I don't see where it is used in the part you added ? |
If this is the only LaTeX test, I agree. But I'm expecting to use metamath to generate some LaTeX proofs and test this too (not done yet). So it makes sense to put them here. |
If it's possible I would suggest to randomize the process (check different proofs each time), to increase the probability of spotting bugs in the future. |
@GinoGiotto - it's worthy of discussion. The risk is that the tests become "flapping" tests, that is, appear to randomly fail. If you rerun a test & it succeeds, it won't be obvious there's a problem. We could pick statement number 1 + ( (hash of .mm file) mod (number of statements) ). Then every change to the database would pick a specific statement to test, but it'd be the same one until the next change. |
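A sketch of that formula, assuming SHA-256 as the hash (any stable hash would do) and that the number of statements is obtained elsewhere; `pick_statement_number` is a hypothetical name.

```python
import hashlib


def pick_statement_number(mm_bytes, num_statements):
    """1 + (hash of .mm file) mod (number of statements).

    The result depends only on the file contents, so re-running the test on
    the same database always selects the same statement.
    """
    if num_statements <= 0:
        raise ValueError("num_statements must be positive")
    digest = hashlib.sha256(mm_bytes).digest()  # assumption: SHA-256
    return 1 + int.from_bytes(digest, "big") % num_statements
```

Usage would be along the lines of `pick_statement_number(open("set.mm", "rb").read(), n)`, where `n` is the statement count for that version of the database.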
One approach could be to design a process where, in case of failure, the system repeats the same tests as before, and only resumes random testing once the problem is solved. But to be honest I don't know how it would be possible to implement such a mechanism. |
Although fuzz testing or generative testing[1] or the like can be helpful,[2] I don't think it is a good idea to be including tests containing randomness in our github checks, for the reasons @david-a-wheeler gives. It just becomes too hard to know whether a test failure was caused by a particular commit or pull request that way. [1] see for example https://medium.com/geckoboard-under-the-hood/how-generative-testing-changed-the-way-we-qa-geckoboard-b4a48a193449 [2] well at least in the abstract, I'm not sure I have more answers than anyone else about what to advise someone who is thinking of writing such tests, in terms of how to run these tests (manual? automated? under what circumstances?) |
A different approach could be to simply test very long proofs. They are usually more likely to contain bugs (e.g. this bug was very common among long proofs metamath/metamath-exe#129). The disadvantage is that testing could be quite slow tho. |
Fuzzing is awesome for security vulnerability detection, but I don't think that's the goal here. I think we want predictable tests and ones that don't take too long within our CI/CD process. pdflatex is slow. I still think this compromise would do it:
We could pick several instead of just one. The point is that it would pick "randomly" for a particular version of set.mm, but it'd be predictable for any particular version. That would ensure that re-running the tests for a particular version of set.mm would run the same tests. We could separately create a set of "long" ones & pick just one. We could separately have a process that runs PDF generation on all proofs, but I'd expect that to take 24 hours or so. Something you'd want to do separately. |
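One possible way to pick several statements "randomly but predictably" is to seed a private random generator from a hash of the database. This is only a sketch: the SHA-256 seed and the name `pick_statement_numbers` are assumptions, and the exact sample can differ between Python versions, so the CI job would want a pinned interpreter.

```python
import hashlib
import random


def pick_statement_numbers(mm_bytes, num_statements, count):
    """Pick up to `count` distinct statement numbers in 1..num_statements.

    The choice is seeded by the file contents, so a given version of the
    database always yields the same selection under the same Python version.
    """
    seed = int.from_bytes(hashlib.sha256(mm_bytes).digest(), "big")
    rng = random.Random(seed)  # private generator; global state is untouched
    count = min(count, num_statements)
    return sorted(rng.sample(range(1, num_statements + 1), count))
```

The same function could be applied to a separately maintained list of "long" proofs by sampling indices into that list instead.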
If all proofs were checked, maybe another solution would be to just pick the latest statement. That would be more likely to correlate with any pr's changes. |
That's true, but that means that you have to compute "what is the most recent statement". That requires the CI pipeline to use a process to compare the differences over time, and that's more complex than something that uses only the proposed last state. Also what's "most recent" is a little more complex - that depends on the branch you're on (sometimes multiple different branches can end up at the same place). That's less of a big deal but worth mentioning. |
I was thinking about using the date in the tags of a theorem, which is statically findable, although possibly not actually the most recent theorem now that I think about it |
I was thinking about using the date in the tags of a theorem, which is statically findable, although possibly not actually the most recent theorem now that I think about it
That's a good idea. That's probably close enough for our purposes.
|
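A rough sketch of the date-tag idea, assuming tags of the form `(Contributed by NAME, D-Mon-YYYY.)` in the comment immediately before a `$a` or `$p` statement. The regular expression is a simplification rather than a real .mm parser, and `latest_contributed` is a hypothetical helper.

```python
import re
from datetime import date

_MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Assumption: the tag sits in the comment directly preceding the statement,
# and no "$" occurs between the tag and the closing "$)".
_TAG_RE = re.compile(
    r"\(Contributed by [^()]*?,\s*(\d{1,2})-([A-Z][a-z]{2})-(\d{4})\.\)"
    r"[^$]*\$\)\s*(\S+)\s+\$[ap]\s"
)


def latest_contributed(mm_text):
    """Return (date, label) for the newest 'Contributed by' tag, or None."""
    best = None
    for day, mon, year, label in _TAG_RE.findall(mm_text):
        if mon not in _MONTHS:
            continue  # skip anything that is not a recognizable month
        when = date(int(year), _MONTHS[mon], int(day))
        if best is None or when > best[0]:
            best = (when, label)
    return best
```

As noted in the comment being quoted, the newest tag is not guaranteed to be the most recently added theorem; it is only a statically computable approximation.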
All: there are many other things we could do with LaTeX, but let's merge this PR as a starting point. It at least ensures that people won't create new definitions/symbols with broken LaTeX, and that's a starting point. Do I have a +1 from anyone? |
This reverts commit 3e6b9b8.