Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

first tests for (mm0 parser, mmu parser, mm0+mmu proof checker) #79

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

Lakedaemon
Copy link

Please tell me what is wrong so that I can fix it and make the next prs behave better.
Best regards,
Olivier

@digama0
Copy link
Owner

digama0 commented Apr 23, 2021

I pushed some changes.

  • I shortened the names of the test files; there is no need to go into details and it's annoying to deal with spaces and long names when calling these from the console.
  • The comments are more minimal - in a file that would otherwise only be 6 characters long, a few sentences of comments actually makes the intent of the file less clear. For more complicated and subtle tests a short sentence fragment should be sufficient. Remember, we're going for quantity not quality here.
  • The folder organization was changed to tests/mm0_mmu/fail/test.mm0. The intent is that the top level directory specifies the test format, meaning that all mm0_mmu tests can be run in the same way, by passing MM0 + MMU to a verifier.
  • A tests/mm0_mmu/run-mm0-hs.sh script has been added to run a single test from the pass or fail directory using the mm0-hs tool. More scripts should be added to run other tools, anything that can consume an MM0 + MMU pair. I encourage you to add a similar script to invoke mm0kt, and even PR it here. You should include a link to the installation instructions though, for people who don't have mm0kt on their path.
  • The organization into phases like matching, registering and so on was dropped. You can indicate this in the name of the test if you like, but I don't think the tests need to be any further organized than this. (I'm expecting the mm0_mmu/fail directory to grow to potentially thousands of files without any additional hierarchy.) This is easiest for tooling.

By the way, if it helps any I'm taking https://github.com/leanprover-community/lean/tree/master/tests/lean for inspiration. I don't have *.expected.out files because diagnostic messages are tool specific, although perhaps the fail tests can get expected out files for various tools.

@Lakedaemon
Copy link
Author

Ok, I'll do it that way, to the best of my capabilities.

when matching mm0 statement vs mmu directive, it's a bit like having an expected.out and checking it, I guess.
I wonder if/how it could be done for other stages.

@Lakedaemon Lakedaemon closed this Apr 23, 2021
@Lakedaemon Lakedaemon reopened this Apr 23, 2021
@Lakedaemon
Copy link
Author

Am I supposed to do something to merge those 2 tests (that you beautified) ?
Is it finished ?
For the next tests, am I supposed to commit them in this pr or am I supposed to open another one ?

@digama0
Copy link
Owner

digama0 commented Apr 23, 2021

The existing tests are fine, but you can add more tests to this PR. You had quite a few more last time you showed me.

@Lakedaemon
Copy link
Author

ok, will do

@Lakedaemon
Copy link
Author

those good ?

@Lakedaemon
Copy link
Author

another batch
Waiting for feedback and instructions for the next tests

@digama0
Copy link
Owner

digama0 commented Apr 23, 2021

these are good modulo pass/whitespace2

@Lakedaemon
Copy link
Author

I'll start working on the next batch then...and slowly resume implementing the improved kotlin patcher for mm0 (and a patch for set.mm.mm0)

@Lakedaemon
Copy link
Author

bad-math-string2.mm0 serves no purpose and should be removed :/

@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~ DEL $;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this delimiter list is terrifying. The multi-character delimiters are not necessarily supported on all verifiers (certainly not on any that I have written), so that would put this in the "not necessarily pass" category (I'm thinking of calling this run/ to mean "don't segfault but pass and fail are both okay")

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have to modify the spec to only allow printable ascii inside math strings, because low ascii is a vector for display bugs, which could be exploitable to produce misleading theorem statements.

@digama0
Copy link
Owner

digama0 commented Apr 23, 2021

I don't think there is anything wrong with bad-math-string2, that's a possible parsing error. Same with delimiter $ $ $ $ $ $;

@Lakedaemon
Copy link
Author

You are the boss, you know best.

The list inside delimiter was to test the fact that math string may hold ANY ascii char except '$'
also, those are not multicharacter delimiters, that is just a space separated list of 1 char Ascii chars (who are displayed weirdly by their name when they are not printable)

Indeed, it wouldn't hurt to prevent the use of all those legacy and useless ascii (not printable) char.

@Lakedaemon
Copy link
Author

Lakedaemon commented Apr 24, 2021

I don't think there is anything wrong with bad-math-string2, that's a possible parsing error. Same with delimiter $ $ $ $ $ $;

oh yeah, nice one.

I'm still a bit confused about if/how I should/could test things.
I fear that this is going to be hard to test that a developper implements parsing the right way (with the right structure for stuff).

I made 3 attemps at developping parsers for mm0 (and 1 for metamath), So I tried a few things (some of them stupid), so I have some experience in doing dangerous things : like skipping unparsed balanced parentheses strings to parse them later in a different process. It gives me some ideas to help other developpers prevent the same pitfalls :

  • using a third party isWhiteSpace() to skip mm0/mmu whitespace
  • implementing/hacking things quickly while discovering the mm0/mmu specs and not handling correctly corner cases...
  • misunderstanding the spec...
    But still, I fear that doing a good testing job is going to be hard

You probably have it easier with the mmb format where everything is (certainly) located in a convenient place and you do not have parsing (structure) issues. Here, that's me being jealous :)

@Lakedaemon
Copy link
Author

Lakedaemon commented Apr 24, 2021

what about too big ints, what should developper do to have valid proofcheckers in this case ?
prefix p : $+$ prec 9999999999999999999999999999999999999999999999999999999999999999;

Say, if there is no limit to ints, and a big int overflows yet the developper doesn't guard agains that.
So an (mostly) infinite precedence becomes 1 and all math parses become different.
Shouldn't we guard agains that ?
Looks to me like an exploitable case

@Lakedaemon
Copy link
Author

in the spec, shouldn't math-strings be called something else ?
I'll be doing mostly maths but you have been doing weirder stuff.
Wouldn't another term be more suitable ? treeString ? formula ? parsable ?
Just a little morning tease to make, sure you are alert and operational first thing in the morning ! :)

@Lakedaemon
Copy link
Author

Regarding primitives, maybe we should test against balanced parenthesis... but I do not see yet how that could be done simply (maybe later when we get to dynamic parsing/proofs)

For comments, we probably should test against weirdly place comments (in whitestuff inside statements/dirrectives)
consuming lines should be done right also, so that no part of the file disappear because of rogue comments (exploitable ?) but how to do that ?

@Lakedaemon
Copy link
Author

Lakedaemon commented Apr 24, 2021

arhhhh line comment don't have to start a line !!!!!!

because of that
file ::= (lexeme | whitespace)*
whitespace ::= whitestuff+
whitestuff ::= whitechar | line-comment

We can have line comments anywhere a space is ok

so this is valid mm0 stuff

sort-- comment inside a statement
s;

Which makes catching/storing/somewhat preserving line comments a complete headache (for a patcher/transformation tool)

That's it...I'm crying ;_;

@Lakedaemon
Copy link
Author

mmh. So I got whitespace wrong from the start...
I have been thinking about how to preserve space/formatting and there might be a reasonnable simple/performant way to do it.
I'll have a go at it.

@Lakedaemon
Copy link
Author

As I do not know yet how to simply test balanced parenthesis stuff,
parsing primitives are mostly done
I'll now start testing higher level stuff (from the bottom up). First commes mm0 statements and mmu directives (separately)

@Lakedaemon
Copy link
Author

for sort fail test, we should probably check the order of modifyers...

@Lakedaemon
Copy link
Author

Lakedaemon commented Apr 24, 2021

I'm wondering if I should commit fail tests for 3 modifiers 4 * 5 = 20 tests :/
and fail test for 4 modifiers = 23 tests :/

If we were thorough, yes... now, betting on what you'll want... :)
and waiting for further instructions...

@Lakedaemon
Copy link
Author

Let's see. You are a world expert on logic, formalism and the whole point of mm0 is to make sure that there is more truth, less false truth, less bugs in software and maths (and who knows where else...)...

I'll prepare the 43 additionnal tests, while waiting ^^

@@ -0,0 +1,2 @@
-- V-- overflowing hazard
prefix p : $+$ prec 4294967296;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a run test, verifiers may or may not reject it

Copy link
Author

@Lakedaemon Lakedaemon Apr 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does it mean for the test, should we remove it ?

proof checking should fail on such a test (rejecting bad input), right ?
We can make it a very long int, like bigger than u128, if you fear that in the future someone will implement a proof checker with precedence as u128 ints...

But it would be nice if the tests made the developpers aware that they have to reject bad stuff, or bad things could happen

@Lakedaemon
Copy link
Author

when you have this
term a: s > t > s;

what names do you give to binders in the mmu file ? (because of alpha renaming, we can do as we want, but what do you suggest ?)

@digama0
Copy link
Owner

digama0 commented Apr 24, 2021

Those binders are named _.

@digama0
Copy link
Owner

digama0 commented Apr 25, 2021

what about too big ints, what should developper do to have valid proofcheckers in this case ?
prefix p : $+$ prec 9999999999999999999999999999999999999999999999999999999999999999;

Say, if there is no limit to ints, and a big int overflows yet the developper doesn't guard agains that.
So an (mostly) infinite precedence becomes 1 and all math parses become different.
Shouldn't we guard agains that ?
Looks to me like an exploitable case

There are two valid behaviors for large precedences (more than 2046).

  1. The number is too large, reject it as soon as it is parsed.
  2. Accept it as a large precedence; it is required to be larger than all smaller numbers you could type there.

That is, the verifier implementation gets to choose what the largest numerical precedence is (as long as it is at least 2046), and all numerical precedences in that range have to be ordered like numbers. If you use a fixed size integer type to hold precedences (which is reasonable and logical), then a reasonable choice for the largest precedence is either MAX or MAX-1, using MAX to hold the special non-numerical precedence max. In this case, during parsing you have to be careful to detect when overflow occurs so you can throw a parsing error. Silently allowing the overflow is not acceptable, because then the precedences will not be in numerical order.

@digama0
Copy link
Owner

digama0 commented Apr 25, 2021

for sort fail test, we should probably check the order of modifyers...

I'm considering making this a run test, but for the moment I believe the spec doesn't allow sort modifiers to be out of order so this is fine.

in the spec, shouldn't math-strings be called something else ?

It's true that you can do lots of things with mathematics, but the name seems fine. It is similar to LaTeX "math mode" as the generic term for "when you are between $ delimiters".

arhhhh line comment don't have to start a line !!!!!!
We can have line comments anywhere a space is ok
Which makes catching/storing/somewhat preserving line comments a complete headache (for a patcher/transformation tool)
That's it...I'm crying ;_;

Yes, this is valid. You should be glad that line comments aren't allowed inside math-strings, I considered that for a while (it can be useful for huge multi-line definitions). But it's a lot harder to reserve -- or some other comment starter inside math strings where almost every character can be used as part of a domain specific operator.

This is part of why --| comments exist. Internal comments like your example should almost always be discarded at the lexer stage, but --| comments immediately before the start of a statement is recognized as a "doc comment" and is transferred along with the statement through tools like from-mm and to-lean.

@digama0
Copy link
Owner

digama0 commented Apr 25, 2021

I'm wondering if I should commit fail tests for 3 modifiers 4 * 5 = 20 tests :/
and fail test for 4 modifiers = 23 tests :/

I think there are only 6 = 4 choose 2 out of order pairs to test: any out of order modifier list needs to contain an adjacent out of order modifier (of which there are 6 possibilities), or a duplicate modifier (of which there are 4 possibilities). There are 16 possible pass tests but I don't think it's important to have them all.

But considering that the main goal here is to catch whether the verifier is using an algorithm that lets these pass, you should only check that the order is correct (by checking that s1234 passes) and that out of order is rejected (so s21, s22, s31, s32, s33, s41, s42, s43, s44 are rejected). I'm pretty sure you will find a bug in one of my parsers with that.

@digama0
Copy link
Owner

digama0 commented Apr 25, 2021

For comments, we probably should test against weirdly place comments (in whitestuff inside statements/dirrectives)
consuming lines should be done right also, so that no part of the file disappear because of rogue comments (exploitable ?) but how to do that ?

Yes, surprising comments can definitely cause exploitable issues. (I recall something similar with surprising behavior of the C preprocessor where /* inside the first part of an #if #else #endif can comment out the #else and cause the #if itself to match some other #else, but only when the if condition is true.)

Here's an example of an issue that the current spec may not defend against:

-- I've proved false?
theorem foo: $ F. $;

the secret is that between the two lines is a \r, which may or may not be displayed as a line break but in any case will not terminate the comment (so the theorem is actually in the comment).

@Lakedaemon
Copy link
Author

I'm wondering if I should commit fail tests for 3 modifiers 4 * 5 = 20 tests :/
and fail test for 4 modifiers = 23 tests :/

I think there are only 6 = 4 choose 2 out of order pairs to test: any out of order modifier list needs to contain an adjacent out of order modifier (of which there are 6 possibilities), or a duplicate modifier (of which there are 4 possibilities). There are 16 possible pass tests but I don't think it's important to have them all.

But considering that the main goal here is to catch whether the verifier is using an algorithm that lets these pass, you should only check that the order is correct (by checking that s1234 passes) and that out of order is rejected (so s21, s22, s31, s32, s33, s41, s42, s43, s44 are rejected). I'm pretty sure you will find a bug in one of my parsers with that.

Damn, I lost my bet :/
So. I'll only be commiting the 16 pass possibilities inside a single pass test and the 6 out of order 2 modifiers fait tests

@Lakedaemon
Copy link
Author

Lakedaemon commented Apr 25, 2021

What is a "run" test ?
Are there other kind of tests ?

On my machine, I run a test, it either fails or pass which is the expected behavior or not, I do not have other options
in the end with the testing framework I use (though I may tweek how/when it fails/pass)

@digama0
Copy link
Owner

digama0 commented Apr 25, 2021

What is a "run" test ?
Are there other kind of tests ?

I mentioned that in an earlier post as a third category besides pass and fail for things which may pass or fail depending on the implementation. One option is to just not have any tests that exercise this category, but another option is to have the tests there and just ignore the result. If the verifier is capable of signalling "bug" (i.e. panic, segfault, or other internal error) separately from "pass" and "fail" then a run test should check that you don't get "bug" but both "pass" and "fail" are acceptable answers. For barebones verifiers like mm0-c there is no distinction between "fail" and "bug" so there isn't much to test.

A given implementation should generally be able to say for any given run test whether it is supposed to be a pass or fail, but keeping track of this information in the test suite is more difficult because of the implementation sensitivity.

@Lakedaemon
Copy link
Author

ok, I understand now

@digama0 digama0 force-pushed the master branch 2 times, most recently from 4a8b328 to 97abfae Compare November 5, 2021 04:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants