first tests for (mm0 parser, mmu parser, mm0+mmu proof checker) #79

Lakedaemon · 2021-04-23T12:35:11Z

Please tell me what is wrong so that I can fix it and make the next prs behave better.
Best regards,
Olivier

tests/fail/parsingBoth/chars cannot be outside a subset of the ascii charset.mm0

tests/fail/parsingBoth/chars cannot be outside a subset of the ascii charset2.mm0

digama0 · 2021-04-23T14:32:52Z

I pushed some changes.

I shortened the names of the test files; there is no need to go into details and it's annoying to deal with spaces and long names when calling these from the console.
The comments are more minimal - in a file that would otherwise only be 6 characters long, a few sentences of comments actually makes the intent of the file less clear. For more complicated and subtle tests a short sentence fragment should be sufficient. Remember, we're going for quantity not quality here.
The folder organization was changed to tests/mm0_mmu/fail/test.mm0. The intent is that the top level directory specifies the test format, meaning that all mm0_mmu tests can be run in the same way, by passing MM0 + MMU to a verifier.
A tests/mm0_mmu/run-mm0-hs.sh script has been added to run a single test from the pass or fail directory using the mm0-hs tool. More scripts should be added to run other tools, anything that can consume an MM0 + MMU pair. I encourage you to add a similar script to invoke mm0kt, and even PR it here. You should include a link to the installation instructions though, for people who don't have mm0kt on their path.
The organization into phases like matching, registering and so on was dropped. You can indicate this in the name of the test if you like, but I don't think the tests need to be any further organized than this. (I'm expecting the mm0_mmu/fail directory to grow to potentially thousands of files without any additional hierarchy.) This is easiest for tooling.

By the way, if it helps any I'm taking https://github.com/leanprover-community/lean/tree/master/tests/lean for inspiration. I don't have *.expected.out files because diagnostic messages are tool specific, although perhaps the fail tests can get expected out files for various tools.

Lakedaemon · 2021-04-23T15:01:59Z

Ok, I'll do it that way, to the best of my capabilities.

when matching mm0 statement vs mmu directive, it's a bit like having an expected.out and checking it, I guess.
I wonder if/how it could be done for other stages.

Lakedaemon · 2021-04-23T15:34:37Z

Am I supposed to do something to merge those 2 tests (that you beautified) ?
Is it finished ?
For the next tests, am I supposed to commit them in this pr or am I supposed to open another one ?

digama0 · 2021-04-23T15:41:23Z

The existing tests are fine, but you can add more tests to this PR. You had quite a few more last time you showed me.

Lakedaemon · 2021-04-23T15:52:35Z

ok, will do

Lakedaemon · 2021-04-23T17:01:25Z

those good ?

Lakedaemon · 2021-04-23T17:42:51Z

another batch
Waiting for feedback and instructions for the next tests

tests/mm0_mmu/pass/whitespace2.mm0

digama0 · 2021-04-23T17:56:44Z

these are good modulo pass/whitespace2

Lakedaemon · 2021-04-23T18:50:13Z

I'll start working on the next batch then...and slowly resume implementing the improved kotlin patcher for mm0 (and a patch for set.mm.mm0)

Lakedaemon · 2021-04-23T19:41:39Z

bad-math-string2.mm0 serves no purpose and should be removed :/

digama0 · 2021-04-23T20:24:16Z

tests/mm0_mmu/pass/math-string.mm0

+  @ A B C D E F G H I J K L M N O
+  P Q R S T U V W X Y Z [ \ ] ^ _
+  ` a b c d e f g h i j k l m n o
+   p q r s t u v w x y z { | } ~ DEL $;


this delimiter list is terrifying. The multi-character delimiters are not necessarily supported on all verifiers (certainly not on any that I have written), so that would put this in the "not necessarily pass" category (I'm thinking of calling this run/ to mean "don't segfault but pass and fail are both okay")

I might have to modify the spec to only allow printable ascii inside math strings, because low ascii is a vector for display bugs, which could be exploitable to produce misleading theorem statements.

digama0 · 2021-04-23T20:29:58Z

I don't think there is anything wrong with bad-math-string2, that's a possible parsing error. Same with delimiter $ $ $ $ $ $;

Lakedaemon · 2021-04-23T20:34:13Z

You are the boss, you know best.

The list inside delimiter was to test the fact that math string may hold ANY ascii char except '$'
also, those are not multicharacter delimiters, that is just a space separated list of 1 char Ascii chars (who are displayed weirdly by their name when they are not printable)

Indeed, it wouldn't hurt to prevent the use of all those legacy and useless ascii (not printable) char.

Lakedaemon · 2021-04-24T05:04:01Z

I don't think there is anything wrong with bad-math-string2, that's a possible parsing error. Same with delimiter $ $ $ $ $ $;

oh yeah, nice one.

I'm still a bit confused about if/how I should/could test things.
I fear that this is going to be hard to test that a developper implements parsing the right way (with the right structure for stuff).

I made 3 attemps at developping parsers for mm0 (and 1 for metamath), So I tried a few things (some of them stupid), so I have some experience in doing dangerous things : like skipping unparsed balanced parentheses strings to parse them later in a different process. It gives me some ideas to help other developpers prevent the same pitfalls :

using a third party isWhiteSpace() to skip mm0/mmu whitespace
implementing/hacking things quickly while discovering the mm0/mmu specs and not handling correctly corner cases...
misunderstanding the spec...
But still, I fear that doing a good testing job is going to be hard

You probably have it easier with the mmb format where everything is (certainly) located in a convenient place and you do not have parsing (structure) issues. Here, that's me being jealous :)

Lakedaemon · 2021-04-24T05:11:47Z

what about too big ints, what should developper do to have valid proofcheckers in this case ?
prefix p : $+$ prec 9999999999999999999999999999999999999999999999999999999999999999;

Say, if there is no limit to ints, and a big int overflows yet the developper doesn't guard agains that.
So an (mostly) infinite precedence becomes 1 and all math parses become different.
Shouldn't we guard agains that ?
Looks to me like an exploitable case

Lakedaemon · 2021-04-24T05:37:35Z

in the spec, shouldn't math-strings be called something else ?
I'll be doing mostly maths but you have been doing weirder stuff.
Wouldn't another term be more suitable ? treeString ? formula ? parsable ?
Just a little morning tease to make, sure you are alert and operational first thing in the morning ! :)

Lakedaemon · 2021-04-24T05:54:11Z

Regarding primitives, maybe we should test against balanced parenthesis... but I do not see yet how that could be done simply (maybe later when we get to dynamic parsing/proofs)

For comments, we probably should test against weirdly place comments (in whitestuff inside statements/dirrectives)
consuming lines should be done right also, so that no part of the file disappear because of rogue comments (exploitable ?) but how to do that ?

Lakedaemon · 2021-04-24T06:10:10Z

arhhhh line comment don't have to start a line !!!!!!

because of that
file ::= (lexeme | whitespace)*
whitespace ::= whitestuff+
whitestuff ::= whitechar | line-comment

We can have line comments anywhere a space is ok

so this is valid mm0 stuff

sort-- comment inside a statement
s;

Which makes catching/storing/somewhat preserving line comments a complete headache (for a patcher/transformation tool)

That's it...I'm crying ;_;

Lakedaemon · 2021-04-24T07:27:03Z

mmh. So I got whitespace wrong from the start...
I have been thinking about how to preserve space/formatting and there might be a reasonnable simple/performant way to do it.
I'll have a go at it.

Lakedaemon · 2021-04-24T08:31:55Z

As I do not know yet how to simply test balanced parenthesis stuff,
parsing primitives are mostly done
I'll now start testing higher level stuff (from the bottom up). First commes mm0 statements and mmu directives (separately)

Lakedaemon · 2021-04-24T08:44:03Z

for sort fail test, we should probably check the order of modifyers...

Lakedaemon · 2021-04-24T08:57:13Z

I'm wondering if I should commit fail tests for 3 modifiers 4 * 5 = 20 tests :/
and fail test for 4 modifiers = 23 tests :/

If we were thorough, yes... now, betting on what you'll want... :)
and waiting for further instructions...

Lakedaemon · 2021-04-24T10:22:05Z

Let's see. You are a world expert on logic, formalism and the whole point of mm0 is to make sure that there is more truth, less false truth, less bugs in software and maths (and who knows where else...)...

I'll prepare the 43 additionnal tests, while waiting ^^

digama0 · 2021-04-24T16:17:44Z

tests/mm0_mmu/fail/bad-int1.mm0

@@ -0,0 +1,2 @@
+--                      V--  overflowing hazard
+prefix p : $+$ prec 4294967296;


this is a run test, verifiers may or may not reject it

what does it mean for the test, should we remove it ?

proof checking should fail on such a test (rejecting bad input), right ?
We can make it a very long int, like bigger than u128, if you fear that in the future someone will implement a proof checker with precedence as u128 ints...

But it would be nice if the tests made the developpers aware that they have to reject bad stuff, or bad things could happen

Lakedaemon · 2021-04-24T19:31:24Z

when you have this
term a: s > t > s;

what names do you give to binders in the mmu file ? (because of alpha renaming, we can do as we want, but what do you suggest ?)

digama0 · 2021-04-24T23:58:56Z

Those binders are named _.

digama0 · 2021-04-25T00:11:41Z

what about too big ints, what should developper do to have valid proofcheckers in this case ?
prefix p : $+$ prec 9999999999999999999999999999999999999999999999999999999999999999;

Say, if there is no limit to ints, and a big int overflows yet the developper doesn't guard agains that.
So an (mostly) infinite precedence becomes 1 and all math parses become different.
Shouldn't we guard agains that ?
Looks to me like an exploitable case

There are two valid behaviors for large precedences (more than 2046).

The number is too large, reject it as soon as it is parsed.
Accept it as a large precedence; it is required to be larger than all smaller numbers you could type there.

That is, the verifier implementation gets to choose what the largest numerical precedence is (as long as it is at least 2046), and all numerical precedences in that range have to be ordered like numbers. If you use a fixed size integer type to hold precedences (which is reasonable and logical), then a reasonable choice for the largest precedence is either MAX or MAX-1, using MAX to hold the special non-numerical precedence max. In this case, during parsing you have to be careful to detect when overflow occurs so you can throw a parsing error. Silently allowing the overflow is not acceptable, because then the precedences will not be in numerical order.

digama0 · 2021-04-25T00:23:10Z

for sort fail test, we should probably check the order of modifyers...

I'm considering making this a run test, but for the moment I believe the spec doesn't allow sort modifiers to be out of order so this is fine.

in the spec, shouldn't math-strings be called something else ?

It's true that you can do lots of things with mathematics, but the name seems fine. It is similar to LaTeX "math mode" as the generic term for "when you are between $ delimiters".

arhhhh line comment don't have to start a line !!!!!!
We can have line comments anywhere a space is ok
Which makes catching/storing/somewhat preserving line comments a complete headache (for a patcher/transformation tool)
That's it...I'm crying ;_;

Yes, this is valid. You should be glad that line comments aren't allowed inside math-strings, I considered that for a while (it can be useful for huge multi-line definitions). But it's a lot harder to reserve -- or some other comment starter inside math strings where almost every character can be used as part of a domain specific operator.

This is part of why --| comments exist. Internal comments like your example should almost always be discarded at the lexer stage, but --| comments immediately before the start of a statement is recognized as a "doc comment" and is transferred along with the statement through tools like from-mm and to-lean.

digama0 · 2021-04-25T00:32:12Z

I'm wondering if I should commit fail tests for 3 modifiers 4 * 5 = 20 tests :/
and fail test for 4 modifiers = 23 tests :/

I think there are only 6 = 4 choose 2 out of order pairs to test: any out of order modifier list needs to contain an adjacent out of order modifier (of which there are 6 possibilities), or a duplicate modifier (of which there are 4 possibilities). There are 16 possible pass tests but I don't think it's important to have them all.

But considering that the main goal here is to catch whether the verifier is using an algorithm that lets these pass, you should only check that the order is correct (by checking that s1234 passes) and that out of order is rejected (so s21, s22, s31, s32, s33, s41, s42, s43, s44 are rejected). I'm pretty sure you will find a bug in one of my parsers with that.

digama0 · 2021-04-25T01:18:04Z

For comments, we probably should test against weirdly place comments (in whitestuff inside statements/dirrectives)
consuming lines should be done right also, so that no part of the file disappear because of rogue comments (exploitable ?) but how to do that ?

Yes, surprising comments can definitely cause exploitable issues. (I recall something similar with surprising behavior of the C preprocessor where /* inside the first part of an #if #else #endif can comment out the #else and cause the #if itself to match some other #else, but only when the if condition is true.)

Here's an example of an issue that the current spec may not defend against:

-- I've proved false?
theorem foo: $ F. $;

the secret is that between the two lines is a \r, which may or may not be displayed as a line break but in any case will not terminate the comment (so the theorem is actually in the comment).

Lakedaemon · 2021-04-25T04:57:09Z

I'm wondering if I should commit fail tests for 3 modifiers 4 * 5 = 20 tests :/
and fail test for 4 modifiers = 23 tests :/

I think there are only 6 = 4 choose 2 out of order pairs to test: any out of order modifier list needs to contain an adjacent out of order modifier (of which there are 6 possibilities), or a duplicate modifier (of which there are 4 possibilities). There are 16 possible pass tests but I don't think it's important to have them all.

But considering that the main goal here is to catch whether the verifier is using an algorithm that lets these pass, you should only check that the order is correct (by checking that s1234 passes) and that out of order is rejected (so s21, s22, s31, s32, s33, s41, s42, s43, s44 are rejected). I'm pretty sure you will find a bug in one of my parsers with that.

Damn, I lost my bet :/
So. I'll only be commiting the 16 pass possibilities inside a single pass test and the 6 out of order 2 modifiers fait tests

Lakedaemon · 2021-04-25T04:59:53Z

What is a "run" test ?
Are there other kind of tests ?

On my machine, I run a test, it either fails or pass which is the expected behavior or not, I do not have other options
in the end with the testing framework I use (though I may tweek how/when it fails/pass)

digama0 · 2021-04-25T05:21:52Z

What is a "run" test ?
Are there other kind of tests ?

I mentioned that in an earlier post as a third category besides pass and fail for things which may pass or fail depending on the implementation. One option is to just not have any tests that exercise this category, but another option is to have the tests there and just ignore the result. If the verifier is capable of signalling "bug" (i.e. panic, segfault, or other internal error) separately from "pass" and "fail" then a run test should check that you don't get "bug" but both "pass" and "fail" are acceptable answers. For barebones verifiers like mm0-c there is no distinction between "fail" and "bug" so there isn't much to test.

A given implementation should generally be able to say for any given run test whether it is supposed to be a pass or fail, but keeping track of this information in the test suite is more difficult because of the implementation sensitivity.

Lakedaemon · 2021-04-25T07:42:59Z

ok, I understand now

first tests for (mm0 parser, mmu parser, mm0+mmu proof checker)

2b486d2

digama0 reviewed Apr 23, 2021

View reviewed changes

tests/fail/parsingBoth/chars cannot be outside a subset of the ascii charset.mm0 Outdated Show resolved Hide resolved

tests/fail/parsingBoth/chars cannot be outside a subset of the ascii charset2.mm0 Outdated Show resolved Hide resolved

digama0 added 2 commits April 23, 2021 10:19

tweaking test setup

00b56ce

newlines

a1e58c8

Lakedaemon closed this Apr 23, 2021

Lakedaemon reopened this Apr 23, 2021

whitespace tests

90e6ec2

tests for ids

389d9e4

digama0 reviewed Apr 23, 2021

View reviewed changes

tests/mm0_mmu/pass/whitespace2.mm0 Show resolved Hide resolved

tests for math-strings

f4bea2e

digama0 reviewed Apr 23, 2021

View reviewed changes

int tests

9eccd00

comment test

97d49c5

sort pass test

06b15a9

fail test for sort with 2 modifiers

54f1424

digama0 reviewed Apr 24, 2021

View reviewed changes

all bad-sort tests with regards to order of the modifiers

cfe043e

lakedaemon added 3 commits April 25, 2021 07:03

out of order 2 modifiers bad-sort tests

bccf813

Merge branch 'master' of https://github.com/Lakedaemon/mm0

43015b5

git (my lack of mastery of it) messed the deletion

d4bb832

moved 2 tests to run folder

af6aed7

digama0 force-pushed the master branch 2 times, most recently from 4a8b328 to 97abfae Compare November 5, 2021 04:15

		@@ -0,0 +1,2 @@
		-- V-- overflowing hazard
		prefix p : $+$ prec 4294967296;

first tests for (mm0 parser, mmu parser, mm0+mmu proof checker) #79

Are you sure you want to change the base?

first tests for (mm0 parser, mmu parser, mm0+mmu proof checker) #79

Conversation

Lakedaemon commented Apr 23, 2021

digama0 commented Apr 23, 2021 • edited Loading

Lakedaemon commented Apr 23, 2021

Lakedaemon commented Apr 23, 2021

digama0 commented Apr 23, 2021

Lakedaemon commented Apr 23, 2021

Lakedaemon commented Apr 23, 2021

Lakedaemon commented Apr 23, 2021

digama0 commented Apr 23, 2021

Lakedaemon commented Apr 23, 2021

Lakedaemon commented Apr 23, 2021

digama0 Apr 23, 2021

Choose a reason for hiding this comment

digama0 Apr 23, 2021

Choose a reason for hiding this comment

digama0 commented Apr 23, 2021

Lakedaemon commented Apr 23, 2021

Lakedaemon commented Apr 24, 2021 • edited Loading

Lakedaemon commented Apr 24, 2021 • edited Loading

Lakedaemon commented Apr 24, 2021

Lakedaemon commented Apr 24, 2021

Lakedaemon commented Apr 24, 2021 • edited Loading

Lakedaemon commented Apr 24, 2021

Lakedaemon commented Apr 24, 2021

Lakedaemon commented Apr 24, 2021

Lakedaemon commented Apr 24, 2021 • edited Loading

Lakedaemon commented Apr 24, 2021

digama0 Apr 24, 2021

Choose a reason for hiding this comment

Lakedaemon Apr 24, 2021 • edited Loading

Choose a reason for hiding this comment

Lakedaemon commented Apr 24, 2021

digama0 commented Apr 24, 2021

digama0 commented Apr 25, 2021

digama0 commented Apr 25, 2021

digama0 commented Apr 25, 2021

digama0 commented Apr 25, 2021

Lakedaemon commented Apr 25, 2021

Lakedaemon commented Apr 25, 2021 • edited Loading

digama0 commented Apr 25, 2021 • edited Loading

Lakedaemon commented Apr 25, 2021

digama0 commented Apr 23, 2021 •

edited

Loading

Lakedaemon commented Apr 24, 2021 •

edited

Loading

Lakedaemon commented Apr 24, 2021 •

edited

Loading

Lakedaemon commented Apr 24, 2021 •

edited

Loading

Lakedaemon commented Apr 24, 2021 •

edited

Loading

Lakedaemon Apr 24, 2021 •

edited

Loading

Lakedaemon commented Apr 25, 2021 •

edited

Loading

digama0 commented Apr 25, 2021 •

edited

Loading