Optimized '+' syntax causes 'random_recursive_mutation' error #42
Sorry, commit 6eae7d1 is a wrong patch: the '+' syntax makes 'random_recursive_mutation' invalid, so please roll back to ff4e5a2. In antlr4, the '+' syntax expands in breadth, but 'random_recursive_mutation' relies on depth expansion (a tree structure). In actual operation, 'random_recursive_mutation' always reports that the recursive node cannot be found, so each round
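The breadth versus depth distinction can be illustrated with a small Python sketch. It is not Grammar-Mutator's code: the `Node` class, the `find_recursive_nodes` helper, and the toy rule `a` over a `'very'` token are all hypothetical, chosen only to show why a search for recursive nodes comes back empty on a flat '+' tree.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    rule: str                                  # rule name, or token text for a leaf
    children: list = field(default_factory=list)

def find_recursive_nodes(node, ancestors=()):
    """Return inner nodes whose rule name also appears among their ancestors."""
    found = [node] if node.children and node.rule in ancestors else []
    for child in node.children:
        found += find_recursive_nodes(child, ancestors + (node.rule,))
    return found

def nested(n):
    """Tree shape for the recursive form:  a : 'very' a | 'very' ;"""
    leaf = Node("'very'")
    return Node("a", [leaf] if n == 1 else [leaf, nested(n - 1)])

def flat(n):
    """Tree shape for the '+' form:  a : 'very'+ ;"""
    return Node("a", [Node("'very'") for _ in range(n)])

deep_hits = find_recursive_nodes(nested(4))    # 'a' nodes nested under 'a'
flat_hits = find_recursive_nodes(flat(4))      # only leaves under one 'a'
print(len(deep_hits), len(flat_hits))
```

In the nested tree every inner `a` sits below another `a`, so there are subtrees to swap; in the flat tree there is a single `a` with only token children, so the search finds nothing.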
Back to the original question in #17: I gave the reason in this reply, #17 (comment). Several places in the grammar break the LL(1)/LL(*) rules (non-terminals can derive ε), which makes it very easy for antlr4 to enter a backtracking loop. For my original question, my test case is:
Similarly, I broke the LL(1)/LL(*) rules (the FIRST sets of a non-terminal's alternatives, first(α), intersect), which also caused antlr4 to fall into backtracking during parsing. I fixed the grammar file as follows:
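As a side illustration, and not the author's grammar fix, the FIRST-set condition mentioned above can be checked mechanically. The sketch below assumes a hypothetical grammar representation (a dict mapping each non-terminal to a list of alternatives, each a list of symbols) and hypothetical helper names. It only tests whether the FIRST sets of a rule's alternatives intersect; a full LL(1) check would also compare FIRST against FOLLOW for alternatives that can derive ε.

```python
EPS = ""  # stands for the empty string ε

def first_of_seq(seq, first):
    """FIRST set of a symbol sequence, given FIRST sets of the non-terminals."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in first else {sym}  # terminal: itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol can vanish, or the sequence is empty
    return out

def first_sets(grammar):
    """Compute FIRST for every non-terminal by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def first_conflicts(grammar):
    """Pairs of alternatives of one rule whose FIRST sets intersect."""
    first = first_sets(grammar)
    conflicts = []
    for nt, alts in grammar.items():
        sets = [first_of_seq(alt, first) for alt in alts]
        for i in range(len(alts)):
            for j in range(i + 1, len(alts)):
                if sets[i] & sets[j]:
                    conflicts.append((nt, i, j, sets[i] & sets[j]))
    return conflicts

# Hypothetical toy grammars: both alternatives of <a> start with "very" ...
ambiguous = {"<a>": [["very", "<a>"], ["very"]]}
# ... versus a left-factored form with an ε alternative.
factored = {"<a>": [["very", "<tail>"]], "<tail>": [["<a>"], []]}

print(first_conflicts(ambiguous))
print(first_conflicts(factored))
```

The first grammar reports a conflict on the shared prefix; the left-factored one reports none, which is the property the LL(1) rule asks for.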
So is a fix to the Grammar-Mutator needed?
Nope, commit ff4e5a2 is perfect; Grammar-Mutator will work fine. As users, what we need to do is write the JSON grammar as correctly as possible (perhaps following LL(1) rules), check and test it more, and only then fuzz in the production environment.
@0x7Fancy do you mean we need to revert your PR that was merged recently?
Yes, revert that PR (or let me overwrite it later).
Based on the above issues, we created a simpler test case, test.json:
translated to test.g4:
with the input 40960_very.txt:
running with antlr4-parse:
From the perspective of antlr4, we can use the '+' syntax to describe test.g4 and ignore this prefix matching, as follows (test.g4):
running again with antlr4-parse:
So I made a patch that implements the above idea; please refer to 0x7Fancy@6eae7d1.
I have only implemented the optimization for head recursion and tail recursion here, which is simple and easy to understand. Intermediate recursion, I think, can be rewritten as head/tail recursion in the JSON grammar.
Of course, this is only a mitigation. When mutation generates a sufficiently complex syntax tree, antlr4 may still get stuck while parsing.
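The head/tail recursion rewrite described above can be sketched in Python. This is not the code of 0x7Fancy@6eae7d1: the function name `recursion_to_plus` is hypothetical, and the rule representation (a non-terminal name plus a list of alternatives, each a list of symbols) is an assumption modelled loosely on a JSON grammar.

```python
def recursion_to_plus(name, alternatives):
    """Detect a simple head or tail recursion that could be written as x+.

    Returns (kind, repeated_part) where kind is "tail" or "head",
    or None when the rule does not have one of those two shapes.
    """
    if len(alternatives) != 2:
        return None
    short, long_ = sorted(alternatives, key=len)
    # The base case must be non-empty, must not mention the rule itself,
    # and the recursive case must add exactly one symbol to it.
    if not short or name in short or len(long_) != len(short) + 1:
        return None
    if long_[:-1] == short and long_[-1] == name:
        return ("tail", short)   # a : x a | x   ->   a : x+
    if long_[1:] == short and long_[0] == name:
        return ("head", short)   # a : a x | x   ->   a : x+
    return None

# Hypothetical rules over a 'very' token.
tail = recursion_to_plus("<a>", [["very", "<a>"], ["very"]])
head = recursion_to_plus("<a>", [["<a>", "very"], ["very"]])
middle = recursion_to_plus("<a>", [["(", "<a>", ")"], ["x"]])
print(tail, head, middle)
```

Intermediate recursion such as the third rule is deliberately left alone, matching the statement that only the head and tail forms are handled.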
Originally posted by @0x7Fancy in #17 (comment)