Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix indirect left recursive rule operator precedence via delegated operator precedence #4480

Open
wants to merge 7 commits into
base: dev
Choose a base branch
from

Conversation

MisterErwin
Copy link

@MisterErwin MisterErwin commented Nov 30, 2023

Resolves #4460

Operator precedence is currently (incorrectly) reset to zero, when a left-recursive rule contains other (non-left)recursive elements.

grammar T
e : m | e '*' e * ID;
m: '-' e;
ID : 'a'..'z'+ ;

The ATN for rule e is shown below:
ATN of rule e

In case no indirect recursion was present (e : '-' e; | e '*' e * ID;), the ATN would look like the following:

ATN of rule e: '-' e | e'*' ...

Notably, without indirectness, <e[2]> signals a precedence used.

For completeness, the ATN of rule m is shown below too:
ATN of rule m

The <e[0]> (or in Java: e(0);) in rule m resets the operator precedence to zero, which is not correct (see issue #4460 or the tests added by this PR).

Proposed solution:

In case we detect a rule reference resulting in a binary or prefix recursion, we introduce an optional parameter int _dp (similar to the p parameter of left recursive rules) with a default of 0 to the targeted rule.
(This would replace the old <e[0]> by a <e[_dp]>.)

This PR does not touch the ATN behavior of ANTLR (well.. almost not. The possible precedence value _dp is now excluded during the ATN creation, but where this value is now set, any precedence was 0 before - yay).

This MR also introduces tests for additional attributes for rules (but not for left recursive ones, because ANTLR outputs an error message for these: rule e is left recursive but doesn't conform to a pattern ANTLR can handle)

The precedence option for indirect left-recursive rules is now set to "_dp", which is a new parameter introduced (similar to the existing _p).

Breaking changes:

This PR would prevent optional parameters from being added to parsers for the Dart target via the grammar.
Normal parameters are not affected, thus I would argue that change to be not of importance.

The reason I found this bug at all was that with large grammars,
I wanted to use the same rules (such as formulas) in multiple other rules.

The expected token position of the TestSymbolIssues#testUndefinedLabel test had to be modified by 5 tokens ((= dp 1)).

Thanks to Ken (@kaby76) for the initial directions over on SO

(By accident this also fixes #4477 :) )

P.S.: I've run a bunch of tests on internal grammars generated using the changes proposed, which so far has not shown additional errors.

P.P.S.: I am not sure if a RuleWalker similar to the LeftRecursiveRuleWalker instead of my LeftRecursiveRuleTransformer#isBinaryOrPrefix method would have been the better option - (but my version is at least easier to step through while debugging)

P.P.P.S.: The DOTGenerator is not aware of the presence of the _dp value for rererences - if desired, I can add this functionality (but wanted to refrain from doing so before the other changes were discussed.)

@MisterErwin MisterErwin marked this pull request as draft November 30, 2023 15:56
@MisterErwin MisterErwin marked this pull request as ready for review November 30, 2023 21:40
@n-dragon
Copy link

n-dragon commented Aug 30, 2024

any update on when this will be merged ?
side question,
how did you generate the ATN ?

@MisterErwin
Copy link
Author

MisterErwin commented Aug 30, 2024

any update on when this will be merged ?

Some aspects of this PR offer up some discussion (mainly my decicion on the RuleWalker and DOTGenerator),
other than I am just waiting for feedback (see also this discussion).
Which is also why haven't resolved the merge conflict.

For my usages, I'm either working around this issue by preprocessing the input grammars or by patching these changes into antlr.

side question, how did you generate the ATN ?

The -atn CLI option outputs the ATN in graphviz/dot format (which you can render using a graphviz or some websites)

@n-dragon
Copy link

n-dragon commented Sep 4, 2024

Some aspects of this PR offer up some discussion (mainly my decicion on the RuleWalker and DOTGenerator),
other than I am just waiting for feedback (see also this discussion).
Which is also why haven't re

thanks for the answer,
may I have your opinion on this discussion?
#4686

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Swift] only first rule parameter generated correctly left recursive rule references and operator precedence
3 participants