Various errors when using | inside of terminals #31

Open
swwu opened this issue Jul 7, 2022 · 0 comments

I've noticed some errors when using a terminal "production" rule of the form

T0: T1 | T2 | T3

where all of the given expressions are terminals. These errors occur only in the standalone parser generated by Lark.js; the same grammar correctly parses an identical string in the Python version of Lark. I've isolated two (hopefully minimal enough) example cases below.

This seems similar to #21, in that it appears to be related to some JavaScript-specific regex quirk triggered when terminals are combined via |, but since I'm not very familiar with the library's internals I can't be sure. As in #21, replacing VALUE with value everywhere (i.e. turning the terminal into a non-terminal rule) makes both of the following examples parse correctly.

Example 1

This grammar:

?start: thing
thing: thing W thing
    | expr
expr: label W? VALUE
    | VALUE
label: BARE_WORD W? ":"
W: /[ \t\n\v\f]/+
VALUE: NUMBER | BARE_WORD | STRING
BARE_WORD: /[^\s:\(\)]/+
STRING: "\"" /((?:\\"|[^\r\n"]))/* "\""
NUMBER: /[0-9]+/

fails with UnexpectedToken when parsing the string "a:b", whereas the Python version of Lark parses it correctly.

Example 2

This grammar:

?start: thing
thing: label VALUE | VALUE
label: BARE_WORD W? ":"
W: /[ \t\n\v\f]/+
VALUE: NUMBER | BARE_WORD | STRING
BARE_WORD: /[^\s:\(\)]/+
STRING: "\"" /((?:\\"|[^\r\n"]))/* "\""
NUMBER: /[0-9]+/

fails during lexing of the same string "a:b" with SyntaxError: Invalid flags supplied to RegExp constructor 'nully'. Again, the Python version parses it correctly.
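The flags string 'nully' is exactly what JavaScript produces when a null value is concatenated with the sticky flag "y" (null + "y" evaluates to "nully"). One possible, unconfirmed explanation is therefore that the generated lexer builds a sticky RegExp from a terminal whose flags value is null. The snippet below does not use Lark.js: compileSticky and compileStickySafe are hypothetical helpers that reproduce the same message in Node.js under that assumption, and show a guard that defaults missing flags to an empty string.

```javascript
// Hypothetical helpers illustrating one possible cause of the 'nully' error.
// This is NOT Lark.js code; it only assumes that a sticky ("y") RegExp is
// built by appending "y" to a flags value that may be null.

function compileSticky(pattern, flags) {
  // With flags === null, null + "y" evaluates to the string "nully".
  return new RegExp(pattern, flags + "y");
}

function compileStickySafe(pattern, flags) {
  // Default a missing flags value to "" before appending the sticky flag.
  return new RegExp(pattern, (flags ?? "") + "y");
}

// The BARE_WORD pattern from the grammar above, as a JavaScript string.
const BARE_WORD = "[^\\s:\\(\\)]+";

let errorName = null;
let errorMessage = null;
try {
  compileSticky(BARE_WORD, null);
} catch (e) {
  errorName = e.name;       // "SyntaxError"
  errorMessage = e.message; // V8/Node.js: "Invalid flags supplied to RegExp constructor 'nully'"
}
console.log(errorName, errorMessage);

// The guarded version compiles, and the sticky match starts at index 0.
const re = compileStickySafe(BARE_WORD, null);
const firstMatch = re.exec("a:b");
console.log(re.flags, firstMatch[0]); // y a
```

If this guess is right, the problem would lie in how flags are propagated when terminals are merged, not in the grammar itself, which fits the fact that the Python version handles it.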

swwu changed the title from "Various issues with terminals that produce one of many other terminals" to "Various errors when using | inside of terminals" on Jul 7, 2022.