Incremental and error-resilient parser #286

novusnota · 2024-04-23T17:23:55Z

Would greatly help with #183.

The basic idea is to move from the current Ohm-generated parser to a standalone one, which would be suitable for tools like Language Server or others. Additionally, it would be nice to keep the existing grammar.ohm spec file — it will allow us to check the implementetion of the new parser against an Ohm-generated one.

List of ideas, papers and useful blog-posts (organized by year, in decreasing order):

Some more things:

The Lezer Parser System: https://lezer.codemirror.net
https://github.com/zesterer/chumsky

novusnota · 2024-04-23T18:52:13Z

But until then, we must utilize the most out of Ohm's parser, including its incremental parsing capabilities and try to enhance its syntax error reporting clarity locally or in it's upstream repo. It's actually seems to be not that far from giving a structured error message instead of a pre-formed string (although if worst comes to worst, we could parse the resulting string however brittle this may seem).

Note, that it's not strictly necessary to eventually make an incremental parser, but it's a must to make it error-resilient and producing nice errors (see: #183).

0xGeorgii · 2024-04-26T09:39:16Z

hey @novusnota this is a great idea and a huge boost for tact infrastructure! I am interested in implementing this part but there are many details for such a big project that need to be settled down before approaching it. here are some of the thoughts and questions I would like to raise and discuss here:

what are the key non-functional qualities required for the parser?
1.1 performance? should it be performance-optimized and be able to possess N tokens in a second? if yes, is there a desired value for that or a reference performance example (like to be not slower than Ohm)?
1.2 resilience? should it be able to produce a stable AST for the broken syntax?
1.3 do we need a fuzzer to validate pp. 1.1 and 1.2 ?
should it be stateful? keeping in mind an IDE infrastructure pipeline IDE <-> LSP <-> [parser; compiler] ?
2.1 now, here it means incremental parsing is required because we probably do not want to reiterate the whole tree but only the changed branch.
do we need a custom lexer?
a smart internal structure design is required to provide a painless way to reflect tact syntax updates.
~~is a static AST analyser needed?~~

The target language is assumed to be ts to keep the variety of used technologies dense.
Since Ohm is currently used, the new parser can adopt and take some approaches from it (to reduce reinventing the wheel phase).

anton-trunov · 2024-04-26T09:43:46Z

is a static AST analyser needed?

This is definitely out-of-scope for the parser project. There are two static analyzer projects supported by the grants and bounties program:

novusnota · 2024-05-02T16:33:03Z

@0xGeorgii this is a big project, but we don't need to rush out the implementation before we prototype some more with Ohm and check off some points from #314 with it. Moreover, #183 has to be addressed with the current Ohm parser we have.

That said, Ohm won't be going anywhere even when we'll have a new parser — it's always nice to have a formal specification/reference, so Ohm's grammar.ohm and auto-generated parser would suit as that reference.

Now, to your questions:

1.1 performance?

I think just being faster than Ohm would do, but concrete metrics are up for discussion.

1.2 resilience? should it be able to produce a stable AST for the broken syntax?

Yes, that's the point :)
Note, that this is different from error-recovery in a sense that the parser should just not fall in presence of failures instead of trying to correct some of them (by placing a pseudo-node with ); when calling functions like myAddress(|, where | is the caret position, for example). Although I'm not sure which strategy would be the most optimal, there's one thing for sure — this parser should work in editor/IDE environments too, where code is, most of the time, in the broken state.

Oh, and incrementality — lexer/parser combo should be able to start from a specified rule and not re-lex/re-parse the whole tree. This is very important for editor/IDE environments with frequent changes of the source code. (We may introduce a debounce or something here, just to keep things simple, but that's still a thing to keep in mind nonetheless).

At the moment, I lean towards having a combined LL (with some tricks) + Pratt (for expressions) parser architecture. But this is not set in stone :)

1.3 do we need a fuzzer to validate pp. 1.1 and 1.2?

Not sure here, but it would be nice.

do we need a custom lexer?

Yes, and it also must be error-resilient — just marking invalid tokens and going forward should be enough.

a smart internal structure design is required to provide a painless way to reflect tact syntax updates.

That's what the Ohm's grammar for — syntax will first be tested in it, then those changed would be matched by the new parser. At least that's how I view it.

Some important questions (IMHO) weren't asked:

What to do with comments? Especially multi-line ones? Do we attach them to the AST after the first pass, or...
What to do with errors? How could we fine-tune them for: end-users on the CLI, API usage in tools, etc.
How would the API for external tools and CLI would look like? Related to Tact frontend API: Improvements for tooling #314 and other similar issues.

Summoning @anton-trunov to correct me about our plans and/or my points here :)

novusnota added the refactoring label Apr 23, 2024

anton-trunov added the scope: parser label Apr 23, 2024

anton-trunov added big This is a hard task, more like a project and will take a while to implement and removed refactoring labels Apr 24, 2024

anton-trunov changed the title ~~Towards a better, incremental and error-resilient parser~~ Incremental and error-resilient parser Apr 24, 2024

novusnota mentioned this issue Apr 24, 2024

Add a continuously updated (not necessarily incremental) code model to the compiler API #291

Open

novusnota mentioned this issue May 1, 2024

Tact frontend API: Improvements for tooling #314

Open

17 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incremental and error-resilient parser #286

Incremental and error-resilient parser #286

novusnota commented Apr 23, 2024 •

edited by anton-trunov

Loading

novusnota commented Apr 23, 2024 •

edited

Loading

0xGeorgii commented Apr 26, 2024 •

edited

Loading

anton-trunov commented Apr 26, 2024

novusnota commented May 2, 2024 •

edited

Loading

Incremental and error-resilient parser #286

Incremental and error-resilient parser #286

Comments

novusnota commented Apr 23, 2024 • edited by anton-trunov Loading

novusnota commented Apr 23, 2024 • edited Loading

0xGeorgii commented Apr 26, 2024 • edited Loading

anton-trunov commented Apr 26, 2024

novusnota commented May 2, 2024 • edited Loading

novusnota commented Apr 23, 2024 •

edited by anton-trunov

Loading

novusnota commented Apr 23, 2024 •

edited

Loading

0xGeorgii commented Apr 26, 2024 •

edited

Loading

novusnota commented May 2, 2024 •

edited

Loading