TODO "Figure out how to do verbatim sequences in ANTLR" is impossible to implement with ANTLR #39

Open
ST92 opened this issue Dec 19, 2021 · 3 comments


ST92 commented Dec 19, 2021

I spent the last few days digging into why such an innocent-looking feature was left as a TODO. I was looking for something to get my hands into, and it looked promising enough.

It turns out ANTLR doesn't support back-references or forward-references at all, so there is no good way to implement this using the grammar alone.

A known workaround is to embed actions that verify whether the two tokens match (analogous to how an XML opening and closing tag pair must share a tag name), but those actions require putting code inside the lexer grammar, written in the same programming language as the generated lexer.

That would mean putting Java code in concise-encoding grammar definitions, and thus tying it tightly to Java.

I want to write a 100% Rust implementation. AFAIK, at this moment I need to write a custom lexer and parser to make verbatim escape sequences work.

TL;DR: ANTLR is insufficient because the VES grammar is context-sensitive.


ST92 commented Dec 19, 2021

On a positive note, the spec is detailed enough that wrong grammar files don't really impact anything. Honestly, I'm a bit disappointed that ANTLR seems to be the best tool for the job but is still lacking in this area.

@kstenerud
Owner

Yeah, I was hoping to rig something up with a templating engine (in Python or whatever) to generate a finalized grammar file with stub code for whatever language is being built. In theory the actual verbatim sequence code itself is simple, since you're just reading termination token data until the next whitespace, then reading content data until you encounter the termination token again.
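
For illustration, here is a minimal Rust sketch of that reading loop, roughly what a hand-written lexer might do. It is not taken from any existing implementation: the function name `read_verbatim` is hypothetical, and it assumes the input starts at the first character of the termination token (any introducer already consumed) and that exactly one whitespace separator follows the token, with `\r\n` counted as one. Check the spec for the exact rules before relying on it.

```rust
/// Reads one verbatim sequence from `input` (hypothetical helper).
/// Assumes `input` starts at the first character of the termination token.
/// Returns the verbatim content and the unparsed remainder, or `None` if the
/// sequence is malformed or unterminated.
fn read_verbatim(input: &str) -> Option<(&str, &str)> {
    // 1. The termination token runs up to the next whitespace character.
    let ws_start = input.find(char::is_whitespace)?;
    let terminator = &input[..ws_start];
    if terminator.is_empty() {
        return None;
    }

    // 2. Skip one whitespace separator (assumption: "\r\n" counts as one).
    let after_token = &input[ws_start..];
    let body = if let Some(stripped) = after_token.strip_prefix("\r\n") {
        stripped
    } else {
        let mut chars = after_token.chars();
        chars.next();
        chars.as_str()
    };

    // 3. The content is everything up to the next occurrence of the terminator.
    //    This comparison against a token read earlier is the back-reference
    //    that a plain ANTLR grammar cannot express.
    let end = body.find(terminator)?;
    Some((&body[..end], &body[end + terminator.len()..]))
}

fn main() {
    let input = "END \\n is not an escape hereEND tail";
    match read_verbatim(input) {
        Some((content, rest)) => println!("content = {:?}, rest = {:?}", content, rest),
        None => println!("malformed or unterminated verbatim sequence"),
    }
}
```

Since the terminator is only known at run time, step 3 is where the context sensitivity lives; everything else is ordinary scanning.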

@kstenerud
Owner

BTW please do write up anything that you find confusing or weird in the spec. If it's confusing, it's badly written!
