Massive update (details within) #3

savetheclocktower · 2025-08-07T22:13:42Z

I fully do not expect this PR to be adopted in its entirety. But I've forked this parser and made some changes, and I feel it would be rude not to at least offer up this work in case any of it looked intriguing.

Problems

Here are the main things I wanted to fix:

All “keywords” in Vento were assigned as keyword nodes, rather than the more common pattern of treating them as anonymous nodes. The first “word” in every tag was presumed to be a keyword; this offers flexibility for, say, extensions (and could be added back if there were generalized need for such a thing). But these two decisions combine to produce a tree that's very hard to query without adding #match? predicates everywhere.
There's no structural relationship between, e.g., {{ if }} and its corresponding {{ /if }}. That makes it hard to do things like code folding. It's even harder to do proper indentation hinting, since there's nothing built into the Vento syntax that tells me whether or not {{ foo "bar"}} is a standalone tag or is just a start tag to be followed by an eventual {{ /foo }}.
The Vento syntax is refreshingly simple in what it manages itself and what it cedes to JavaScript. But some constructs in the parser cede to JavaScript at awkward points. For instance, given {{ for value of collection }}, the parser would claim for as the keyword and then wrap value of collection in a code node — but that's not fit to be highlighted by JS on its own, because it isn't valid JavaScript.

There are some upsides to the current simplicity; for instance, it means that highlights.scm gets to be pretty short. But the price is paid in folds.scm and indents.scm, which have to be much larger, since they need to query against the content of keyword nodes as the only way of identifying paired nodes. (And I don't use Neovim, so I don't even know if folds can be properly expressed using their query syntax.)

Fixes

This approach definitely has some trade-offs rather than being an unmitigated positive, but I think it's worth it:

All known blocks (constructs with beginning and ending tags) have corresponding nodes in the tree.
Even keywords that have block and non-block versions — like export and set — are parsed properly.
Known keywords are recognized and marked as anonymous nodes.
Injection points are more precise. (For instance: in the {{for value of collection }} example from above, only collection is wrapped in a code node; for and of are treated as keywords. This lets me mark value as a variable and allows collection to instead be any JS of arbitrary complexity, since the JS grammar is in charge of highlighting it.)
I fixed something that bugged me: tags were inheriting all of the leading space before the tag. This gave lots of tag nodes starting positions that didn't correspond to where their {{ delimiters began.

And a couple others that are more like chores than refactors:

Tests must now go in test/corpus instead of merely corpus, so I moved them.
I updated to the latest tree-sitter-cli. This introduces a lot of new boilerplate files, but also handles most of the gruntwork of exposing the parser to various Tree-sitter bindings.

Once again, I do not even expect this to be landed as-is, since it would be disruptive! But if any of this appeals to you, let me know, and I'll see about isolating it and making a more precise contribution.

* Update to latest `tree-sitter-cli` and ABI 15 * Rewrite nearly from scratch * Change the corpus to agree * Add structures for all known blocks (makes folds and indents much easier) * Use more anonymous nodes

* Add support for two loop variables in `for` * Fix leading whitespace before JavaScript-embed tags * Add lots of explanatory comments * Add `for` blocks to the corpus

wrapperup · 2025-08-08T08:26:05Z

Wow! Thanks for doing all this.

I basically only used this for syntax highlighting personally, so I have no issue with changing how the nodes are structured. Honestly, the current implementation is extremely simplistic and has many edge cases even for just that purpose, so I have no issues replacing it with something more sound.

Will take some time to review the PR this weekend!

savetheclocktower added 12 commits August 6, 2025 22:29

Big changes!

2785ac2

* Update to latest `tree-sitter-cli` and ABI 15 * Rewrite nearly from scratch * Change the corpus to agree * Add structures for all known blocks (makes folds and indents much easier) * Use more anonymous nodes

Remove test file

4b2885f

Other fixes:

93f1cc3

* Add support for two loop variables in `for` * Fix leading whitespace before JavaScript-embed tags * Add lots of explanatory comments * Add `for` blocks to the corpus

Whoops; this should be ignored

28a66ab

Flip DEBUG off

e83b61d

Remove old code

3c35beb

Fix highlights query

59e0081

Fix import specifiers

a64226c

Fix issue with JS tags that start with for or if

1448610

Fix leading whitespace on comment_tags

240c61c

Flip DEBUG off again

e8e5928

Fix comment delimiter anonymous nodes

9ec7dc3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Massive update (details within) #3

Massive update (details within) #3

Uh oh!

savetheclocktower commented Aug 7, 2025 •

edited

Loading

Uh oh!

wrapperup commented Aug 8, 2025

Uh oh!

Uh oh!

Massive update (details within) #3

Are you sure you want to change the base?

Massive update (details within) #3

Uh oh!

Conversation

savetheclocktower commented Aug 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problems

Fixes

Uh oh!

wrapperup commented Aug 8, 2025

Uh oh!

Uh oh!

savetheclocktower commented Aug 7, 2025 •

edited

Loading