Skip to content

Conversation

savetheclocktower
Copy link

@savetheclocktower savetheclocktower commented Aug 7, 2025

I fully do not expect this PR to be adopted in its entirety. But I've forked this parser and made some changes, and I feel it would be rude not to at least offer up this work in case any of it looked intriguing.

Problems

Here are the main things I wanted to fix:

  • All “keywords” in Vento were assigned as keyword nodes, rather than the more common pattern of treating them as anonymous nodes. The first “word” in every tag was presumed to be a keyword; this offers flexibility for, say, extensions (and could be added back if there were generalized need for such a thing). But these two decisions combine to produce a tree that's very hard to query without adding #match? predicates everywhere.
  • There's no structural relationship between, e.g., {{ if }} and its corresponding {{ /if }}. That makes it hard to do things like code folding. It's even harder to do proper indentation hinting, since there's nothing built into the Vento syntax that tells me whether or not {{ foo "bar"}} is a standalone tag or is just a start tag to be followed by an eventual {{ /foo }}.
  • The Vento syntax is refreshingly simple in what it manages itself and what it cedes to JavaScript. But some constructs in the parser cede to JavaScript at awkward points. For instance, given {{ for value of collection }}, the parser would claim for as the keyword and then wrap value of collection in a code node — but that's not fit to be highlighted by JS on its own, because it isn't valid JavaScript.

There are some upsides to the current simplicity; for instance, it means that highlights.scm gets to be pretty short. But the price is paid in folds.scm and indents.scm, which have to be much larger, since they need to query against the content of keyword nodes as the only way of identifying paired nodes. (And I don't use Neovim, so I don't even know if folds can be properly expressed using their query syntax.)

Fixes

This approach definitely has some trade-offs rather than being an unmitigated positive, but I think it's worth it:

  • All known blocks (constructs with beginning and ending tags) have corresponding nodes in the tree.
  • Even keywords that have block and non-block versions — like export and set — are parsed properly.
  • Known keywords are recognized and marked as anonymous nodes.
  • Injection points are more precise. (For instance: in the {{for value of collection }} example from above, only collection is wrapped in a code node; for and of are treated as keywords. This lets me mark value as a variable and allows collection to instead be any JS of arbitrary complexity, since the JS grammar is in charge of highlighting it.)
  • I fixed something that bugged me: tags were inheriting all of the leading space before the tag. This gave lots of tag nodes starting positions that didn't correspond to where their {{ delimiters began.

And a couple others that are more like chores than refactors:

  • Tests must now go in test/corpus instead of merely corpus, so I moved them.
  • I updated to the latest tree-sitter-cli. This introduces a lot of new boilerplate files, but also handles most of the gruntwork of exposing the parser to various Tree-sitter bindings.

Once again, I do not even expect this to be landed as-is, since it would be disruptive! But if any of this appeals to you, let me know, and I'll see about isolating it and making a more precise contribution.

* Update to latest `tree-sitter-cli` and ABI 15
* Rewrite nearly from scratch
* Change the corpus to agree
* Add structures for all known blocks (makes folds and indents much easier)
* Use more anonymous nodes
* Add support for two loop variables in `for`
* Fix leading whitespace before JavaScript-embed tags
* Add lots of explanatory comments
* Add `for` blocks to the corpus
@wrapperup
Copy link
Collaborator

Wow! Thanks for doing all this.

I basically only used this for syntax highlighting personally, so I have no issue with changing how the nodes are structured. Honestly, the current implementation is extremely simplistic and has many edge cases even for just that purpose, so I have no issues replacing it with something more sound.

Will take some time to review the PR this weekend!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants