🚀 Getting the grammar ready for a first release #7

TheCedarPrince · 2024-08-16T03:03:49Z

I was wondering, would you be willing to tag a release of tree-sitter-forester? I'd like to experiment with tree-sitter-forester in some other tree-sitter pipelines, and it would help to have a tagged version (with a tar.gz of the source code) to use as a reference.

Thanks!

~ tcp 🌳

kentookura · 2024-08-16T06:06:45Z

Sure. I am putting in some work to update the grammar for 4.2. I think when that's done it's a good time for a release. May I ask what you're planning on trying out? I'll definitely be interested as well 👀

TheCedarPrince · 2024-08-16T18:47:48Z

Hi @kentookura,

That sounds great!

> May I ask what you're planning on trying out?

And absolutely, happy to share! I hope you don't mind the long message: I just got my early experiments with your tree-sitter package working last night, so here's the story.

Background

I really enjoy Forester and have been making tons of notes with the tool for my studies, research, and more. I also want to start extracting all kinds of information from my Forester forest to make study materials (e.g. flash cards) that help me better retain what I am studying. Finally, I have a prior knowledge base of over 500 notes (maybe 600 at this point) that I would like to migrate to tree files. That is how I came across your tree-sitter-forester project.

Experiment

As a first pass, I wanted to see if I could use your grammar from the Julia programming language (the language I am most comfortable with for analysis and more) to manipulate tree files. This involved prototyping a way to bundle your tree-sitter grammar offline into an executable, forking TreeSitter.jl to add support for the forester grammar, and using some of the queries you provided to test out the grammar.

Result

I got it working! Here is a screenshot of this whole process taking place:

It's pretty great to see the whole thing coming together!

Next Steps

Now I am trying to figure out the following:

First Idea

Investigate what queries I can write, since I would like to extract the contents of entire paragraphs at a time. This almost seems to work, as shown here:

But it sometimes ends up breaking on larger files like:

Which gives:

Second Idea

Try to figure out how to handle TikZ better within files. The example file I gave above seems to break and throw occasional errors during parsing:

Third Idea

Figure out how to support queries for custom macros. I really have no idea about this one, as parsing does not seem to work on the custom macros I have defined within Forester.

Concluding Remarks

I really appreciate all the work you are doing here! As it stands, I haven't publicly shared the code for all this yet, because I saw there is no LICENSE file in the main repo. If you want to give it a try, I'm happy to make it accessible! I'm also happy to share any further thoughts or perspectives on all of this!

TheCedarPrince · 2024-08-16T22:47:31Z

P.S. @MichaelHatherly just wanted to give you a little ping and say this would not have been possible without you so thank you very much for TreeSitter.jl!

kentookura · 2024-08-17T13:13:50Z

@TheCedarPrince Thanks for the thorough writeup!

I am definitely interested in making this grammar as useful as possible for integrating with other tools/languages, so I am very grateful that you have a real project to work on. I think this will greatly help the development of this grammar.

I renamed the issue so that it better describes the things we are discussing.

A couple of notes:

- I have added a license and pushed some updates. Did anything break after the update? Are you still encountering errors?
- Prior to this, my only intended use case for this grammar was syntax highlighting in Neovim (and other editors that support tree-sitter). As such, I did not need to parse particularly accurately. For your use case, we need to be more accurate and comprehensive. I'd like to ask you to help me make the parser better by reporting any errors you encounter in the issues.
- Looking at the Rust grammar, it looks like they make separate commits for changes to the grammar and for the files that change after running codegen. This seems like a good practice, since even small changes to the grammar result in large changesets. (This note is more for myself.)
- I wonder if we can use language injections to accurately parse the embedded TeX code that you have in your macros.
- I should set up a CI action: https://github.com/tree-sitter/parse-action

TheCedarPrince · 2024-08-17T22:25:44Z

Hey @kentookura,

> Thanks for the thorough writeup!

Glad you appreciated the note!

> I am definitely interested in making this grammar as useful as possible for integrating with other tools/languages, so I am very grateful that you have a real project to work on. I think this will greatly help the development of this grammar.

Oh wow! Thank you so much, and yes, I'll be tinkering off and on with my notes as I move forward. I will definitely keep you apprised of my experiments. In fact, our discussion here has made me realize I should probably share my forest publicly soon. That way, I can more easily reference tree examples for parsing.

> I have added a license and pushed some updates.

Where did you put the license file? I grepped around the repo and saw it was under an MIT license. Is that right?

> Did anything break after the update? Are you still encountering errors?

I just rebuilt on my side, and it seems I am no longer able to query based on "highlights" or many other fields; at the moment I can only write queries for paragraphs. I'm not sure what broke. Should I open an issue?

I also double-checked, and my old Julia code definitely works without your recent changes. For example, this works:

````julia
using TreeSitter

# Parser(:forester) relies on the TreeSitter.jl fork that bundles the forester grammar
forester = Parser(:forester)

test_file = """
    \\title{A Small Experiment}
    \\taxon{julia, experiment}
    \\date{08-16-2024}
    \\p{This is a small paragraph.
    It's not too much.
    But it is my paragraph}
    % This is a little comment
    """;

tree = parse(forester, test_file);

# `(_)` matches each child node, so the paragraph's contents come back
# as several separate captures (one per line here)
q = query```
    ((paragraph (_) @text.paragraph))
    ((taxon (_) @text.taxon))
    (title (_) @text.title)
    ```forester

# Collect (capture name, source text) pairs in document order
out = []
for capture in TreeSitter.each_capture(tree, q, test_file)
    id = TreeSitter.capture_name(q, capture)
    literal = TreeSitter.slice(test_file, capture.node)
    push!(out, (id, literal))
end
````

Which gives:

```
5-element Vector{Any}:
 ("text.title", "A Small Experiment")
 ("text.taxon", "julia, experiment")
 ("text.paragraph", "This is a small paragraph.")
 ("text.paragraph", "It's not too much.")
 ("text.paragraph", "But it is my paragraph")
```

But it does not work with the most recent grammar update (i.e. this one: 0ceda08). I didn't check the other generate-files commit you made since February, but I think it is the same as my fork's here: 722f586 (all I did was run tree-sitter generate in the root of the repo and commit the changes).

> Prior to this, my only intended use case for this grammar was syntax highlighting in neovim (and other editors that support treesitter).

Yeah, I saw that was the case and was already a big fan of your work on the Neovim plugin. I just decided to give it a whirl to see if I could use it from a Julia program without having to rewrite anything on my end. I was immensely and pleasantly surprised to see that it worked mostly out of the box, and I think that is a strong testament to your implementation as well as to the tree-sitter standard! 😃

> For your use case we need to be more accurate and comprehensive. I'd like to ask you to help me make the parser better by reporting any errors you encounter in the issues.

Absolutely, can do! I will admit I am completely new to tree-sitter, and the only way I know how to test things is in the context of my Julia experiments. If there is an easier way to test things, or a way that is more helpful to you, please let me know.

> I wonder if we can use language injections to accurately parse the embedded tex code that you have in your macros.

That would be neat! Maybe worth opening a separate feature issue to discuss?

> I should set up a CI action: tree-sitter/parse-action

Oh, that would be great. I hope it would catch issues without me having to go through my build process, which is a bit laborious on my side (I'm looking into how to fix that from the Julia side).

All in all, great stuff! Let me know what you think! Thanks again for all the work here!

~ tcp 🌳

kentookura · 2024-08-18T05:00:23Z

Whoops, there was a weird git issue. I force pushed main with the GNU license, sorry about that.

Indeed, some of the queries are bound to be broken now, as they depend on the specific structure of the parse tree.

In contrast to Menhir (the OCaml parser generator we use), we can't specify the syntax tree structure on its own and ensure that the parser generates this structure. Rather, the shape that the syntax tree can take is defined by the grammar itself. I guess it's bad practice to push breaking changes when people actually use this code, but I have not tagged a release yet. I'll be more disciplined once 0.1.0 lands.

When I came back to this repo after a while and ran tree-sitter generate, a bunch of new files appeared in the repo. I couldn't find anything about that in the tree-sitter release notes, but I didn't look particularly hard...

> If there is an easier way to test things or a way that is more helpful to you, please let me know.

The way testing works is that we add pieces of syntax along with the expected parse tree to this file:
https://github.com/kentookura/tree-sitter-forester/blob/main/test/corpus/statements.txt
and run tree-sitter test. You can read more about the workflow here:
https://tree-sitter.github.io/tree-sitter/creating-parsers#command-test

> That would be neat! Maybe worth opening a separate feature issue to discuss?

Sure. I hope this is achievable with the grammar itself. The only way I've seen it done is with queries: the grammar parses only accurately enough to know where the foreign code starts and ends, and a query then tells the editor that this range should actually be highlighted as TeX.
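For reference, a rough sketch of what such an injection query might look like, assuming the grammar exposes the raw TeX inside a macro as its own node. The node name `tex_body` is hypothetical and is not part of the current grammar. `@injection.content` and the `injection.language` property are tree-sitter's injection conventions (usually kept in `queries/injections.scm`), and `"latex"` assumes the editor has the tree-sitter-latex parser installed.

```query
; queries/injections.scm (sketch)
; HYPOTHETICAL: `tex_body` stands in for whatever node the grammar would
; produce for the raw TeX between a macro's braces.
((tex_body) @injection.content
  (#set! injection.language "latex"))
```

With this split, the forester grammar only has to find where the TeX starts and ends; the editor hands that range to the LaTeX parser for highlighting.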

TheCedarPrince · 2024-08-18T05:46:47Z

> Whoops, there was a weird git issue. I force pushed main with the GNU license, sorry about that.

I see it! Awesome stuff -- I'll open up my Julia experiments soon so if you want to try things out, you are welcome to.

> I guess it's bad practice to push broken changes when people actually use this code, but I have not tagged a release yet. I'll be more disciplined once 0.1.0 lands.

No worries! It's kinda anything goes until a first release happens! :D

> You can read more about the workflow here:
> tree-sitter.github.io/tree-sitter/creating-parsers#command-test

Oh sweet! Thanks!

> Sure. I hope this is achievable with the grammar itself, the only way I've seen it done is with queries,

I'll open up something later for tracking!

kentookura changed the title ~~Could v0.1.0 Have a Tagged Release?~~ 🚀 Getting the grammar ready for a first release Aug 17, 2024

TheCedarPrince mentioned this issue Aug 19, 2024

[FEATURE] Trying To Extract Comment Text #11

Draft
