🚀 Getting the grammar ready for a first release #7
Sure. I am putting in some work to update the grammar for 4.2. Once that's done, I think it will be a good time for a release. May I ask what you're planning on trying out? I'll definitely be interested as well 👀
Hi @kentookura, That sounds great!
And absolutely, happy to share! I hope you don't mind the long message; I just got my early experiments with your tree-sitter package working last night, so here's the story.

Background

I really enjoy Forester and have been making tons of notes with the tool for my studies, research, and more. I also want to start extracting all kinds of information from my Forester forest to make study materials (e.g. flash cards) that help me better retain what I am studying. Finally, I have a prior knowledge base of over 500 notes (maybe 600 at this point) that I would like to migrate over to tree files. For that reason, I came across your tree-sitter-forester project.

Experiment

As a first pass, I wanted to see if I could use your grammar within the Julia programming language (the language I am most comfortable with for analysis and more) to manipulate tree files. This involved prototyping a way to bundle your tree-sitter grammar offline into an executable, forking TreeSitter.jl and adding support for the Forester grammar to it, and using some of the queries you provided to test out the grammar.

Result

I got it working! Here is a screenshot of this whole process taking place:

It's pretty great to see the whole thing coming together!

Next Steps

Now I am trying to figure out the following:

First Idea

Investigate which queries I can make, as I would like to extract the contents of entire paragraphs at a time. This almost seems to work, as shown here:

But it sometimes ends up breaking on larger files like:

Which gives:

Second Idea

Try to figure out how to handle tikz better within files. The file example I gave above seems to break and throw occasional errors within the parsing process:

Third Idea

How to support queries for custom macros. I really have no idea about this one, as it seems like parsing doesn't work on the custom macros that I have made within Forester.

Concluding Remarks

I really appreciate all the work you are doing here!
As it stands, I haven't publicly shared the code to do all this yet, as I saw you don't have a LICENSE file within the main repo. If you want to give this all a try, I'm happy to make it accessible! Additionally, I am happy to share any further thoughts or perspectives on all of this!
P.S. @MichaelHatherly just wanted to give you a little ping and say this would not have been possible without you, so thank you very much for TreeSitter.jl!
@TheCedarPrince Thanks for the thorough writeup! I am definitely interested in making this grammar as useful as possible for integrating with other tools/languages, so I am very grateful that you have a real project to work on. I think this will greatly help the development of this grammar. I renamed the issue so that it better describes the things we are discussing. A couple of notes:
Hey @kentookura,
Glad you appreciated the note!
Oh wow! Thank you so much, and yes, I'll be tinkering off and on with my notes as I move forward. I will definitely keep you apprised of my experiments. In fact, our discussion here so far has made me realize I should probably share my forest publicly soon. That way, I can more easily reference tree examples for parsing.
Where did you put the license file? I grepped around the repo and saw it was under an MIT license. Is that right?
I just rebuilt on my side and it seems I am no longer able to query based on "highlights" or many other fields. It seems I can only write queries for paragraphs at the moment... Not sure what broke. I could open an issue perhaps? I also double-checked, and my old Julia code definitely works without the recent changes you made. For example, this works:

using TreeSitter
forester = Parser(:forester)
test_file = """
\\title{A Small Experiment}
\\taxon{julia, experiment}
\\date{08-16-2024}
\\p{This is a small paragraph.
It's not too much.
But it is my paragraph}
% This is a little comment
""";
tree = parse(forester, test_file);
q = query```
((paragraph (_) @text.paragraph))
((taxon (_) @text.taxon))
(title (_) @text.title)
```forester
out = []
for capture in TreeSitter.each_capture(tree, q, test_file)
id = TreeSitter.capture_name(q, capture)
literal = TreeSitter.slice(test_file, capture.node)
push!(out, (id, literal))
end

Which gives:

5-element Vector{Any}:
("text.title", "A Small Experiment")
("text.taxon", "julia, experiment")
("text.paragraph", "This is a small paragraph.")
("text.paragraph", "It's not too much.")
("text.paragraph", "But it is my paragraph")

But it does not work with the most recent grammar update (i.e. this one: 0ceda08). I didn't check the other "generate files" commit you made since February, but I think it is the same as my fork's here: 722f586 (all I did was run
Yeah, I saw that was the case and was already a big fan of your work on the neovim plugin. I just decided to give it a whirl on my side to see if I could pass it into a Julia program too, without having to rewrite anything on my end. I was shocked, in an immensely pleasant way, to see that it worked mostly out of the box, and I think that is a strong testament to your implementation as well as the tree-sitter standard! 😃
Absolutely and can do! I will admit I am absolutely new to tree-sitter and the only way I know how to test things is in the context of my Julia experiments. If there is an easier way to test things or a way that is more helpful to you, please let me know.
That would be neat! Maybe worth opening a separate feature issue to discuss?
Oh, that would be great. I hope that would help catch issues without me having to go through my build process, because it is a bit laborious on my side (I'm looking into how to fix that from a Julia perspective).
All in all, great stuff! Let me know what you think! Thanks again for all the work here!
~ tcp 🌳
Whoops, there was a weird git issue. I force-pushed main with the GNU license, sorry about that.
Indeed, some of the queries are bound to be broken now, as they depend on the specific structure of the parse tree. In contrast to Menhir (the OCaml parser generator we use), we can't specify the syntax tree structure on its own and ensure that the parser generates this structure. Rather, the shape that the syntax tree can take on is defined by the grammar itself. I guess it's bad practice to push breaking changes when people actually use this code, but I have not tagged a release yet. I'll be more disciplined once 0.1.0 lands. When I came back to this repo after a while and ran
The way testing works is that we add pieces of syntax along with the expected parse tree to this file:
Sure. I hope this is achievable with the grammar itself. The only way I've seen it done is with queries, meaning the parser only parses accurately enough to know where the foreign code starts and ends, and a query then tells the editor that this range should actually be highlighted as tex.
I see it! Awesome stuff -- I'll open up my Julia experiments soon so if you want to try things out, you are welcome to.
No worries! It's kinda anything goes until a first release happens! :D
Oh sweet! Thanks!
I'll open up something later for tracking!
Hey @kentookura,
I was wondering, would you be willing to tag a release of tree-sitter-forester? I want to experiment with tree-sitter-forester in some other tree-sitter pipelines and would like a tagged version (with a tar.gz file of the source code) to use as a reference.
Thanks!
~ tcp 🌳