Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Designate doc comments #99

Merged
merged 2 commits into from
Jan 10, 2022
Merged

Designate doc comments #99

merged 2 commits into from
Jan 10, 2022

Conversation

resolritter
Copy link
Contributor

closes #88

@resolritter resolritter force-pushed the doc_comments branch 3 times, most recently from 314b2b8 to 59bf0d3 Compare January 10, 2021 16:35
@resolritter
Copy link
Contributor Author

resolritter commented Jan 10, 2021

My first attempted implementation was using regular expressions, but, unfortunately, the examples were not passing (https://travis-ci.org/github/tree-sitter/tree-sitter-rust/builds/753422004#L413). My regular expressions seemingly had trouble capturing the pattern / / \n?; the third character in the sequence is what's decides between a line or doc comment, so I thought it made more sense to use externals for this in order to leverage lexer->lookahead probes.

Therefore, I've reworked the approach to use externals (commented in be26083).

@resolritter
Copy link
Contributor Author

Since doc comments are generally multi-line, I've made the /// contiguous lines be parsed as a group instead of a node per line. That should make it easier to extract the whole block.

It'd be useful to have a node without the leading ///, leaving only the inner content, so that the Markdown parser could be injected... I don't know if it's possible to do that currently.

@Luni-4
Copy link
Contributor

Luni-4 commented Mar 24, 2021

@resolritter

Is there any update on this one?

@dodomorandi
Copy link

What's the current status on this? Is any help needed?

@bestouff
Copy link

Ping ?

@pickfire
Copy link

pickfire commented Dec 1, 2021

What about # within doc comments code blocks?

@resolritter
Copy link
Contributor Author

I do not have merge access to this repository. This is not the only PR in review limbo right now.

@aryx aryx requested review from maxbrunsfeld and dcreager December 1, 2021 12:55
@aryx
Copy link
Contributor

aryx commented Dec 1, 2021

@dcreager is there someone maintaining tree-sitter-rust at github right now?

@archseer
Copy link

archseer commented Dec 1, 2021

It would be great if the rust grammar had more maintainers (for example tree-sitter-haskell has been doing great since tek took the reigns). Both this, #105 and #85 are PRs I've been looking forward to.

@aryx
Copy link
Contributor

aryx commented Jan 5, 2022

@resolritter can you rebase on the latest and I'll approve the PR.

@resolritter
Copy link
Contributor Author

FYI #126

@resolritter
Copy link
Contributor Author

resolritter commented Jan 8, 2022

The current implementation is incomplete because Rust requires strictly three slashes for doc comments, any more than that and it turns back to a normal line comment. This is explained in the current documentation for comments: https://doc.rust-lang.org/reference/comments.html#examples.

Moving this to Draft while I try to fix that.

Edit: Should be fixed

@resolritter resolritter marked this pull request as draft January 8, 2022 05:13
@resolritter resolritter marked this pull request as ready for review January 8, 2022 05:25
============================================

/// Doc
/// Comment
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//! as well.

But I think //!! will become normal comment. The last I worked on is this

https://github.com/mawww/kakoune/blob/871782faaf954cda654cdbdcb3590b45534607d1/rc/filetype/rust.kak#L46-L49

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://doc.rust-lang.org/reference/comments.html#examples

//! - Inner line doc
//!! - Still an inner line doc (but with a bang at the beginning)

So I implemented it as "any ! makes it a doc comment, no matter how many"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's only up till four, IIRC it's the same highlight in vim.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In SpaceVim (without LSP) the highlight is different (grey for comments, orange for doc comments) and that was really useful IMHO.

@aryx
Copy link
Contributor

aryx commented Jan 9, 2022

need to rebase, because I've merged your other PR. Having those parser.c in the repository is annoying.

@resolritter
Copy link
Contributor Author

Rebased and also added support for //! as per #99 (comment)

@maxbrunsfeld
Copy link
Contributor

This PR seems to have introduced a bug where the scanner can get into an infinite loop. I'm seeing the Tree-sitter test suite hang after updating tree-sitter-rust to the latest master. I guess the test coverage in this repo wasn't sufficient to protect against this. I'm going to revert this for now.

maxbrunsfeld added a commit that referenced this pull request Jan 10, 2022
This reverts commit 67d304c, reversing
changes made to 5993f53.
@maxbrunsfeld
Copy link
Contributor

It looks like the infinite loop was happening during some randomized mutation of the Doc comments test. Maybe an unterminated doc comment at the end of the file or something?

@resolritter Feel free to do a new PR if you can get this to avoid an infinite loop.

I also have questions about the need to do this using the external scanner. Couldn't you just do this in the grammar?

grammar({
  // ...

  rules: {
    // ...

    doc_comment: $ => token(choice(
      // exactly three leading slashes
      seq('///', optional(/[^/].*/)),
      seq('//!', /.*/),
    )),

    // any number of leading slashes other than three, which would produce a doc comment.
    line_comment: $ => token(seq(
      '//', optional('//'), /.*/
    )),
  }
});

@maxbrunsfeld
Copy link
Contributor

Of course, this would not join all of the adjacent doc comments into one continuous node - you'd get one node per comment. I think this might be better though: I think it would make it easier to determine what ranges of text contain the documentation itself, because you wouldn't have to deal with leading whitespace. I also just think that it retains more information to provide a node for each comment, and it's somewhat "lossy" to group them all into one node.

I'm still open to the other approach though, if people have strong feelings that it's more useful to get a single node.

@maxbrunsfeld
Copy link
Contributor

Whatever happens, I promise we won't wait a year to merge this time.

@resolritter
Copy link
Contributor Author

@maxbrunsfeld

I also have questions about the need to do this using the external scanner. Couldn't you just do this in the grammar? Of course, this would not join all of the adjacent doc comments into one continuous node - you'd get one node per comment

Having the whole text in a single node is why it was done this way. What would be the alternative for highlighting the code below?

/// ```
/// use foo::Foo;
/// let bar = Foo::new("foo");
/// ```

I can only infer the following steps:

  1. Collect all the consecutive doc comment nodes in the same nesting level
  2. Join their slices from the source code into a buffer
  3. Remove the leading comment markers
  4. Reparse the text as markdown

Having the text in a single node gets rid of steps 1 and 2. Or do you see a more efficient way to go about that? Or do you think having to traverse the tree in order to collect the text is not a problem?

It looks like the infinite loop was happening during some randomized mutation of the Doc comments

How can I try this randomization when testing locally?

if (started_with_slash == false || lexer->lookahead != '/') {
lexer->result_symbol = DOC_COMMENT;
while (true) {
while (lexer->lookahead != '\n') {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at least one of the problems is in this line: it should check for lexer->lookahead != 0 as well

@resolritter resolritter mentioned this pull request Jan 10, 2022
@resolritter
Copy link
Contributor Author

I also just think that it retains more information to provide a node for each comment, and it's somewhat "lossy" to group them all into one node.

This is a good point. It is definitely "lossier" than other nodes since it also includes the leading whitespace for contiguous lines.

What's being gained by this approach is the ease of fetching the whole content directly, at the cost of less precision for the ranges. Feel free to close #128 if you feel like it isn't a good tradeoff.

@maxbrunsfeld
Copy link
Contributor

What would be the alternative for highlighting the code below?

I think there are challenges either way, but it is more straightforward if you have a separate node for each comment.

Copying the doc comments' text into a separate buffer is not an option - we need to parse the code in place so that the positions of the nodes in the nested syntax tree correspond correctly to the original file. So what we need to do is to retrieve a list of ranges from the original file that should be parsed, together, in a nested language (markdown). We can then parse the contents of those ranges using Tree-sitter's set_included_ranges API.

If we have a separate node for each comment, then we need to

  • Find each run of consecutive doc comments. This can be done with a query like this:
    ((doc_comment)+ @doc)
  • Take the ranges of those nodes
  • Advance the start of those ranges by 3 characters (for /// or //!)

If, on the other hand, we have one giant node, then we need to:

  • Take the node's range and split it into lines
  • For each of those lines, re-examine the source code:
    • Find the column where the doc comment begins
    • Find the end of the line
  • Generate range between those two positions

@resolritter
Copy link
Contributor Author

I was not aware that (query)+ existed. This seems to work:

> ./node_modules/.bin/tree-sitter query <(echo -e "((line_comment)+ @doc)") <(echo -e "// foo\nfoo;\n// foo\n// foo")
/dev/fd/61
  pattern: 0
    capture: 0 - doc, start: (0, 0), end: (0, 6), text: `// foo`
  pattern: 0
    capture: 0 - doc, start: (2, 0), end: (2, 6), text: `// foo`
    capture: 0 - doc, start: (3, 0), end: (3, 6), text: `// foo`

Since this use-case is supported by the query API, I am fine with closing #128.

@aryx
Copy link
Contributor

aryx commented Jan 11, 2022

What are those tree-sitter test suites that catch the regression? Could we run them in the CI of tree-sitter-rust?

theHamsta added a commit to theHamsta/nvim-treesitter that referenced this pull request Jan 11, 2022
theHamsta added a commit to theHamsta/nvim-treesitter that referenced this pull request Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discriminate doc comments from single line comments
8 participants