Designate doc comments #99

resolritter · 2021-01-07T20:01:25Z

closes #88

resolritter · 2021-01-10T16:42:35Z

My first implementation used regular expressions but, unfortunately, the examples did not pass (https://travis-ci.org/github/tree-sitter/tree-sitter-rust/builds/753422004#L413). My regular expressions seemingly had trouble capturing the pattern / / \n?; the third character in the sequence is what decides between a line comment and a doc comment, so I thought it made more sense to use externals for this in order to leverage lexer->lookahead probes.

Therefore, I've reworked the approach to use externals (commented in be26083).

resolritter · 2021-02-03T01:07:50Z

Since doc comments are generally multi-line, I've made contiguous /// lines parse as a single group instead of one node per line. That should make it easier to extract the whole block.

It'd be useful to have a node without the leading ///, leaving only the inner content, so that the Markdown parser could be injected... I don't know if it's possible to do that currently.

Luni-4 · 2021-03-24T10:00:05Z

@resolritter

Is there any update on this one?

dodomorandi · 2021-07-03T17:19:49Z

What's the current status on this? Is any help needed?

bestouff · 2021-11-26T15:41:06Z

Ping ?

pickfire · 2021-12-01T06:27:57Z

What about # within doc comment code blocks?

resolritter · 2021-12-01T11:19:55Z

I do not have merge access to this repository. This is not the only PR in review limbo right now.

aryx · 2021-12-01T12:56:26Z

@dcreager is there someone maintaining tree-sitter-rust at github right now?

archseer · 2021-12-01T13:09:41Z

It would be great if the Rust grammar had more maintainers (for example, tree-sitter-haskell has been doing great since tek took the reins). This PR, #105 and #85 are all PRs I've been looking forward to.

aryx · 2022-01-05T08:53:00Z

@resolritter can you rebase on the latest? Then I'll approve the PR.

resolritter · 2022-01-08T03:18:32Z

FYI #126

resolritter · 2022-01-08T05:13:37Z

The current implementation is incomplete because Rust requires exactly three slashes for doc comments; with any more than that, it turns back into a normal line comment. This is explained in the current documentation for comments: https://doc.rust-lang.org/reference/comments.html#examples.

Moving this to Draft while I try to fix that.

Edit: Should be fixed
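
For illustration, the three-slash rule can be sketched as a check that counts leading slashes one character at a time, roughly the way a lookahead-based scanner would. This is a hypothetical JavaScript helper, not the scanner code from this PR; `classifyBySlashCount` and its return values are made-up names, and inner `//!` comments are deliberately left out.

```javascript
// Hypothetical helper: classify a comment line by its run of leading slashes.
function classifyBySlashCount(line) {
  if (!line.startsWith('//')) return null; // not a line comment at all

  // Count the leading slashes one character at a time.
  let slashes = 0;
  while (slashes < line.length && line[slashes] === '/') slashes++;

  // Exactly three slashes make a doc comment; two, four or more do not.
  return slashes === 3 ? 'doc_comment' : 'line_comment';
}

for (const sample of ['// plain', '/// doc', '//// plain again']) {
  console.log(sample, '->', classifyBySlashCount(sample));
}
```

The point of the sketch is only that the decision depends on the character after the third slash, which is why `////` falls back to a plain comment.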

pickfire · 2022-01-08T14:12:56Z

corpus/source_files.txt

+============================================
+
+/// Doc
+/// Comment


//! as well.

But I think //!! will become a normal comment. The last thing I worked on for this is:

https://github.com/mawww/kakoune/blob/871782faaf954cda654cdbdcb3590b45534607d1/rc/filetype/rust.kak#L46-L49

From https://doc.rust-lang.org/reference/comments.html#examples

//! - Inner line doc
//!! - Still an inner line doc (but with a bang at the beginning)

So I implemented it as "any ! makes it a doc comment, no matter how many"

I believe it's only up till four, IIRC it's the same highlight in vim.

In SpaceVim (without LSP) the highlight is different (grey for comments, orange for doc comments) and that was really useful IMHO.

aryx · 2022-01-09T18:27:43Z

You need to rebase, because I've merged your other PR. Having parser.c checked into the repository is annoying.

resolritter · 2022-01-09T22:07:33Z

Rebased and also added support for //! as per #99 (comment)

Ref: tree-sitter/tree-sitter-rust#99

maxbrunsfeld · 2022-01-10T21:05:37Z

This PR seems to have introduced a bug where the scanner can get into an infinite loop. I'm seeing the Tree-sitter test suite hang after updating tree-sitter-rust to the latest master. I guess the test coverage in this repo wasn't sufficient to protect against this. I'm going to revert this for now.

This reverts commit 67d304c, reversing changes made to 5993f53.

maxbrunsfeld · 2022-01-10T21:25:47Z

It looks like the infinite loop was happening during some randomized mutation of the Doc comments test. Maybe an unterminated doc comment at the end of the file or something?

@resolritter Feel free to do a new PR if you can get this to avoid an infinite loop.

I also have questions about the need to do this using the external scanner. Couldn't you just do this in the grammar?

grammar({
  // ...

  rules: {
    // ...

    doc_comment: $ => token(choice(
      // exactly three leading slashes
      seq('///', optional(/[^/].*/)),
      seq('//!', /.*/),
    )),

    // any number of leading slashes other than three, which would produce a doc comment.
    line_comment: $ => token(seq(
      '//', optional('//'), /.*/
    )),
  }
});
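
As a rough check of the token rules above, the same patterns can be tried as plain JavaScript regular expressions. This is only an approximation under stated assumptions: Tree-sitter resolves overlapping tokens by its own rules, which the sketch imitates simply by testing the doc comment pattern first. `DOC_COMMENT`, `LINE_COMMENT` and `tokenFor` are hypothetical names.

```javascript
// Regex counterparts of the two token rules (names are hypothetical).
// doc_comment: '///' not followed by another slash, or '//!' plus anything.
const DOC_COMMENT = /^(?:\/\/\/(?:[^\/].*)?|\/\/!.*)$/;
// line_comment: '//' followed by anything up to the end of the line.
const LINE_COMMENT = /^\/\/.*$/;

// Try the doc comment pattern first, then fall back to a plain comment.
function tokenFor(line) {
  if (DOC_COMMENT.test(line)) return 'doc_comment';
  if (LINE_COMMENT.test(line)) return 'line_comment';
  return null;
}

for (const sample of ['/// Doc', '//! Inner doc', '//!! Inner doc', '//// Not doc', '// Plain']) {
  console.log(sample, '->', tokenFor(sample));
}
```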

maxbrunsfeld · 2022-01-10T21:28:47Z

Of course, this would not join all of the adjacent doc comments into one continuous node - you'd get one node per comment. I think this might be better though: I think it would make it easier to determine what ranges of text contain the documentation itself, because you wouldn't have to deal with leading whitespace. I also just think that it retains more information to provide a node for each comment, and it's somewhat "lossy" to group them all into one node.

I'm still open to the other approach though, if people have strong feelings that it's more useful to get a single node.

maxbrunsfeld · 2022-01-10T21:31:58Z

Whatever happens, I promise we won't wait a year to merge this time.

resolritter · 2022-01-10T21:55:58Z

@maxbrunsfeld

> I also have questions about the need to do this using the external scanner. Couldn't you just do this in the grammar?
>
> Of course, this would not join all of the adjacent doc comments into one continuous node - you'd get one node per comment

Having the whole text in a single node is why it was done this way. What would be the alternative for highlighting the code below?

/// ```
/// use foo::Foo;
/// let bar = Foo::new("foo");
/// ```

I can only infer the following steps:

1. Collect all the consecutive doc comment nodes in the same nesting level
2. Join their slices from the source code into a buffer
3. Remove the leading comment markers
4. Reparse the text as markdown

Having the text in a single node gets rid of steps 1 and 2. Or do you see a more efficient way to go about that? Or do you think having to traverse the tree in order to collect the text is not a problem?
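
The collecting, joining and marker-stripping steps can be sketched roughly as follows. To stay self-contained, this hypothetical version works on raw source lines rather than syntax nodes, and it stops before the markdown reparse; `collectDocBlocks` is a made-up name and the regular expressions are assumptions about what counts as a doc comment line.

```javascript
// Hypothetical sketch: gather runs of doc comment lines into text buffers.
function collectDocBlocks(source) {
  // A doc line is '///' not followed by another slash, or '//!'.
  const isDoc = (line) => /^\s*(?:\/\/\/(?!\/)|\/\/!)/.test(line);
  const blocks = [];
  let current = null;

  for (const line of source.split('\n')) {
    if (isDoc(line)) {
      // Drop leading whitespace, the three-character marker and one optional space.
      const text = line.replace(/^\s*\/\/[\/!] ?/, '');
      if (current === null) {
        current = [];
        blocks.push(current);
      }
      current.push(text); // extend the current run
    } else {
      current = null; // any other line ends the run
    }
  }
  // Join each run into one buffer that a markdown parser could consume.
  return blocks.map((lines) => lines.join('\n'));
}

const example = [
  '/// Creates a Foo.',
  '///',
  '/// let bar = Foo::new("foo");',
  'fn make() {}',
  '//! inner doc',
].join('\n');

console.log(collectDocBlocks(example));
```

Note that the joined buffer no longer carries positions from the original file, so anything parsed from it would need its offsets mapped back separately.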

> It looks like the infinite loop was happening during some randomized mutation of the Doc comments

How can I try this randomization when testing locally?

resolritter · 2022-01-10T22:18:34Z

src/scanner.c

+          if (started_with_slash == false || lexer->lookahead != '/') {
+            lexer->result_symbol = DOC_COMMENT;
+            while (true) {
+              while (lexer->lookahead != '\n') {


I think at least one of the problems is in this line: it should check for lexer->lookahead != 0 as well. At the end of the file the lookahead never becomes '\n', so the inner loop cannot exit.
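
The suggested check can be illustrated with a small simulation. This is a JavaScript stand-in for the C scanner, not the actual fix; `makeLexer` is a hypothetical mock that assumes the lookahead reads as 0 once the input is exhausted and that advancing at the end does nothing.

```javascript
// Hypothetical mock of the scanner's lexer: `lookahead` is the current
// character code, or 0 once the input is exhausted.
function makeLexer(text) {
  let pos = 0;
  return {
    get lookahead() { return pos < text.length ? text.charCodeAt(pos) : 0; },
    advance() { if (pos < text.length) pos++; },
    get position() { return pos; },
  };
}

const NEWLINE = '\n'.charCodeAt(0);

// Consume the rest of the current line. Without the `!== 0` test this loop
// would never exit on input that ends without a trailing newline.
function skipToEndOfLine(lexer) {
  while (lexer.lookahead !== NEWLINE && lexer.lookahead !== 0) {
    lexer.advance();
  }
}

const text = '/// unterminated doc comment';
const lexer = makeLexer(text);
skipToEndOfLine(lexer);
console.log(lexer.position === text.length); // stopped at end of input
```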

resolritter · 2022-01-10T23:21:00Z

> I also just think that it retains more information to provide a node for each comment, and it's somewhat "lossy" to group them all into one node.

This is a good point. It is definitely "lossier" than other nodes since it also includes the leading whitespace for contiguous lines.

What's being gained by this approach is the ease of fetching the whole content directly, at the cost of less precision for the ranges. Feel free to close #128 if you feel like it isn't a good tradeoff.

maxbrunsfeld · 2022-01-10T23:23:03Z

> What would be the alternative for highlighting the code below?

I think there are challenges either way, but it is more straightforward if you have a separate node for each comment.

Copying the doc comments' text into a separate buffer is not an option - we need to parse the code in place so that the positions of the nodes in the nested syntax tree correspond correctly to the original file. So what we need to do is to retrieve a list of ranges from the original file that should be parsed, together, in a nested language (markdown). We can then parse the contents of those ranges using Tree-sitter's set_included_ranges API.

If we have a separate node for each comment, then we need to:

1. Find each run of consecutive doc comments. This can be done with a query like this:
```
((doc_comment)+ @doc)
```
2. Take the ranges of those nodes
3. Advance the start of those ranges by 3 characters (for /// or //!)

If, on the other hand, we have one giant node, then we need to:

1. Take the node's range and split it into lines
2. For each of those lines, re-examine the source code:
   - Find the column where the doc comment begins
   - Find the end of the line
3. Generate a range between those two positions
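
The two procedures can be compared in a short sketch. It uses plain objects with `startIndex` and `endIndex` offsets into a JavaScript string as a hypothetical node shape, and it stops at producing the list of ranges; handing them to the parser is not shown. `rangesFromCommentNodes` and `rangesFromGroupedNode` are made-up names.

```javascript
const MARKER_LENGTH = 3; // length of "///" or "//!"

// One node per comment: keep each node's range, skipping the marker.
function rangesFromCommentNodes(nodes) {
  return nodes.map(({ startIndex, endIndex }) => ({
    startIndex: startIndex + MARKER_LENGTH,
    endIndex,
  }));
}

// One grouped node: split it into lines and re-find the marker on each line.
function rangesFromGroupedNode(source, node) {
  const ranges = [];
  let lineStart = node.startIndex;
  while (lineStart < node.endIndex) {
    let lineEnd = source.indexOf('\n', lineStart);
    if (lineEnd === -1 || lineEnd > node.endIndex) lineEnd = node.endIndex;
    const line = source.slice(lineStart, lineEnd);
    const column = line.search(/\/\/[\/!]/); // where the doc comment begins
    if (column !== -1) {
      ranges.push({ startIndex: lineStart + column + MARKER_LENGTH, endIndex: lineEnd });
    }
    lineStart = lineEnd + 1; // continue after the newline
  }
  return ranges;
}

// Two doc comments, the second one indented by four spaces.
const source = '/// first\n    /// second\n';
const perComment = [{ startIndex: 0, endIndex: 9 }, { startIndex: 14, endIndex: 24 }];
const grouped = { startIndex: 0, endIndex: 24 };

console.log(rangesFromCommentNodes(perComment));
console.log(rangesFromGroupedNode(source, grouped));
```

Both calls yield the same ranges for this input; the grouped version just has to rediscover the indentation that the per-comment nodes already encode.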

resolritter · 2022-01-10T23:38:52Z

I was not aware that (query)+ existed. This seems to work:

> ./node_modules/.bin/tree-sitter query <(echo -e "((line_comment)+ @doc)") <(echo -e "// foo\nfoo;\n// foo\n// foo")
/dev/fd/61
  pattern: 0
    capture: 0 - doc, start: (0, 0), end: (0, 6), text: `// foo`
  pattern: 0
    capture: 0 - doc, start: (2, 0), end: (2, 6), text: `// foo`
    capture: 0 - doc, start: (3, 0), end: (3, 6), text: `// foo`

Since this use-case is supported by the query API, I am fine with closing #128.

aryx · 2022-01-11T08:34:10Z

Which Tree-sitter test suites caught the regression? Could we run them in the CI of tree-sitter-rust?

resolritter force-pushed the doc_comments branch 3 times, most recently from 314b2b8 to 59bf0d3 on January 10, 2021 16:35

resolritter force-pushed the doc_comments branch from 59bf0d3 to d25be16 on February 3, 2021 00:54

archseer mentioned this pull request Dec 1, 2021

Semantic syntax highlighting helix-editor/helix#1203

Closed

aryx requested review from maxbrunsfeld and dcreager December 1, 2021 12:55

dead10ck mentioned this pull request Dec 27, 2021

:h[elp] command and documentation helix-editor/helix#997

Open

koalp mentioned this pull request Jan 5, 2022

Discriminate doc comments from single line comments #88

Closed

resolritter force-pushed the doc_comments branch from d25be16 to b09ed6e on January 8, 2022 03:15

resolritter marked this pull request as draft January 8, 2022 05:13

resolritter force-pushed the doc_comments branch from b09ed6e to 2e6b9b1 on January 8, 2022 05:25

resolritter marked this pull request as ready for review January 8, 2022 05:25

pickfire reviewed Jan 8, 2022

aryx approved these changes Jan 9, 2022

resolritter added 2 commits January 9, 2022 19:04

designate doc comments

d8015ac

regenerate parser

5c30f3d

resolritter force-pushed the doc_comments branch from 2e6b9b1 to 5c30f3d on January 9, 2022 22:05

aryx merged commit 67d304c into tree-sitter:master Jan 10, 2022

poliorcetics mentioned this pull request Jan 10, 2022

Update to newest tree-sitter-rust helix-editor/helix#1473

Closed

theHamsta added a commit to theHamsta/nvim-treesitter that referenced this pull request Jan 10, 2022

highlights(rust): support doc_comment

1eda613

Ref: tree-sitter/tree-sitter-rust#99

theHamsta mentioned this pull request Jan 10, 2022

highlights(rust): support doc_comment nvim-treesitter/nvim-treesitter#2232

Closed

maxbrunsfeld added a commit that referenced this pull request Jan 10, 2022

Revert "Merge pull request #99 from resolritter/doc_comments"

eeb0702

This reverts commit 67d304c, reversing changes made to 5993f53.

resolritter mentioned this pull request Jan 10, 2022

Doc comments #2 #128

Closed

theHamsta added a commit to theHamsta/nvim-treesitter that referenced this pull request Jan 11, 2022

highlights(rust): support doc_comment

23bae0b

Ref: tree-sitter/tree-sitter-rust#99

theHamsta added a commit to theHamsta/nvim-treesitter that referenced this pull request Jan 14, 2022

highlights(rust): support doc_comment

c134c16

Ref: tree-sitter/tree-sitter-rust#99
