Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trailing whitespace in labels is elided #22

Open
4 tasks done
andymatuschak opened this issue Dec 21, 2023 · 8 comments
Open
4 tasks done

Trailing whitespace in labels is elided #22

andymatuschak opened this issue Dec 21, 2023 · 8 comments
Labels
🤞 phase/open Post is being triaged manually

Comments

@andymatuschak
Copy link

Initial checklist

Affected packages and versions

[email protected] [email protected] [email protected]

Link to runnable example

https://astexplorer.net/#/gist/b3ff9dc85d8e49ef94791c73e645646f/21c9658bf3272b1922fb07d889351d8482c1c552

Steps to reproduce

In this AST Explorer demo, observe the parse tree for :redact[secret ]word.

(The same issue reproduces on my local machine using node@20, bun, macOS 14.1.2, and no build tools).

Expected behavior

I expect the textDirective node's child text node to have value secret (with the trailing space).

Actual behavior

The textDirective node's child text node has value secret (without the trailing space). The trailing space isn't represented anywhere else in the parse tree.

Runtime

Other (please specify in steps to reproduce)

Package manager

Other (please specify in steps to reproduce)

OS

Other (please specify in steps to reproduce)

Build and bundle tools

Other (please specify in steps to reproduce)

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Dec 21, 2023
@andymatuschak
Copy link
Author

andymatuschak commented Dec 21, 2023

I spent some time digging into this and trying to produce a fix. As far as I can tell, what's going on is:

  • Directive label contents are treated as a nested tokenization task by emitting a chunkText token surrounding their interior.
  • micromark:subtokenize eventually transforms chunkText token into a data token which still contains the trailing whitespace.
  • The final step of the subtokenization process calls micromark:resolveAllLineSuffixes, which splits the trailing whitespace out of the data token into a lineSuffix token.
  • mdast-util-from-markdown compiles the contents of the data token into the text node's value, but ignores the lineSuffix contents.

Assuming you consider this behavior to be a bug, I'm not sure whether you'd consider the defect to be in the micromark behavior (this nested content is not, in fact, at the end of a line) or in the compiler behavior (perhaps the lineSuffix should be emitted in this nested context?).

PS: Thank you for all your hard work!

@wooorm
Copy link
Member

wooorm commented Dec 21, 2023

Thanks for the investigation and your kind words!

resolveAllLineSuffixes exists because trailing spaces on a line are not “emitted”/“rendered” a -> <p>a</p>. This seems like a bug there as in :x[y ]z , the space after y should not be a “line suffix” but the one after z should be.

@wooorm
Copy link
Member

wooorm commented Dec 21, 2023

I’m not 100% this is a bug. Given a -> <p>a</p>, # b -> <h1>b</h1>, # c # -> <h1>c</h1>. Why would this here be different?

Why do you want trailing whitespace?

@andymatuschak
Copy link
Author

Thank you for digging in! It’s a fair question, and I don’t think it’s totally obvious that trailing whitespace should be preserved.

I’m hoping to use the inline text directive to produce behavior somewhat like other inline text directives—links, inline code, emphasis. Of those, the first two preserve trailing whitespace; the last does not, AFAICT to avoid spurious parse situations. In my mind, these directive labels are very congruent to link labels. Perhaps leaf and container directive labels are less obviously analogous to link labels than the text directive labels.

More concretely, I’m experimenting with text directives to create a “redact” markup for a flashcard system, and I imagined an interface where one can drag the handles of the redaction left to right across the text, character by character. It’s freeing to be able to drop the handle wherever, and feels weird if it “jumps” when I release it because the underlying representation can’t place the right edge in certain positions. I can make this work without actual syntactic support for the trailing space scenario, but it felt unintentional (given the line suffix token when it’s not actually at the end of a line) so I thought I’d write a bug.

Thanks for considering!

@wooorm
Copy link
Member

wooorm commented Dec 21, 2023

Links and emphasis are the same.

 *b 
c* 

 [b 
c](#) 

https://spec.commonmark.org/dingus/?text=%20*b%20%0Ac*%20%0A%0A%20%5Bb%20%0Ac%5D(%23)%20%0A

The thing with them though, is that they are parsed as separate things: *, *, [, ](). Everything goes from left to right.
So it’s the paragraph/heading parent, the content type (text), that deals with the trailing whitespace in the entire thing.

With content in the [ and ] of directives, it’s parsed separately. It’s as if it was its own paragraph or heading. Because it could be! That’s how directives work (also the leaf / container). You currently choose to use the content inside a paragraph (which I get). But it could be say a separate tooltip. It could be nice for folks to be able to pad with whitespace

But there are two things here: a) initial/trailing when looking at the whole, b) initial/trailing when looking at a line ending.

I assume you don’t see a reason for “keeping” the initial/trailing whitespace for :x[y \n z]a.
And that it doesn’t matter for leaf/container, as in, ::x[ Yyy zzz. ].

So if this would be implemented, it should be a) only for text directives, b) not affect whitespace around line endings.

Note: you can use a character reference btw: :x[y&#32;]z

@andymatuschak
Copy link
Author

But there are two things here: a) initial/trailing when looking at the whole, b) initial/trailing when looking at a line ending.

I assume you don’t see a reason for “keeping” the initial/trailing whitespace for :x[y \n z]a.

Ah, great point. No, in a document like this, you're right that I would expect the leading/trailing line whitespace to be stripped:

:x[y 
 z]a

And that it doesn’t matter for leaf/container, as in, ::x[ Yyy zzz. ].

Right. I agree with your argument that stripping whitespace here matches the behavior of other flow-level nodes.

So if this would be implemented, it should be a) only for text directives, b) not affect whitespace around line endings.

Right. I guess I'd expect the lineSuffix resolution behavior when the whitespace in question is in fact a line suffix. (And likewise for prefixes)

Note: you can use a character reference btw: :x[y ]z

Thanks!

@wooorm
Copy link
Member

wooorm commented Jan 10, 2024

lineSuffix

Without a final end-of-file end-of-line, it’s still the end of the line (a vs a \n). As this whole thing is parsed separately, it’s the start of the thing and end of the thing, even through there’s no \n. But these are internals, the terms don’t mean much.


I remain unsure whichever is better. Current state or proposed state. I can see arguments for both.

@andymatuschak
Copy link
Author

Fair enough! :) Thanks for your consideration.

One more thing I wanted to mention. You mention "Links and emphasis are the same."—and in your example, they are. I'm sure you know this, but I wanted to clarify that I was referring to the behavior of links when their label doesn't involve a line ending; i.e. [a ](#) does parse to a link containing a text node with value a . It's in this sense that I was hoping :redact[a ] would behave. Likewise for `a `. Whereas *a * doesn't parse to an emphasis at all, because of the flanking rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤞 phase/open Post is being triaged manually
Development

No branches or pull requests

2 participants