Hi all! I've been using these tools to turn the HTML output from a tiptap-based editor into Markdown. In particular, I've been seeing weird behavior when the editor document ends in a line break:
The commonmark.org dingus says that the output of 4. is the Markdown that means the text "honk\" (five characters, the last of which is a backslash).
Clearly I don't want the Markdown output to have a stray backslash at the end when the editor content has no backslashes in it. The problem seems to be that HTML allows a paragraph to end in a line break, but CommonMark does not: a hard line break only works between inline content in the same block, so a backslash at the very end of a paragraph is kept as a literal `\`. In 3.

Thank you! Here's the code I used to test:

```js
const html = "<p>honk<br /></p>";
const hast = unified().use(rehypeParse).parse(html);
const mdast = toMdast(hast); // from "hast-util-to-mdast"
const md = unified().use(remarkStringify).stringify(mdast);
const check = unified().use(remarkParse).parse(md);
```

There's a codesandbox.io doc with the above, and here's a screenshot of the console output (with some of the AST fields stripped out for readability).
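One way to confirm where the stray backslash comes from is to check whether the intermediate mdast has a `break` node as the last child of a paragraph or heading. Below is a minimal, self-contained sketch that works on plain mdast-shaped objects (`type`, `children`). The helper name `findTrailingBreaks` is hypothetical and not part of unified or any mdast utility, and the sample tree is an assumed, hand-written stand-in for what `<p>honk<br /></p>` converts to (a text node followed by a break).

```javascript
// Hypothetical helper (not a library API): walk an mdast-shaped tree and
// collect every paragraph or heading whose last child is a hard `break`.
function findTrailingBreaks(node, found = []) {
  if (node.children) {
    const last = node.children[node.children.length - 1];
    const isBlock = node.type === "paragraph" || node.type === "heading";
    if (isBlock && last && last.type === "break") {
      found.push(node);
    }
    // Recurse so nested blocks (e.g. inside blockquotes or lists) are checked too
    for (const child of node.children) findTrailingBreaks(child, found);
  }
  return found;
}

// Assumed, hand-written tree for `<p>honk<br /></p>` after conversion
const sampleMdast = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [{ type: "text", value: "honk" }, { type: "break" }],
    },
  ],
};

console.log(findTrailingBreaks(sampleMdast).length); // 1
```

A break in the middle of a paragraph (text, break, text) is not reported, since CommonMark can represent that one fine.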
Hi there 👋
Yeah, it’s valid HTML. I think this would be something for `hast-util-to-mdast`. The same could be done with headings:

```html
<h1>x<br></h1>
```
I think it would be better to just remove them, instead of the alternative of injecting some non-breaking invisible character after them.
Anyway, probably a `filter-trailing-breaks` or similar utility in `hast-util-to-mdast`, used for both headings and paragraphs?
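A minimal sketch of what such a utility might look like, assuming it runs on the hast tree before conversion to mdast. The name `filterTrailingBreaks` only mirrors the suggestion above and is hypothetical, not an existing export of `hast-util-to-mdast`. It removes `<br>` elements (and whitespace-only text nodes next to them) from the end of `p` and `h1`–`h6` elements, rather than injecting an invisible character.

```javascript
// Hypothetical utility (not an existing hast-util-to-mdast export).
// Operates on plain hast-shaped objects and mutates the tree in place.
const BLOCKS = new Set(["p", "h1", "h2", "h3", "h4", "h5", "h6"]);

function isWhitespaceText(node) {
  return node.type === "text" && /^[ \t\n\r\f]*$/.test(node.value);
}

function isBr(node) {
  return node.type === "element" && node.tagName === "br";
}

// Pops trailing <br> elements and whitespace-only text from paragraphs and
// headings, then recurses into all children. Only direct children of the
// block are checked, so `<p><em>x<br></em></p>` is left untouched.
function filterTrailingBreaks(node) {
  if (!node.children) return node;
  if (node.type === "element" && BLOCKS.has(node.tagName)) {
    const kids = node.children;
    while (kids.length > 0) {
      const last = kids[kids.length - 1];
      if (isBr(last) || isWhitespaceText(last)) kids.pop();
      else break;
    }
  }
  for (const child of node.children) filterTrailingBreaks(child);
  return node;
}

// Usage: a hand-written hast tree for `<p>honk<br /></p>`
const tree = {
  type: "root",
  children: [
    {
      type: "element",
      tagName: "p",
      properties: {},
      children: [
        { type: "text", value: "honk" },
        { type: "element", tagName: "br", properties: {}, children: [] },
      ],
    },
  ],
};
filterTrailingBreaks(tree);
console.log(tree.children[0].children.length); // 1
```

In the pipeline from the question, one place to call this would be on `hast` right before `toMdast(hast)`. Breaks in the middle of a paragraph are kept, since only the tail of the children list is trimmed.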