HTMLarkdown is an HTML-to-Markdown converter that can output HTML-syntax when required, for example when center-aligning elements or resizing images.
- Written completely in TypeScript.
- Has many Jest tests, covering many edge-case conversions.
Leave an issue/PR if you can think of more!
- For now, is designed for GFM.
- Try it out at the demo site below!
https://evitanrelta.github.io/htmlarkdown
Whenever elements cannot be represented in markdown-syntax, HTMLarkdown will switch to HTML-syntax:
Input HTML | Output Markdown |
---|---|
<h1>Normal-heading is <strong>boring</strong></h1>
<h1 align="center">
Centered-heading is <strong>da wae</strong>
</h1>
<p><img src="https://image.src" /></p>
<p><img width="80%" src="https://image.src" /></p> |
# Normal-heading is **boring**
<h1 align="center">
Centered-heading is <b>da wae</b>
</h1>
![](https://image.src)
<img width="80%" src="https://image.src" /> |
Note: The HTML-switching is controlled by the rules' `Rule.toUseHtmlPredicate`.
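As a rough illustration of the idea (not HTMLarkdown's actual `Rule` interface), a rule can pair a predicate with a markdown output path and an HTML output path. The `SimpleElement` and `SimpleRule` shapes, `headingRule`, and `convertWith` below are hypothetical names made up for this sketch, and the predicate's real signature may differ.

```typescript
// Hypothetical, simplified stand-in for an element; not a real DOM node.
interface SimpleElement {
  tagName: string
  attributes: Record<string, string>
  innerContent: string // assumed to be already converted
}

// Hypothetical rule shape: the predicate picks between markdown and HTML output.
interface SimpleRule {
  toUseHtmlPredicate: (element: SimpleElement) => boolean
  toMarkdown: (element: SimpleElement) => string
  toHtml: (element: SimpleElement) => string
}

const headingRule: SimpleRule = {
  // Markdown headings can't carry alignment, so an 'align' attribute forces HTML.
  toUseHtmlPredicate: (element) => 'align' in element.attributes,
  toMarkdown: (element) => `# ${element.innerContent}`,
  toHtml: (element) => {
    const attrs = Object.keys(element.attributes)
      .map((name) => ` ${name}="${element.attributes[name]}"`)
      .join('')
    return `<h1${attrs}>\n  ${element.innerContent}\n</h1>`
  },
}

function convertWith(rule: SimpleRule, element: SimpleElement): string {
  return rule.toUseHtmlPredicate(element)
    ? rule.toHtml(element)
    : rule.toMarkdown(element)
}

const plain = convertWith(headingRule, {
  tagName: 'h1',
  attributes: {},
  innerContent: 'Normal-heading',
})
const centered = convertWith(headingRule, {
  tagName: 'h1',
  attributes: { align: 'center' },
  innerContent: 'Centered-heading',
})
```

Here `plain` comes out in markdown-syntax, while `centered` keeps its `<h1 align="center">` wrapper because the predicate returned `true`.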
HTMLarkdown tries to use as little HTML-syntax as possible, mixing markdown and HTML if needed:
Input HTML | Output Markdown |
---|---|
<blockquote>
<p align="center">
Centered-paragraph
</p>
<p>Below is a horizontal-rule in blockquote:</p>
<hr>
</blockquote> |
> <p align="center">
> Centered-paragraph
> </p>
> Below is a horizontal-rule in blockquote:
>
> <hr> |
Depending on the situation, HTMLarkdown will switch between markdown's backslash-escaping or HTML-escaping:
Input HTML | Output Markdown |
---|---|
<!-- In markdown -->
<p>&lt;TAG&gt;, **NOT BOLD**</p>
<!-- In in-line HTML -->
<p>
<sup>&lt;TAG&gt;, **NOT BOLD**</sup>
</p>
<!-- In block HTML -->
<p align="center">
&lt;TAG&gt;, **NOT BOLD**
</p> |
\<TAG>, \*\*NOT BOLD\*\*
<sup>\<TAG>, \*\*NOT BOLD\*\*</sup>
<p align="center">
&lt;TAG&gt;, **NOT BOLD**
</p> |
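The two escaping styles can be sketched as a pair of small helpers. Both `escapeForMarkdown` and `escapeForHtml` are hypothetical functions written for this illustration; HTMLarkdown's real escaping handles more characters and contexts than this.

```typescript
// Hypothetical helpers; the library's own escaping is more thorough.

/** Backslash-escape characters that a markdown renderer would interpret. */
function escapeForMarkdown(text: string): string {
  return text.replace(/[\\*_`<]/g, '\\$&')
}

/** Entity-escape characters that an HTML parser would interpret. */
function escapeForHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

const rawText = '<TAG>, **NOT BOLD**'

// Text that ends up in markdown-syntax gets backslashes.
const inMarkdown = escapeForMarkdown(rawText)

// Text inside a block of HTML-syntax gets entities instead, since
// markdown is not parsed there and asterisks stay literal.
const inBlockHtml = escapeForHtml(rawText)
```

Backslashes only work where the renderer still parses markdown, which is why text inside block-level HTML needs entities instead.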
It also adds separators between adjacent lists, to prevent markdown-renderers from combining them:
Input HTML | Output Markdown |
---|---|
<ul>
<li>List 1 > item 1</li>
<li>List 1 > item 2</li>
</ul>
<ul>
<li>List 2 > item 1</li>
<li>List 2 > item 2</li>
</ul> |
- List 1 > item 1
- List 1 > item 2
<!-- LIST_SEPARATOR -->
- List 2 > item 1
- List 2 > item 2 |
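A minimal sketch of the separator idea, assuming the converted blocks are joined with blank lines. The `ConvertedBlock` type and `joinBlocks` function are hypothetical and only model the behaviour; they are not part of the library.

```typescript
// Hypothetical model: each block is already converted to markdown.
interface ConvertedBlock {
  isList: boolean
  markdown: string
}

const LIST_SEPARATOR = '<!-- LIST_SEPARATOR -->'

function joinBlocks(blocks: ConvertedBlock[]): string {
  const output: string[] = []
  blocks.forEach((block, index) => {
    const previous = blocks[index - 1]
    // Two lists in a row would be merged by renderers, so separate them.
    if (previous !== undefined && previous.isList && block.isList) {
      output.push(LIST_SEPARATOR)
    }
    output.push(block.markdown)
  })
  // Assumption: blocks are separated by one blank line.
  return output.join('\n\n')
}

const joined = joinBlocks([
  { isList: true, markdown: '- List 1 > item 1\n- List 1 > item 2' },
  { isList: true, markdown: '- List 2 > item 1\n- List 2 > item 2' },
])
```

An HTML comment works as a separator because it renders as nothing, yet still ends the first list for the markdown parser.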
And more!
But this section is getting too long so...
npm install htmlarkdown
import { HTMLarkdown } from 'htmlarkdown'
/** Convert an element! */
const htmlarkdown = new HTMLarkdown()
const container = document.getElementById('container')
console.log(container.outerHTML)
// => '<div id="container"><h1>Heading</h1></div>'
htmlarkdown.convert(container)
// => '# Heading'
/**
* Or an HTML string!
* Whichever u prefer. It's 2022, I don't judge :^)
*/
const htmlString = `
<h1>Heading</h1>
<p>Paragraph</p>
`
const htmlStrWithContainer = `<div>${htmlString}</div>`
htmlarkdown.convert(htmlString)
// Set 2nd param 'hasContainer' to true, for container-wrapped string.
htmlarkdown.convert(htmlStrWithContainer, true)
// Both output => '# Heading\n\nParagraph'
Note: If an element is given to `convert`, it's deep-cloned before any processing/conversion.
Thus, you don't have to worry about it mutating the original element :)
/** Configure when creating an instance. */
const htmlarkdown = new HTMLarkdown({
htmlEscapingMode: '&<>',
maxPrettyTableWidth: Number.POSITIVE_INFINITY,
addTrailingLinebreak: true
})
/** Or on an existing instance. */
htmlarkdown.options.maxPrettyTableWidth = -1
Plugins are of type `(htmlarkdown: HTMLarkdown): void`.
They take in an HTMLarkdown instance and configure it by mutating it.
There are 2 plugin-options available in the `options` object: `preloadPlugins` and `plugins`.
The difference is:
- `preloadPlugins` loads the plugins first, before your other options (like "presets"), allowing you to overwrite the plugins' changes:
const enableTrailingLinebreak: Plugin = (htmlarkdown) => {
  htmlarkdown.options.addTrailingLinebreak = true
}
const htmlarkdown = new HTMLarkdown({
  addTrailingLinebreak: false,
  preloadPlugins: [enableTrailingLinebreak],
})
htmlarkdown.options.addTrailingLinebreak // false
- `plugins` loads the plugins after your other options, meaning plugins can overwrite your options:
const enableTrailingLinebreak: Plugin = (htmlarkdown) => {
  htmlarkdown.options.addTrailingLinebreak = true
}
const htmlarkdown = new HTMLarkdown({
  addTrailingLinebreak: false,
  plugins: [enableTrailingLinebreak],
})
htmlarkdown.options.addTrailingLinebreak // true
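To make the loading order concrete, here is a self-contained miniature of it. Everything in it (`MiniInstance`, `MiniConfig`, `createInstance`, and the `false` default) is a hypothetical stand-in written for this sketch, not the library's source.

```typescript
// Hypothetical miniature of the loading order; not the library's code.
interface MiniOptions {
  addTrailingLinebreak: boolean
}
interface MiniInstance {
  options: MiniOptions
}
type MiniPlugin = (instance: MiniInstance) => void

interface MiniConfig {
  addTrailingLinebreak?: boolean
  preloadPlugins?: MiniPlugin[]
  plugins?: MiniPlugin[]
}

function createInstance(config: MiniConfig): MiniInstance {
  // Assumed default value, chosen only for this sketch.
  const instance: MiniInstance = { options: { addTrailingLinebreak: false } }
  // 1. Preloaded plugins run first, acting like presets.
  for (const plugin of config.preloadPlugins ?? []) plugin(instance)
  // 2. The user's own options are applied on top of the presets.
  if (config.addTrailingLinebreak !== undefined) {
    instance.options.addTrailingLinebreak = config.addTrailingLinebreak
  }
  // 3. Regular plugins run last, so they can overwrite the user's options.
  for (const plugin of config.plugins ?? []) plugin(instance)
  return instance
}

const enableLinebreak: MiniPlugin = (instance) => {
  instance.options.addTrailingLinebreak = true
}

const preloaded = createInstance({
  addTrailingLinebreak: false,
  preloadPlugins: [enableLinebreak],
})
const pluggedIn = createInstance({
  addTrailingLinebreak: false,
  plugins: [enableLinebreak],
})
```

In this model `preloaded.options.addTrailingLinebreak` ends up `false` (the explicit option wins), while `pluggedIn.options.addTrailingLinebreak` ends up `true` (the plugin wins).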
You can also load plugins on existing instances:
htmlarkdown.loadPlugins([myPlugin])
The conversion behaviour of an HTMLarkdown instance depends solely on its `options` property.
This means you can create a copy of an instance like this:
const htmlarkdown = new HTMLarkdown()
const copy = new HTMLarkdown(htmlarkdown.options)
See the description of HTMLarkdown's 3 phases below for info on what the rules/processes do.
/**
* Overwriting default rules/processes.
* (does NOT include the defaults)
*/
const htmlarkdown = new HTMLarkdown({
preProcesses: [myPreProcess1, myPreProcess2],
rules: [myRule1, myRule2],
textProcesses: [myTextProcess1, myTextProcess2],
postProcesses: [myPostProcess1, myPostProcess2]
})
/**
* Adding on to default rules/processes.
* (includes the defaults)
*/
const htmlarkdown = new HTMLarkdown()
htmlarkdown.addPreProcess(myPreProcess)
htmlarkdown.addRule(myRule)
htmlarkdown.addTextProcess(myTextProcess)
htmlarkdown.addPostProcess(myPostProcess)
HTMLarkdown has 3 distinct phases:
- Pre-processing
  The container-element that's received (and deep-cloned) by the `convert` method is passed consecutively to each `PreProcess` in `options.preProcesses`.
- Conversion
  The pre-processed container-element is then recursively converted to markdown.
  Elements are converted by `Rule` in `options.rules`.
  Text-nodes are converted by `TextProcess` in `options.textProcesses`.
  The output strings of the rules and text-processes are then appended to each other, to give the raw markdown.
- Post-processing
  The raw markdown string is then passed consecutively to each `PostProcess` in `options.postProcesses`, to give the final markdown.
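The three phases can be sketched as a small pipeline over a toy node tree. All the `Toy*` types and both functions below are hypothetical simplifications written for this illustration; the real `PreProcess`, `Rule`, `TextProcess` and `PostProcess` types work on DOM nodes and are likely richer than this.

```typescript
// Hypothetical toy model of the three phases; all names are illustrative.
type ToyNode =
  | { kind: 'text'; text: string }
  | { kind: 'element'; tag: string; children: ToyNode[] }

type ToyPreProcess = (container: ToyNode) => ToyNode
type ToyRule = { tag: string; toMarkdown: (inner: string) => string }
type ToyTextProcess = (text: string) => string
type ToyPostProcess = (markdown: string) => string

interface ToyOptions {
  preProcesses: ToyPreProcess[]
  rules: ToyRule[]
  textProcesses: ToyTextProcess[]
  postProcesses: ToyPostProcess[]
}

function convertNode(node: ToyNode, options: ToyOptions): string {
  if (node.kind === 'text') {
    // Text-nodes go through every text-process in order.
    return options.textProcesses.reduce((text, step) => step(text), node.text)
  }
  // Children are converted first and their outputs appended to each other.
  const inner = node.children.map((child) => convertNode(child, options)).join('')
  // Pick the first rule registered for this tag, if any.
  const rule = options.rules.filter((candidate) => candidate.tag === node.tag)[0]
  // Without a matching rule, just pass the inner content up to the parent.
  return rule ? rule.toMarkdown(inner) : inner
}

function toyConvert(container: ToyNode, options: ToyOptions): string {
  // Phase 1: pre-processing
  const processed = options.preProcesses.reduce((node, step) => step(node), container)
  // Phase 2: conversion
  const rawMarkdown = convertNode(processed, options)
  // Phase 3: post-processing
  return options.postProcesses.reduce((markdown, step) => step(markdown), rawMarkdown)
}

const toyOptions: ToyOptions = {
  preProcesses: [],
  rules: [
    { tag: 'h1', toMarkdown: (inner) => `# ${inner}\n\n` },
    { tag: 'p', toMarkdown: (inner) => `${inner}\n\n` },
    { tag: 'strong', toMarkdown: (inner) => `**${inner}**` },
  ],
  textProcesses: [(text) => text.replace(/\s+/g, ' ')],
  postProcesses: [(markdown) => markdown.trim()],
}

const toyResult = toyConvert(
  {
    kind: 'element',
    tag: 'div',
    children: [
      { kind: 'element', tag: 'h1', children: [{ kind: 'text', text: 'Heading' }] },
      {
        kind: 'element',
        tag: 'p',
        children: [
          { kind: 'text', text: 'Some ' },
          { kind: 'element', tag: 'strong', children: [{ kind: 'text', text: 'bold' }] },
        ],
      },
    ],
  },
  toyOptions
)
```

Note that the `div` container has no rule in this toy setup, so its children's output simply passes through, and the trailing blank lines are removed by the single post-process.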
(image: the general conversion flow of HTMLarkdown)
HTMLarkdown is still under development, so there'll likely be bugs.
The easiest way to contribute is to submit an issue (with the `bug` label), especially for any incorrect markdown-conversions :)
For any incorrect markdown-conversions, state the:
- input HTML
- current incorrect markdown output
- expected markdown output
If you have any new element-conversions / ideas / features / tests that you think should be added, leave an issue with the `feature` or `improve` label!
- The `feature` label is for new features.
- The `improve` label is for improvements on existing features.
Understandably, there are gray areas on what is a "feature" and what is an "improvement". So just go with whichever seems more appropriate :)
Currently, HTMLarkdown has been designed to output markdown for GitHub specifically (ie. GFM).
BUT, if there's another markdown spec. that you'd like to design for (maybe as a plugin?), do leave an issue/discussion :D
Code-formatting is handled by Prettier, so no need to worry bout it :)
Any new feature should:
- be documented via TSDoc
- come with new unit-tests
- pass all new/existing tests
As for which merging method to use, check out the discussion.
So far it's just me, so pls send help! :^)
If you've any new ideas / features, check out the Contributing section for it!
- Headings (For now, only ATX-style)
- Paragraph
- Codeblock
- Blockquote
- Lists (ordered, unordered, tight and loose)
- (GFM) Table
- (GFM) Task-list
(Below are some planned block-elements that don't have a markdown-equivalent)
- `<span>` (handled by a noop-rule)
- `<div>` (For now, handled by a noop-rule)
- Definition list (ie. `<dl>`, `<dt>`, `<dd>`)
- Collapsible section (ie. `<details>`)
- Bold (For now, only outputs in asterisks `**BOLD**`)
- Italic (For now, only outputs in asterisks `*ITALIC*`)
- (GFM) Strikethrough
- Code
- Link (For now, only inline links)
- Superscript (ie. `<sup>`)
- Subscript (ie. `<sub>`)
- Underline (ie. `<u>`, `<ins>`)
  (didn't know underlines were possible till recently)
Misc:
- Images (For now, only inline links)
- Horizontal-rule (ie. `<hr>`)
- Linebreaks (ie. `<br>`)
- Preserved HTML comments (Issue #25)
  (eg. `<!-- COMMENT -->`)
Features to be added:
- Custom `id` attributes
  Go to [section with id](#my-section)
  <p id="my-section">
    My section
  </p>
- Reversing GitHub's Issue/PR autolinks
  Input HTML | Output Markdown |
  ---|---|
  <p>Issue autolink: <a href="https://github.com/user/repo/issues/7">#7</a></p> | Issue autolink: #7 |
- Ability to customise how a codeblock's syntax-highlighting language is obtained from the `<pre><code>` elements
noop-rule:
Noop-rules only pass on their converted inner-contents to their parents.
They themselves don't have any markdown conversions, not even in HTML-syntax.
The MIT License (MIT).
So it's freeeeeee