orguli is a small tool that takes a markdown file with an optional style sheet and outputs html. It does half of what fully-fledged markdown parsers do at a tenth of the size (a single C file of about 500 lines).
For single-file documents:
$ orguli -s README.md style.css > README.html
For part of a larger document (to concatenate/process/pipe content):
#!/bin/bash
# Processes and converts all md files to html
for file in *.md
do
    htmlfile="${file%.md}.html"
    # Start each output file fresh so that reruns do not append to old output
    cat header.html > "$htmlfile"
    process-file "$file" | orguli >> "$htmlfile"
    cat footer.html >> "$htmlfile"
done
To pipe content to orguli and specify a stylesheet at the same time, specify /dev/stdin as the input file:
$ cat README.md | orguli /dev/stdin style.css | sed 's/http:/https:/g' > out.html
I use orguli to render README files in my private git repositories by passing orguli to cgit.
Run ./build.sh and then sudo install -m755 orguli /usr/local/bin/.
- No parser does what I want. Most of them have weird ways of behaving, which are not documented and only defined in code.
- The rules are too complicated. Parsing malformed Markdown is surprisingly complicated and most people don't write complicated things with it, especially when it comes to README.md files.
- Extending the other tools is hard compared to a single C99 file. I've found that it's faster to just extend orguli than to learn the quirks of some large parser through trial and error. orguli has its quirks as well, but they're easier to learn and avoid.
- Other parsers are meant for untrusted input, which adds even more code and complications to parsing. orguli has no security at all and is only to be used by trusted users. When I just want a simple md2html binary for my website, I don't care about XSS attacks. I can probably trust myself not to pwn myself.
orguli supports most features defined in CommonMark and the extended syntax. I chose the parts which most people know and use, mostly in their READMEs.
Due to the design, anything that can't be determined with a single-line lookahead will forever be unsupported. This means orguli can never support nesting deeper than 2 levels.
- heading ids
- emoji
- task lists
- markdown tables (html <table>s already work normally)
- definition lists
- footnotes
- reflinks
These features can be implemented with the help of simple preprocessors that convert unsupported markup into markup that orguli understands or just html which orguli will skip over.
For example, given the input file input.md:
This is [my reflink][my-reflink].
[my-reflink]: https://example.com
we can build a pipeline
$ process-reflinks input.md | orguli > output.html
where process-reflinks reads input.md completely and outputs
This is [my reflink](https://example.com)
or
This is <a href="https://example.com">my reflink</a>
orguli would handle both variants easily, converting the first one and skipping the html in the second.
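A preprocessor along these lines could be sketched in shell with a two-pass awk script. This is only an illustration under stated assumptions: process_reflinks is a hypothetical helper (written with an underscore so it is a valid function name in any POSIX shell), it is not shipped with orguli, and it only handles the simple [text][label] form with one definition per line.

```shell
#!/bin/sh
# process_reflinks FILE: hypothetical helper, not part of orguli.
# The file is read twice: pass 1 collects "[label]: url" definitions,
# pass 2 rewrites "[text][label]" into "[text](url)" and drops the
# definition lines from the output.
process_reflinks() {
    awk '
        # A definition line looks like: [label]: url
        function is_def(line) { return line ~ /^\[[^]]+\]:[ \t]+[^ \t]+/ }

        # Pass 1 (NR == FNR holds only for the first copy of the file).
        NR == FNR {
            if (is_def($0)) {
                label = $0
                sub(/^\[/, "", label)
                sub(/\]:.*$/, "", label)
                url = $0
                sub(/^\[[^]]+\]:[ \t]+/, "", url)
                sub(/[ \t].*$/, "", url)      # ignore an optional title
                urls[label] = url
            }
            next
        }

        # Pass 2: skip definitions, rewrite references on all other lines.
        is_def($0) { next }
        {
            out = ""
            rest = $0
            while (match(rest, /\[[^]]+\]\[[^]]+\]/)) {
                ref = substr(rest, RSTART, RLENGTH)
                split_at = index(ref, "][")
                text = substr(ref, 2, split_at - 2)
                label = substr(ref, split_at + 2, length(ref) - split_at - 2)
                # Unknown labels are left untouched.
                if (label in urls)
                    ref = "[" text "](" urls[label] ")"
                out = out substr(rest, 1, RSTART - 1) ref
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out rest
        }
    ' "$1" "$1"
}
```

With the function defined, the pipeline would read process_reflinks input.md | orguli > output.html. Passing the file to awk twice is what lets a definition at the bottom of the document resolve a reference near the top, which orguli itself cannot do with its single-line lookahead.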
Writing and maintaining small preprocessors like these for whichever feature one wants is easier than modifying programs even as small as orguli.
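As a second illustration of how small such a preprocessor can be, task lists could be turned into inline html with two sed substitutions. Again this is a sketch, not something orguli provides: process_tasklists is a hypothetical name, and the sketch assumes orguli passes the inline <input> tags through unchanged inside list items.

```shell
#!/bin/sh
# process_tasklists: hypothetical filter, not part of orguli.
# Reads markdown on stdin and replaces "[ ]" / "[x]" markers at the start
# of a list item with disabled html checkboxes.
process_tasklists() {
    sed \
        -e 's/^\([[:space:]]*[-+*] \)\[ \] /\1<input type="checkbox" disabled> /' \
        -e 's/^\([[:space:]]*[-+*] \)\[[xX]\] /\1<input type="checkbox" checked disabled> /'
}
```

It would sit in the same kind of pipeline: process_tasklists < input.md | orguli > output.html.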
The only extra feature not present in markdown is the @filename specifier, which embeds images and text directly. This is extremely useful for single-file documents.
orguli is based on rxi's doq, which now seems to be abandoned. Here is a list of the changes I introduced.
- support for nested lists 2 levels deep
- support for numbered lists and mixed nesting
- support for list items starting with - and +
- support for auto-detecting and linking http(s)://
- support for <inline links like this>
- support for ![inline images](image_link)
- support for [![images nested in links](img_src)](link_href)
- support for nested fenced code blocks
- support for <pre><code> code blocks </code></pre> with edge2
- single line lookahead (to support h1 with ==== and h2 with ----)
- -s|--single-file option, which outputs <head> and others. When omitted, output without <head>
- -h|--help option
- support for autoclosing tags; drop inline flags at edges of headers, lists, and blocks
- more efficient string functions in place of strstr()
- support for ** (strong) and __ (em)
- fix for false em, strike, and strong detection on separate chars: 2 * 10 will no longer turn into 2 <em> 10, but 2 *10 will correctly turn into 2 <em>10
- support for <hr> (--- and ___ for <hr>, === for <hr class="thick">)
- support for inline html, including <br> tags
- comments where necessary
- .markdown-body class to main <div> or <body>
- support for h4 and h5
- support for indented code blocks
        this will be turned into code (not on github)
  Just like
        this fenced code block
The following were removed:
- dependency on stdbool.h
- static scope specifiers from all functions
- automatic escape on all text; if you want something escaped, use backslashes. For example, \< turns into <
This library is free software; you can redistribute it and/or modify it under the terms of the MIT license. See LICENSE for details.