To keep the tool simple we just take the HTML code reddit gives us and pass it on to the EPUB creator library as is. However, EPUB only allows XHTML. Because of this there are bound to be various HTML-specific things (e.g. non-breaking spaces) that leak into the generated EPUB. Most readers are quite lenient about this, but iBooks seems to take the XHTML part seriously.
Ideally we should only emit valid XHTML to make the EPUB as compatible as possible. However, this is probably only worth it if it is reasonably simple to implement.
As a workaround we could search and replace some of the commonly broken things (e.g. nbsp). This is obviously very brittle but might be a decent tradeoff if we can assume the HTML coming out of reddit is restricted to begin with.
Validity of an EPUB can easily be checked by opening it with the Calibre book editor and using Tools->Check Book (F7). There's also https://github.com/IDPF/epubcheck which seems to be quite useful.
EPUBs require valid XHTML which isn't what we get from reddit (see #2).
For now we solve the problem we actually encounter in the wild, which
is HTML entities (e.g. &nbsp;, though reddit also forwards others like
&pound;) slipping into the XHTML. This causes readers that take XHTML
seriously, like iBooks, to abort parsing.
To solve this we now replace named entities from HTML4 with their
corresponding numbered variant. Another option would have been to declare
the entities to make them known to XHTML, but we cannot easily inject
that into the template.
Obviously just doing this doesn't guarantee valid XHTML by a long shot so
this is just a first step. Should we encounter other issues in the wild
we can consider taking more extensive measures.
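
A minimal sketch of the named-to-numbered replacement described above, not the code from the actual commit. The function name `replaceNamedEntities` and the entity table are hypothetical, and the table is only a small subset of the HTML4 entities; a real implementation would need the full list. The five entities predefined by XML are left alone because they are already valid in XHTML.

```javascript
// Hypothetical subset of the HTML4 named entities and their code points.
// A complete table would cover all of HTML4; this one is for illustration.
const NAMED_ENTITIES = new Map([
  ["nbsp", 160],
  ["pound", 163],
  ["copy", 169],
  ["eacute", 233],
  ["hellip", 8230],
  ["euro", 8364],
]);

// Entities predefined by XML itself; these are valid in XHTML as they are.
const XML_PREDEFINED = new Set(["amp", "lt", "gt", "quot", "apos"]);

// Replace known named entities with their numbered variant (e.g. &#160;).
// Unknown names are left untouched rather than guessed at.
function replaceNamedEntities(html) {
  return html.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (match, name) => {
    if (XML_PREDEFINED.has(name)) return match;
    const codePoint = NAMED_ENTITIES.get(name);
    return codePoint === undefined ? match : `&#${codePoint};`;
  });
}
```

With this table, `replaceNamedEntities("5&nbsp;&pound; &amp; more")` yields `5&#160;&#163; &amp; more`. As the commit message says, this only addresses entities and does nothing about other sources of invalid XHTML.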
Regarding HTML to XHTML conversion, JavaScript seems to offer us a simple and portable way to do so, as explained in https://stackoverflow.com/a/12092919 . In my test this did not cope properly with entities (e.g. &nbsp; turned into \n) and it adds a surrounding tag, but it is probably the way to go if we want actual XHTML.