Consider a better support of ZIM files without books in HTML #95
Comments
@eshellman This ticket might be of interest to you.
Sure, sounds good, but what would the final output look like compared to what we have now?
@Popolechien The same, but without the HTML book directly usable from the browser; in its place we would have an info page explaining how to read the EPUB file in browsers, on mobile, on a computer, etc.
@kelson42 Well, if the idea is to save some space, how about offering either Gutenberg EPUBs or Gutenberg HTML and hoping that people would know the difference? Much like we have Wikipedia with or without images, in a way.
@Popolechien This might be done, and can already be done, but it is not the point of this ticket, which is about providing a better UX without HTML. But maybe you just want to say "I don't think we need that: we should provide one with HTML and one with EPUB, and people can only have one or the other and live with that."
Yes.
@Popolechien To me this would be a fallback solution, but I believe we might be able to solve the problem properly. We could solve (2) in an even better manner by using a pure JavaScript EPUB reader, so that for the end user it would be a similar experience to having the HTML in the ZIM file. We could for example use https://github.com/futurepress/epub.js/
The only tricky thing about deploying epub.js is overcoming same-origin JavaScript issues, but you are probably experienced with that.
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
Once #136 is implemented, we should be able to implement this ticket. The scraper would download the EPUB and parse it to extract the keywords for the search engine. Epub.js should be able to make the EPUB directly readable in the ZIM (to be tested).
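The "download the EPUB, parse it for the search engine" step could look roughly like the sketch below. It is only an illustration using the Python standard library, not the scraper's actual code: the names `epub_text` and `TextCollector` are hypothetical, and it assumes an unencrypted EPUB whose content documents are UTF-8 XHTML. It reads the package file referenced from `META-INF/container.xml`, follows the spine in reading order, and returns the visible text.

```python
import posixpath
import zipfile
from html.parser import HTMLParser
from urllib.parse import unquote
from xml.etree import ElementTree

# XML namespaces used by the EPUB container and package files.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


class TextCollector(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def epub_text(source):
    """Return the text of an EPUB (path or file object) in spine order."""
    chunks = []
    with zipfile.ZipFile(source) as archive:
        # container.xml points at the package (.opf) file.
        container = ElementTree.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).attrib["full-path"]
        package = ElementTree.fromstring(archive.read(opf_path))
        # The manifest maps item ids to hrefs relative to the package file.
        manifest = {
            item.attrib["id"]: item.attrib["href"]
            for item in package.findall("opf:manifest/opf:item", NS)
        }
        base = posixpath.dirname(opf_path)
        # The spine lists the content documents in reading order.
        for ref in package.findall("opf:spine/opf:itemref", NS):
            href = unquote(manifest[ref.attrib["idref"]])
            path = posixpath.normpath(posixpath.join(base, href))
            parser = TextCollector()
            parser.feed(archive.read(path).decode("utf-8", errors="replace"))
            parser.close()
            chunks.extend(parser.chunks)
    return " ".join(chunks)
```

The returned string is what a scraper-level indexer would hand to the search engine; how it gets attached to a ZIM entry is left out here on purpose.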
The most difficult part here is the one that's not been mentioned: the UI. With our generic UI that
What do entries look like? An HTML shell that displays epub.js at 100% size? Should it include a link/button to download/open the EPUB, should you have an external EPUB reader?
I believe the search topic deserves its own ticket. openzim/libzim#289 seems like the wrong solution to the problem. We don't want libzim to index EPUBs: if libzim does it, search results would point to the .epub entry and not to our epub.js shell… If we want to index the shell, then we need libzim NOT to index the .epub entries, otherwise we'll double the index size.
We'd need a scraper-level EPUB parser (and HTML, and PDF). Actually, we could already (when also including HTML) build index data on the cover article and disable the libzim one on the HTML book, so that search points to the cover and not the HTML itself.
Now, one issue would be that books are very long and EPUB (and PDF) are paginated. If you're searching for an expression, is it acceptable to just link to the book cover? In a WP article, it's a single page so, despite being cumbersome, you can easily
In epub.js there is no search-in-book feature (yet??), so if you were not looking for a book but for an extract, it's gonna be useless… and I believe finding books is not what the full-text index is about (home page search probably does it better).
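To make the "HTML shell that displays epub.js" idea concrete, here is a rough sketch of how a scraper could generate such a page, including the download link. Everything specific in it is an assumption for illustration: the function name `render_shell`, the file names `jszip.min.js`, `epub.min.js` and `reader.js`, and the `data-epub` attribute are hypothetical. The epub.js calls (`ePub(...)`, `renderTo(...)`, `display()`) are the ones shown in the epub.js README, which also asks for JSZip to be loaded before epub.js when reading zipped `.epub` files. The reader script is kept in a separate file so the shell itself contains no inline JavaScript.

```python
from html import escape

# Hypothetical script file names, in load order (JSZip before epub.js).
SCRIPTS = ("jszip.min.js", "epub.min.js", "reader.js")

# Content of the hypothetical reader.js, shipped once per ZIM rather than
# inlined in each shell page.
READER_JS = """\
const viewer = document.getElementById("viewer");
const book = ePub(viewer.dataset.epub);
const rendition = book.renderTo("viewer", { width: "100%", height: "100%" });
rendition.display();
"""


def render_shell(title: str, epub_href: str) -> str:
    """Return the HTML of a reader page for one book."""
    scripts = "\n".join(
        f'  <script src="{escape(name)}"></script>' for name in SCRIPTS
    )
    return f"""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>{escape(title)}</title>
</head>
<body>
  <a href="{escape(epub_href)}" download>Download EPUB</a>
  <div id="viewer" data-epub="{escape(epub_href)}"></div>
{scripts}
</body>
</html>
"""
```

For example, `render_shell("Pride & Prejudice", "books/1342.epub")` yields one shell page whose viewer points at that EPUB entry; sizing the viewer to the full page would be done in a stylesheet, which is not shown.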
I risk sounding like a broken record, but please remember users with older browsers and OSes, as well as those with restrictive CSPs. HTML is a universal way to access content that is supported everywhere (at least, static HTML). While it's fine if we can include a system in the ZIM to convert EPUB or PDF content to accessible (and searchable) HTML, we would need to be sure that such readers run under old browsers and restrictive CSPs. Otherwise you risk making ZIMs even more inaccessible than they already are. Even a modern Chrome extension can't access the current dynamic UI due to its use of inline JS (#145), and that is only going to get worse with the stricter CSPs in manifest v3 extensions. See also kiwix/kiwix-js#755. So, I agree with the caution expressed by @rgaudin, but for slightly different reasons.
For those who do not yet know about it, integrating an EPUB reader and a PDF reader has already been done for the Kolibri scraper. There is even a download button for those who prefer to use another reader. Other questions regarding the resulting UI and the creation of multiple ZIMs (all, epub_only, html_only, pdf_only) are still relevant.
I think we should maybe consider a better support of ZIM files without HTML. The reasons are:
Currently I see two big reasons to keep the HTML versions:
1 - Full text engine applying to HTML only
2 - Ability to directly see the content
These two things might be fixed with:
1 - Support for full-text indexing of EPUBs (relatively easy), see openzim/libzim#289
2 - Providing readers for multiple platforms within the ZIM... even maybe a pure Web Epub reader?