Convert ebooks storage to the database #336
Comments
I forgot to mention above, but I ran the script in #335 over a hundred or so ebooks, and it picked up the title, URL, and tags just fine. You can skip rebuilding the books and just update the database via:

for BOOK in $(find /standardebooks.org/ebooks -maxdepth 1 -type d)
do
	tsp nice /standardebooks.org/web/scripts/deploy-ebook-to-www -v --no-build --no-images --no-epubcheck --no-recompose --no-feeds --no-bulk-downloads --update-ebook-database "$BOOK"
done
For this project, I don't think we need one very large branch. Since we can do this piecemeal and release these updates as we make them, while still leaving the existing filesystem-based system in place to fill in the blanks, we can do each piece of work as its own PR. |
I think we will probably drop |
Alex, the The If you have a working site from the
And you'll need to use these

ALTER TABLE `Tags` ADD COLUMN `UrlName` varchar(255) NULL AFTER `Name`;
ALTER TABLE `Tags` ADD COLUMN `Type` enum('artwork', 'ebook') DEFAULT 'artwork' AFTER `UrlName`;
CREATE INDEX `index2` ON `Tags` (`Type`);
CREATE INDEX `index3` ON `Tags` (`UrlName`);

You can test that a single ebook gets added to the DB with:

/standardebooks.org/web/scripts/deploy-ebook-to-www --verbose david-lindsay_a-voyage-to-arcturus.git

If all the ebooks have already been built, you can bulk update/insert them in the DB with something like this:

for BOOK in $(find /standardebooks.org/ebooks -maxdepth 1 -type d)
do
	tsp nice /standardebooks.org/web/scripts/deploy-ebook-to-www --verbose --no-build --no-images --no-recompose --no-epubcheck --no-feeds --no-bulk-downloads "$BOOK"
done

which takes less than a minute on my machine to loop over the 1000+ books.

Two other comments about search and performance:

1. Search

There are some differences in how search will perform when the site is backed by the DB. Here are three I've noticed so far:

Full word vs. word fragments

Searching for "ovid" on the production site https://standardebooks.org/ebooks?query=ovid returns six results, but the DB search branch will return only these two:

(Ovid is mentioned in the ToC (link))

The four other matches on the production site come from matching "providence".

Similar story for searching for "grant" on the production site https://standardebooks.org/ebooks?query=grant which returns ten results, but the DB search branch will return only these two:

(grant in the ToC (link)) https://standardebooks.org/ebooks/ambrose-bierce/poetry
(grant in title and author) https://standardebooks.org/ebooks/ulysses-s-grant/personal-memoirs-of-ulysses-s-grant

The other matches on the production site come from ToC entries, tags, or LoC subjects with these words:

The way the `FULLTEXT` index is built, it's not possible to match

Stopwords

Stopwords searching for stopwords like

Author sorting with Unicode characters

On the production site when sorting by author name, Thomas à Kempis and Karel Čapek are at the end: https://standardebooks.org/ebooks?page=22&per-page=48&sort=author-alpha

When MySQL sorts by author name, the book by Thomas à Kempis is now first, and the books by Karel Čapek are sorted between these two authors: John W. Campbell

I believe this is a slight improvement.

2. Performance

Performance on my test droplet seemed equivalent to the production site, but I didn't run a full load test. Let me know if I should once we fix other feedback you have about correctness. I had to increase the RAM on the droplet to 1GB to run the build suite and MySQL at the same time, so it was on the second smallest droplet from DigitalOcean.

There are a fair number of queries required to render |
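To illustrate the author-sorting difference described above, here is a minimal Python sketch. It assumes the production site orders by raw code point and that the MySQL collation is accent- and case-insensitive; neither is confirmed in this thread. The surname-first strings, the extra authors (Carroll, Zola), and the `accent_insensitive_key` helper are hypothetical, and the key function only approximates a real collation.

```python
import unicodedata


def accent_insensitive_key(name):
    """Rough stand-in for an accent-insensitive collation:
    decompose characters, drop combining marks, then casefold."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Hypothetical sort names; only Campbell, Čapek, and à Kempis appear in the thread.
authors = [
    "Campbell, John W.",
    "Čapek, Karel",
    "Carroll, Lewis",
    "à Kempis, Thomas",
    "Zola, Émile",
]

# Code-point order: non-ASCII initials sort after every ASCII letter.
by_code_point = sorted(authors)

# Collation-like order: "à" sorts with "a", "Č" sorts with "C".
by_collation = sorted(authors, key=accent_insensitive_key)

print(by_code_point)
print(by_collation)
```

With the key function, à Kempis moves to the front and Čapek lands among the other "Ca" names, which matches the behaviour reported for the DB branch.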
Sorting and text fragment improvements sound great. Is there any improvement on searching for Poe? At the moment you get pages of poetry and no E. A. |
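A small Python sketch of the fragment vs. whole-word distinction behind the "ovid"/"providence" and "Poe"/"poetry" cases. It uses plain substring and regex matching as an illustration only; it is not how MySQL `FULLTEXT` or the site's search is implemented, and the function names are hypothetical.

```python
import re


def fragment_match(query, text):
    """Substring search: matches the query anywhere, even inside a longer word."""
    return query.casefold() in text.casefold()


def whole_word_match(query, text):
    """Word search: matches only when the query stands as its own word."""
    pattern = r"\b" + re.escape(query) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


samples = [
    ("ovid", "Providence"),
    ("ovid", "Ovid"),
    ("poe", "Poetry"),
    ("poe", "Edgar Allan Poe"),
]

for query, text in samples:
    print(query, "|", text, "| fragment:", fragment_match(query, text),
          "| whole word:", whole_word_match(query, text))
```

If the DB search only matches whole words, as the examples above suggest, a query for "Poe" would no longer be satisfied by "poetry" alone.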
Great Mike, I'm going to review this in the coming week.

The first question is in But, we call

Secondly, why do we care if there are differences? The script emits an error if there are differences - but when does this matter?

Thirdly, the script emits an error after already having made changes. This is surprising - I would expect that when an error occurs, we stop processing and no changes are saved, instead of saving changes and then letting the user know that something went wrong. |
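One way to get the "no changes are saved when an error occurs" behaviour is to wrap the writes in a transaction and roll back on failure. The sketch below uses Python's built-in `sqlite3` purely for illustration; the real site uses MySQL and PHP, and the column names, `ValidationError`, `make_db`, and `save_ebook` are all hypothetical.

```python
import sqlite3


class ValidationError(Exception):
    """Raised when an ebook record fails a check (hypothetical)."""


def make_db():
    """Create an in-memory database with two illustrative tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Ebooks (Identifier TEXT PRIMARY KEY, Title TEXT)")
    conn.execute("CREATE TABLE EbookTags (Identifier TEXT, Tag TEXT)")
    conn.commit()
    return conn


def save_ebook(conn, ebook):
    """Write the ebook and its tags atomically.

    `with conn:` commits if the block finishes and rolls back if it raises,
    so a failure partway through leaves no partial rows behind.
    """
    with conn:
        conn.execute(
            "INSERT INTO Ebooks (Identifier, Title) VALUES (?, ?)",
            (ebook["identifier"], ebook["title"]),
        )
        for tag in ebook["tags"]:
            if not tag.strip():
                raise ValidationError("empty tag for " + ebook["identifier"])
            conn.execute(
                "INSERT INTO EbookTags (Identifier, Tag) VALUES (?, ?)",
                (ebook["identifier"], tag),
            )


def count_rows(conn, table):
    """Return the number of rows in one of the two known tables."""
    assert table in ("Ebooks", "EbookTags")
    return conn.execute("SELECT COUNT(*) FROM " + table).fetchone()[0]
```

Usage: calling `save_ebook` with a record whose second tag is empty raises `ValidationError`, and both tables stay empty because the first insert is rolled back along with the rest.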
Can you also open a draft PR for this entire project, so I can add some inline comments? |
Yep, you got it: #401 I'll respond to your question about |
One preamble before I answer your specific questions about
If there are no bugs, the two During development, there were several times that the check found errors due to mistakes I made, so I found it useful. For example, I would miss a field or write data in the wrong order.
Knowing that there are no differences gives me the confidence to say the site can switch to the DB as the source of truth and users won't notice a difference.
That's a fair criticism. My approach was to answer the question: "Does reading an I can tell by your questions that you're considering removing the check from the script. If so, that's fine with me. No hard feelings. Maybe you'd rather have a second script, e.g., |
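The kind of consistency check discussed here can be sketched as a field-by-field comparison of two representations of the same ebook. This is a hedged Python illustration, not the script's actual logic: the dictionaries, field names, and `diff_fields` helper are hypothetical stand-ins for an ebook built from the filesystem and the same ebook read back from the DB.

```python
_MISSING = object()  # sentinel so a missing field differs from a field set to None


def diff_fields(from_files, from_db):
    """Return human-readable differences between two ebook records."""
    problems = []
    for field in sorted(from_files.keys() | from_db.keys()):
        a = from_files.get(field, _MISSING)
        b = from_db.get(field, _MISSING)
        if a is _MISSING:
            problems.append(field + ": missing from filesystem record")
        elif b is _MISSING:
            problems.append(field + ": missing from DB record")
        elif a != b:
            problems.append(field + ": " + repr(a) + " != " + repr(b))
    return problems


# Hypothetical records showing the two mistakes mentioned above:
# a field that was never written, and list data written in the wrong order.
from_files = {"Title": "A Voyage to Arcturus", "Tags": ["Fiction", "Science fiction"], "Language": "en-GB"}
from_db = {"Title": "A Voyage to Arcturus", "Tags": ["Science fiction", "Fiction"]}

for problem in diff_fields(from_files, from_db):
    print(problem)
```

An empty result is what gives confidence that both sources agree; any entry points at the specific field to fix.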
OK, so it sounds like it's for debugging, which is fine. We can keep it for now as a transitional debugging tool but once we're fully on the DB we can remove it. I think that if It already does throw a |
I updated (Edit: Added the class names when printing validation exceptions.)
|
It's been live for a few days and everything looks good. Great work Mike! |
This is a large project that may require multiple issues, but we'll start here.
To help with the discussion, I have a draft PR at #335 where I hacked up a schema, script, methods, etc. for Ebooks and EbookTags. No hard feelings if we have to scrap that PR and start a different way.

Question: covers? I think that would save you review time because you could review the changes periodically, and there would be no chance of breaking something in master.

Some notes about draft PR #335:

Ebook constructor. Eventually we will have to break up that constructor into separate methods. Some of the logic is needed for populating the DB tables, and some is needed for setting request-time properties.

Library::FilterEbooks(), /ebooks/index.php, or /ebooks/ebook.php yet. I thought I would populate more fields in the DB tables first.

The more I look, the more work I see to do, but we'll chip away at it. One nice thing is that the ebook webpages are all read-only (unlike the artwork site).