Redundant copies of London Stage PDFs #48

Open
wintere opened this issue Jan 28, 2025 · 0 comments
The London Stage PDFs

| Directory | Size | Files | Active |
| --- | --- | --- | --- |
| /PDFS/source/ | 835 MB | 11 | no |
| /images/pdfs/ | 1.92 GB | 10,455 | yes |

Under the current repository structure, most pages of the HathiTrust scans of The London stage, 1660-1800... are stored twice in the website repository. The whole PDFs are archived in the /PDFS/source folder, while the individual pages generated by the JavaScript pipeline are stored in /images/pdfs. At serving time, images are retrieved from static copies stored in the /images/pdfs/ folder of the repository.

It is important to archive how and why the individual page images were generated. However, this repository is not an ideal place to do so, especially given the size of the source files (85% of the recommended maximum size for an active development repository).

Proposals:

  1. SHORT-TERM: Move the /PDFS/source/ folder and the JavaScript pipeline to another repository, such as database-code or data. Remove the folder from the repository cache. Expected reduction in repository size: >=835 MB.
  2. LONG-TERM: Move the individual PDF pages in /images/pdfs/ out of this repository as well, to an agreed-upon third-party hosting service. Remove the folder from the repository cache. Expected reduction in repository size: >=1.92 GB.
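
To re-check the sizes and file counts before and after either move, something like the following could be run from the repository root. This is only a rough sketch, not part of the existing pipeline: the helper names `dir_stats` and `format_size` are made up for illustration, and it assumes sizes are reported in decimal units (1 MB = 10**6 bytes), which may not match how the figures above were measured.

```python
import os
from pathlib import Path


def dir_stats(root):
    """Return (total_bytes, file_count) for all regular files under root."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so linked files are not counted twice.
            if path.is_file() and not path.is_symlink():
                total_bytes += path.stat().st_size
                file_count += 1
    return total_bytes, file_count


def format_size(num_bytes):
    """Format a byte count in decimal units (assumption: 1 MB = 10**6 bytes)."""
    if num_bytes >= 10**9:
        return f"{num_bytes / 10**9:.2f} GB"
    return f"{num_bytes / 10**6:.0f} MB"


if __name__ == "__main__":
    # Folder names are taken from the table in this issue.
    for folder in ("PDFS/source", "images/pdfs"):
        if os.path.isdir(folder):
            size, count = dir_stats(folder)
            print(f"/{folder}/  {format_size(size)}  {count:,} files")
```

Note that this only measures the working tree. It says nothing about how much space the same files occupy in the repository's history.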