
Automatic alignment of books between HathiTrust, Internet Archive, Google Books, etc.


ryanfb/book-aligner


book-aligner

This repository is for experimental scripts to align books between HathiTrust, Internet Archive, Google Books, etc.

By "alignment", I mean that for a given volume in one repository, I want to try to find any matching volumes in the other repositories.

Ultimately, I want to be able to mash in a HT/IA/GB/etc. URL or other identifier and get a list of potential matches elsewhere on the web.
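As a rough illustration of the "mash in a URL" step, the sketch below guesses which repository a URL belongs to and extracts its volume identifier. It is not part of the repository: `parse_book_url` and `query_param` are hypothetical helper names, and the URL shapes handled (`babel.hathitrust.org/cgi/pt?id=...`, `archive.org/details/...`, `books.google.com/books?id=...`) are only the common public forms, so other link styles would need extra cases.

```ruby
require 'uri'

# Hypothetical helper: read one query parameter, accepting both "&" and ";"
# as separators (some HathiTrust page-turner links use ";").
def query_param(uri, name)
  raw = uri.query.to_s[/(?:\A|[&;])#{Regexp.escape(name)}=([^&;]*)/, 1]
  return nil if raw.nil? || raw.empty?

  URI.decode_www_form_component(raw)
end

# Hypothetical helper: returns [repository, identifier], or nil when the URL
# is not recognized as a HathiTrust, Internet Archive, or Google Books link.
def parse_book_url(url)
  uri = URI.parse(url)
  case uri.host.to_s
  when /(\A|\.)hathitrust\.org\z/
    id = query_param(uri, 'id')
    id && [:hathitrust, id]
  when /(\A|\.)archive\.org\z/
    id = uri.path.to_s[%r{\A/details/([^/]+)}, 1]
    id && [:internet_archive, id]
  when /\Abooks\.google\./
    id = query_param(uri, 'id')
    id && [:google_books, id]
  end
rescue URI::InvalidURIError, ArgumentError
  nil
end
```

The returned pair could then serve as the lookup key into whatever table of matches the aligner produces.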

Requirements

  • make
  • curl
  • Ruby

Usage

The default make target should download and run everything.

WARNING: this currently produces about 4.3GB of output.

Algorithm

The book-aligner.rb script uses bulk metadata downloads from HathiTrust and the Internet Archive to find every pair of volume identifiers that share an OCLC, LCCN, ISSN, or ISBN identifier (~41M matches). These results are then filtered down to the pairs that also have a matching volume number or publication year.
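The two stages described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual book-aligner.rb code: the record layout (`:id`, `:oclc`, `:lccn`, `:issn`, `:isbn`, `:volume`, `:year`) and all method names are invented for the example, and the real bulk metadata files need their own parsing.

```ruby
# Hypothetical record layout: each record is a Hash such as
#   { id: '...', oclc: ['...'], lccn: [], issn: [], isbn: [], volume: '2', year: '1890' }
ID_FIELDS = %i[oclc lccn issn isbn].freeze

# Index records by every (field, value) pair they carry, so that an OCLC
# number only ever matches another OCLC number, and so on.
def index_by_identifier(records)
  index = Hash.new { |hash, key| hash[key] = [] }
  records.each do |record|
    ID_FIELDS.each do |field|
      Array(record[field]).each { |value| index[[field, value]] << record }
    end
  end
  index
end

# Stage 1: every HT/IA pair that shares at least one identifier.
def candidate_pairs(ht_records, ia_records)
  ia_index = index_by_identifier(ia_records)
  pairs = []
  ht_records.each do |ht|
    ID_FIELDS.each do |field|
      Array(ht[field]).each do |value|
        # fetch avoids creating empty index entries for misses
        ia_index.fetch([field, value], []).each { |ia| pairs << [ht, ia] }
      end
    end
  end
  pairs.uniq
end

# Stage 2: keep a pair only if volume number or publication year agrees.
# A missing value on the HT side never counts as a match.
def volume_or_year_match?(ht, ia)
  same_volume = !ht[:volume].nil? && ht[:volume] == ia[:volume]
  same_year = !ht[:year].nil? && ht[:year] == ia[:year]
  same_volume || same_year
end

# Returns an array of [ht_id, ia_id] pairs that survive both stages.
def align(ht_records, ia_records)
  candidate_pairs(ht_records, ia_records)
    .select { |ht, ia| volume_or_year_match?(ht, ia) }
    .map { |ht, ia| [ht[:id], ia[:id]] }
end
```

Indexing one side by identifier keeps stage 1 close to linear in the number of records, which matters at the scale of tens of millions of candidate matches; comparing every HT record against every IA record would not be practical.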

[Figure: HT/IA/GB relationship diagram]

Because there's no freely-available bulk metadata download for Google Books, we'll have to rely on the 1.1M associations we get for free from Internet Archive metadata.

The second component of this project is a GitHub Pages HTML frontend, which includes a small JavaScript library that queries the book-aligner.rb matches loaded into Fusion Tables. The code for this is in js/book-aligner.coffee.

Examples

Some examples of what I want for "matching volumes":
