[WIP] Extract structured table of contents data from digitized books

internetarchive/tocky

Tocky

A tool to extract table of contents data from Internet Archive books.

Phases

  • Detector: Finds the pages of the book that contain the table of contents.
  • Extractor: Converts the pages identified by the Detector into a structured format.
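
The two phases can be illustrated with a minimal sketch. This is not Tocky's actual implementation: the names `detect_toc_pages`, `extract_toc`, `TocEntry`, and `TOC_LINE`, as well as the "line ends in a page number" heuristic, are assumptions made purely for illustration. The sketch also assumes each page is already available as plain OCR text.

```python
import re
from dataclasses import dataclass


@dataclass
class TocEntry:
    """One structured table-of-contents row (hypothetical shape)."""
    title: str
    pagenum: str


# Assumed heuristic: a TOC line is a title, then spaces or dot leaders,
# then an Arabic page number at the end of the line.
TOC_LINE = re.compile(r"^(?P<title>.+?)[\s.]+(?P<page>\d+)$")


def detect_toc_pages(pages: list[str], min_hits: int = 3) -> list[int]:
    """Detector: return indices of pages with at least min_hits TOC-like lines."""
    found = []
    for index, text in enumerate(pages):
        hits = sum(
            1 for line in text.splitlines() if TOC_LINE.match(line.strip())
        )
        if hits >= min_hits:
            found.append(index)
    return found


def extract_toc(pages: list[str], toc_pages: list[int]) -> list[TocEntry]:
    """Extractor: turn the TOC-like lines on the detected pages into entries."""
    entries = []
    for index in toc_pages:
        for line in pages[index].splitlines():
            match = TOC_LINE.match(line.strip())
            if match:
                entries.append(
                    TocEntry(match.group("title").strip(), match.group("page"))
                )
    return entries


if __name__ == "__main__":
    # Made-up sample pages, standing in for OCR text of a scanned book.
    sample_pages = [
        "Title page\nBy Someone",
        "CONTENTS\nIntroduction .... 1\nChapter One .... 7\nChapter Two .... 23",
        "Introduction\nIt was a dark night.",
    ]
    toc_pages = detect_toc_pages(sample_pages)
    for entry in extract_toc(sample_pages, toc_pages):
        print(entry)
```

Keeping the phases as separate functions means the detector's output (a list of page indices) is the only contract between them, so either heuristic could be swapped out independently. A real detector would need to handle Roman numerals, multi-column layouts, and OCR noise, none of which this sketch attempts.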
