Mapping fulltext to book images via annotations #29
Comments
Thank you for providing this! I see five numbers for each word when I follow your link. I am used to seeing x, y, w, h. What is the fifth number?
Not sure! @rchrd2 ?
Unfortunately, I don't know either. I haven't modified the search highlighting code. You may need to reverse engineer it a bit using a production book. The code that processes the search results (using the archive.org API, not the archivelabs one) is here: https://github.com/internetarchive/bookreader/blob/master/BookReader/plugins/plugin.search.js#L206
Does this issue also cover indexing the annotations to make them available in IIIF search?
Nope -- we expose raw (e.g. OCR) data but don't map it via any search API. Feel free to extend the current service to achieve this. We do / did have an experimental annotations service: But I'm not sure if it's still working. Here is a demo of when it worked:
Awesome. We'll add this to our backlog now that we have a little more clarity on the issue. Thank you!
Related to IIIF v3 rewrite underway and specifically #80
For a public/unrestricted book (e.g. https://archive.org/details/TheGeometry) one can get the fulltext for each page (with word regions) via the following API:
https://api.archivelab.org/books/<identifier>/pages/<page#>/ocr?mode=words
e.g.
https://api.archivelab.org/books/TheGeometry/pages/10/ocr?mode=words
One can also get the results by paragraph by removing
?mode=words
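For anyone who wants to script this, here is a minimal Python sketch of the two URL forms above. The helper names `ocr_url` and `fetch_ocr` are made up for illustration and are not part of any Internet Archive library. The sketch also assumes the endpoint returns UTF-8 text, and it does not try to interpret the per-word numbers, since their meaning is still an open question in this thread.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the example URLs in this issue.
API_ROOT = "https://api.archivelab.org/books"


def ocr_url(identifier: str, page: int, words: bool = True) -> str:
    """Build the OCR URL for one page (hypothetical helper).

    words=True appends ?mode=words (per-word regions);
    words=False omits it, which gives results by paragraph.
    """
    url = f"{API_ROOT}/{quote(identifier, safe='')}/pages/{page}/ocr"
    if words:
        url += "?mode=words"
    return url


def fetch_ocr(identifier: str, page: int, words: bool = True, timeout: float = 30.0) -> str:
    """Fetch the raw response body as text (hypothetical helper).

    Assumption: the body is UTF-8. The payload structure is not
    documented here, so it is returned unparsed.
    """
    with urllib.request.urlopen(ocr_url(identifier, page, words), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(ocr_url("TheGeometry", 10))
    print(ocr_url("TheGeometry", 10, words=False))
```

Calling `fetch_ocr("TheGeometry", 10)` should return whatever the service sends for that page, provided the service is still reachable and the book is public.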
cc: @num170r