Mapping fulltext to book images via annotations #29

mekarpeles · 2018-06-08T22:03:03Z

For a public/unrestricted book (e.g. https://archive.org/details/TheGeometry) one can get the fulltext for each page (with word regions) via the following API:

https://api.archivelab.org/books/<identifier>/pages/<page#>/ocr?mode=words

e.g.
https://api.archivelab.org/books/TheGeometry/pages/10/ocr?mode=words

One can also get the results grouped by paragraph instead by removing ?mode=words from the URL.
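For anyone scripting against this, here is a minimal sketch of building the two URL forms described above and fetching a page. The helper names `ocr_url` and `fetch_ocr` are made up for illustration, and it assumes the endpoint returns JSON, which this thread does not confirm.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.archivelab.org/books"


def ocr_url(identifier, page, words=True):
    """Build the OCR URL for one page (hypothetical helper).

    words=True appends ?mode=words (word regions);
    words=False omits it (results by paragraph).
    """
    safe_id = urllib.parse.quote(identifier, safe="")
    url = f"{API_ROOT}/{safe_id}/pages/{int(page)}/ocr"
    return url + "?mode=words" if words else url


def fetch_ocr(identifier, page, words=True, timeout=30):
    """Fetch and decode one page of OCR.

    Assumption: the response body is JSON. Its exact structure is not
    documented in this thread, so inspect the result before relying on it.
    """
    with urllib.request.urlopen(ocr_url(identifier, page, words), timeout=timeout) as resp:
        return json.load(resp)
```

Usage would be something like `fetch_ocr("TheGeometry", 10)`, which requests the same URL as the example above.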

cc: @num170r

jcmundy · 2019-03-11T20:32:20Z

Thank you for providing this! I see five numbers for each word when I follow your link. I am used to seeing x, y, w, h. What is the fifth number?

mekarpeles · 2019-03-11T20:40:44Z

Not sure! @rchrd2 ?

rchrd2 · 2019-03-12T05:28:58Z

Unfortunately, I don't know either. I haven't modified the search highlighting code. You may need to reverse engineer it a bit using a production book.

The code that processes the search results (using the archive.org API, not the archivelab.org one) is here: https://github.com/internetarchive/bookreader/blob/master/BookReader/plugins/plugin.search.js#L206

amandelman · 2020-09-10T11:10:09Z

Does this issue also cover indexing the annotations to make them available in IIIF search?

mekarpeles · 2020-09-10T22:25:15Z

Nope -- we expose raw (e.g. OCR) data but don't map it via any search API. Feel free to extend the current service to achieve this.

We do / did have an experimental annotations service:
https://pragma.archivelab.org/
https://github.com/archivelabs/pragma.archivelab.org

But I'm not sure if it's still working.

Here is a demo of when it worked:
https://www.youtube.com/watch?v=FtcajyRQnqM
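
As a rough illustration of what extending the service might involve, here is a sketch that wraps one OCR word in a W3C Web Annotation targeting a region of a IIIF canvas. It rests on assumptions: the function name `word_annotation` and the canvas URI are hypothetical, and it treats the first four numbers of a word's box as x, y, w, h, which nobody in this thread has confirmed (the fifth number is still unexplained and is ignored here).

```python
def word_annotation(canvas_uri, text, box):
    """Wrap one OCR word as a Web Annotation dict (hypothetical helper).

    Assumption: box[:4] is (x, y, w, h) in canvas pixel coordinates.
    Any further values in box are ignored because their meaning is unknown.
    """
    x, y, w, h = (int(v) for v in box[:4])
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        # "supplementing" is the motivation IIIF Presentation 3 uses for transcriptions
        "motivation": "supplementing",
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        # Media-fragment selector for the word's region on the canvas
        "target": f"{canvas_uri}#xywh={x},{y},{w},{h}",
    }
```

Indexing annotations like these for IIIF search would still need a separate search service on top.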

amandelman · 2020-09-11T07:52:31Z

Awesome. We'll add this to our backlog now that we have a little more clarity on the issue. Thank you!

hadro · 2023-03-17T02:10:46Z

Related to the IIIF v3 rewrite that is underway, and specifically to #80.

mekarpeles added mirador fulltext annotations labels Jun 8, 2018

glenrobson added this to the IIIF v3 Update - First steps milestone Mar 10, 2023
