daiR is an R package for Google Document AI, a powerful server-based OCR processor. The package provides a wrapper for the Document AI API and comes with additional tools for output file parsing and text reconstruction.
Google Document AI is a paid service that requires a Google Cloud account and a Google Cloud Storage bucket. I recommend using Mark Edmondson's googleCloudStorageR package in combination with daiR. See the vignettes for more on authentication and setup.
daiR is not yet on CRAN, but you can install the latest development version from GitHub:
devtools::install_github("hegghammer/daiR")