Easily OCR images, barcodes, forms, and documents with machine-readable zones (e.g., passports), right from R. Get the results in a wide variety of formats, from plain text files to detailed XML with information such as bounding boxes.
The package provides access to the Abbyy Cloud OCR SDK API. Details about results of calls to the API can be found here.
To get the latest version on CRAN:
install.packages("abbyyR")
To get the current development version from GitHub:
# install.packages("devtools")
devtools::install_github("soodoku/abbyyR", build_vignettes = TRUE)
To get acquainted with some of the important functions, read the vignettes:
# Overview of the package
vignette("introduction", package = "abbyyR")
# Examples of some functions, along with their output
vignette("example", package = "abbyyR")
# how to scrape text from a folder of images
vignette("wiscads", package = "abbyyR")
The quality of the final output depends on the complexity of the layout, the image resolution, the font face, and so on. To measure OCR quality, you can compute the edit distance between the output and a "gold standard" coded sample using recognize. For quick, edit-distance-based search and replace to fix messy data, you can use turbo search and replace.
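To make the edit-distance idea concrete, here is a minimal sketch that uses only base R's `utils::adist()`. It is not the interface of recognize or of turbo search and replace, and both strings are made up for illustration rather than taken from real OCR output.

```r
# Hypothetical strings for illustration only; not real OCR output.
gold <- "The quick brown fox jumps over the lazy dog"  # hand-coded "gold standard"
ocr  <- "The qu1ck brown f0x jumps ovr the lazy dog"   # simulated OCR result

# Levenshtein distance: the minimum number of single-character insertions,
# deletions, and substitutions needed to turn `ocr` into `gold`.
# adist() returns a matrix, so collapse the 1 x 1 result to a plain integer.
dist <- as.integer(utils::adist(ocr, gold))

# Divide by the length of the gold standard to get a rough character error rate.
cer <- dist / nchar(gold)

print(dist)
print(cer)
```

Normalizing by the length of the gold standard makes scores comparable across documents of different sizes; whether to normalize, and by what, is a design choice rather than something the package prescribes.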
Scripts are released under the MIT License.
The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.