This repository has been archived by the owner on Jul 4, 2023. It is now read-only.

gojiplus/captr

captr: R Client for the Captricity API

OCR text and handwritten forms using Captricity. Captricity's big advantage over Abbyy Cloud OCR is that it lets users easily specify the position of the text blocks they want to OCR through a simple web-based UI. The quality of the OCR can be checked using compare_txt from recognize.

Installation

To get the latest version on CRAN:

install.packages("captr")

To get the current development version from GitHub:

install.packages("devtools")
devtools::install_github("soodoku/captr", build_vignettes = TRUE)

Using captr

Read the vignette:

vignette("using_captr", package = "captr")

or follow the overview below.

Start by getting an application token and setting it using:

set_token("token")

Then, create a batch using:

create_batch("batch_name")

Once you have created a batch, you need a template ID. Captricity requires a template, which tells it what data to pull from where. Templates can be created using the web UI. Set the template ID using:

set_template_id("id")

Next, assign the template ID to a batch:

set_batch_template("batch_id", "template_id")

Next, upload image(s) to the batch:

upload_image(batch_id="batch_id", path_to_image="image_path")
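
When a batch holds several scans, the upload step can be repeated in a loop. The following is a minimal sketch of one way to do that. The names upload_images and upload_fun are hypothetical and are not part of captr; the default assumes upload_image accepts the batch_id and path_to_image arguments shown above, and the "scans" directory in the usage comment is made up.

```r
# Hypothetical helper (not part of captr): upload every image in `paths`
# to one batch. `upload_fun` defaults to captr's upload_image, but it can be
# replaced with a stub so the loop can be tried without network access.
upload_images <- function(batch_id, paths, upload_fun = captr::upload_image) {
  results <- lapply(paths, function(p) {
    upload_fun(batch_id = batch_id, path_to_image = p)
  })
  # Name each result by its source file so a failed upload is easy to trace.
  names(results) <- basename(paths)
  results
}

# Possible usage, assuming PNG scans live in a local "scans" directory:
# imgs <- list.files("scans", pattern = "\\.png$", full.names = TRUE)
# uploads <- upload_images("batch_id", imgs)
```

Passing the upload function as an argument keeps the helper independent of a live token, which may make it easier to check the file list before sending anything to Captricity.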

Next, check whether the batch is ready to be processed:

test_readiness(batch_id="batch_id")

You may also want to find out how much processing the batch will cost:

batch_price(batch_id="batch_id")

Once you are ready, submit the batch:

submit_batch(batch_id="batch_id")

Captricity excels in nomenclature confusion: once a batch is submitted, it is called a job. The ID for the job can be obtained from the list returned by submit_batch; the field name is related_job_id.
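
As an illustration of pulling the job ID out of that list, here is a small sketch. get_job_id is a hypothetical helper, not a captr function, and mock_result is a made-up stand-in for what submit_batch returns; only the related_job_id field name comes from the text above.

```r
# Hypothetical helper (not part of captr): extract the job ID from the
# list returned by submit_batch, failing loudly if the field is absent.
get_job_id <- function(submit_result) {
  job_id <- submit_result[["related_job_id"]]
  if (is.null(job_id)) {
    stop("No related_job_id field found in the submit_batch result")
  }
  job_id
}

# Stand-in for a submit_batch() result; the values are invented for illustration.
mock_result <- list(name = "batch_name", related_job_id = 1234)
job_id <- get_job_id(mock_result)
```

With a real submission, the same helper could be applied to the value returned by submit_batch(batch_id = "batch_id"), and the result passed as job_id to the job-level functions.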

To track progress of a job, use:

track_progress(job_id = "job_id")

List all forms (instance sets) associated with a job:

list_instance_sets(job_id="job_id")

If you want to download data from a particular form, use list_instance_sets to get the form (instance_set) ID, then run:

get_instance_set(instance_set_id="instance_set_id")

Get a CSV of all the results from a job:

get_all(job_id="job_id")

License

Scripts are released under the MIT License.

Contributor Code of Conduct

The project welcomes contributions from everyone! In fact, it depends on them. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.