Web OCR

About

A web interface for OCR in multiple languages. It uses Tesseract OCR as the backend for processing. PDF and image files are currently supported. Only English and Hindi are enabled at present, but other languages can be added with little effort.

Main Features

Simple web-based GUI for OCR
Supports both PDF and image inputs
High-quality OCR using the LSTM-based version of Tesseract OCR
Sends an email on job completion

Requirements

Tesseract OCR
OCRmyPDF
img2pdf
SendGrid (for sending email)
AWS S3 (for storing the resulting PDFs)

Installation

Install OCRmyPDF.
Clone this repository and run npm install to install the required packages.
Export the following environment variables:

S3_ACCESS_KEY       # AWS S3 access key
S3_SECRET_KEY       # AWS S3 secret key
SENDGRID_USER       # SendGrid API ID
SENDGRID_PASS       # SendGrid API key
PORT                # Port to run the web server on
S3_BUCKET_NAME      # E.g. webocr
S3_ENDPOINT         # E.g. https://s3.ap-south-1.amazonaws.com
S3_REGION           # E.g. ap-south-1
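
As a minimal sketch, the variables above could be exported in a shell session like this. The credential values are placeholders, and the port, bucket, endpoint and region are taken from the examples in this README. The check loop at the end is only an illustrative helper and is not part of this repository.

```shell
# Placeholder credentials: substitute your own values.
export S3_ACCESS_KEY="<access_key>"
export S3_SECRET_KEY="<secret_key>"
export SENDGRID_USER="<user>"
export SENDGRID_PASS="<key>"

# Example values taken from this README.
export PORT="3000"
export S3_BUCKET_NAME="webocr"
export S3_ENDPOINT="https://s3.ap-south-1.amazonaws.com"
export S3_REGION="ap-south-1"

# Illustrative check (not part of the project): collect the names of
# any required variables that are unset or empty.
missing=""
for name in S3_ACCESS_KEY S3_SECRET_KEY SENDGRID_USER SENDGRID_PASS \
            PORT S3_BUCKET_NAME S3_ENDPOINT S3_REGION; do
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    missing="$missing $name"
  fi
done

if [ -n "$missing" ]; then
  echo "Missing environment variables:$missing" >&2
fi
```

Running the check before npm start makes a missing variable visible immediately instead of at the first upload or email.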

Usage

Run npm start to start the web server.

Docker usage

docker run -d -e "PORT=<port>" -e "ACCESS_KEY=<access_key>" \
              -e "SECRET_KEY=<secret_key>" -e "SENDGRID_USER=<user>" \
              -e "SENDGRID_PASS=<key>" -e "CONFIG=<user db>" \
              -v <volume dir>:/home/docker/app/uploads ozym4nd145/ocr_web

OR

docker run -d --env-file <env.list file> -p 3000:3000 -v <volume dir>:/home/docker/app/uploads ozym4nd145/ocr_web
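
The env file passed to --env-file is a plain list of VAR=value lines. A minimal sketch of creating one is shown below, assuming it mirrors the -e flags of the first command. The file name env.list and all values are placeholders. Note that these names (ACCESS_KEY, SECRET_KEY) differ from the S3_-prefixed names listed under Installation, and this README does not say which set the image reads, so check the application code before relying on either.

```shell
# Write a hypothetical env.list with one VAR=value pair per line.
# Docker reads values literally, so do not wrap them in quotes.
cat > env.list <<'EOF'
PORT=3000
ACCESS_KEY=<access_key>
SECRET_KEY=<secret_key>
SENDGRID_USER=<user>
SENDGRID_PASS=<key>
CONFIG=<user db>
EOF
```

With PORT=3000 in the file, the -p 3000:3000 mapping in the command above publishes the same port on the host.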
