tassadar_ocr

Tassadar is an OCR service built on Tesseract and Thrift.

Usage

API

Tassadar provides the following OCR APIs:

get_ocr(1:binary image): accepts binary image data and returns the OCR text result.
line_ocr(1:binary image): runs OCR line by line.
version(): returns the current version.
cut_image(1:binary image, 2:i8 cut_type): segments the image into components at one of the following levels:

0: block
1: paragraph
2: textline
3: word
4: symbol

The default level is 4.
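The numeric levels above can be given readable names on the client side. The following is a minimal sketch, assuming a hypothetical `CutType` enum and `parse_cut_type` helper that are not part of tassadar_client; only the integer values and the default come from the list above.

```python
from enum import IntEnum


class CutType(IntEnum):
    """Hypothetical names for the cut_type levels listed above."""
    BLOCK = 0
    PARAGRAPH = 1
    TEXTLINE = 2
    WORD = 3
    SYMBOL = 4


# The default level stated above is 4 (symbol).
DEFAULT_CUT_TYPE = CutType.SYMBOL


def parse_cut_type(value):
    """Accept an int or a case-insensitive level name and return a CutType."""
    if isinstance(value, str):
        return CutType[value.strip().upper()]
    return CutType(value)
```

Because `CutType` is an `IntEnum`, its members can be passed wherever a plain integer level is expected.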

Docker

The recommended way to use tassadar is through Docker. You can either pull a pre-built image from Docker Hub (fshen/tassadar_ocr:latest) or build a new one from the dockerfile.

Quick start:

docker pull fshen/tassadar_ocr:latest
docker tag fshen/tassadar_ocr:latest tassadar

# get the ocr result of $WORK_DIR/IMAGE_PATH
docker run -it -d --rm --name ocr -p 9090:9090 -v $WORK_DIR:/app tassadar /root/tassadar_ocr/tassadar_server
docker exec -it ocr python3 -m tassadar_client --input /app/IMAGE_PATH
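Since the container is started in detached mode, the server may need a moment before it accepts connections. The sketch below is one way to wait for it from Python; `wait_for_port` is a hypothetical helper, not part of tassadar, and it only checks that the TCP port published above (9090) is reachable, not that the OCR service itself is healthy.

```python
import socket
import time


def wait_for_port(host="localhost", port=9090, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError if the port is not accepting yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A script could call `wait_for_port()` after `docker run` and only invoke the client once it returns True.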

Build from source

You can also build tassadar from source code. Here are some tips.

First, make sure all the following dependencies are installed:

tesseract >= 4.0
thrift >= 0.11
python3

The default tessdata language in tassadar is chi_sim+eng. To change it, follow the instructions in tessdata.

# server
git clone https://github.com/shenfei/tassadar_ocr.git
cd tassadar_ocr && make
./tassadar_server --port 9090

# client
pip3 install -e tassadar_ocr/python/
python3 -m tassadar_client -h

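After installation, you can start a tassadar server and test OCR from Python:

from tassadar_client import TassadarClient

# Path to the image to recognize (placeholder; replace with your own file).
image_path = 'test.png'

# Connect to a running tassadar server.
client = TassadarClient(host='localhost', port=9090)
with open(image_path, 'rb') as fin:
    image = fin.read()
print(client.get_ocr(image))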
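The same call can be applied to a whole folder of images. This is a minimal sketch, assuming a hypothetical `ocr_directory` helper and an illustrative set of file suffixes; it relies only on the `get_ocr` method shown above and accepts any object that provides it.

```python
from pathlib import Path


def ocr_directory(client, directory, suffixes=(".png", ".jpg", ".jpeg")):
    """Run client.get_ocr on every image file in directory, keyed by file name."""
    results = {}
    for path in sorted(Path(directory).iterdir()):
        # Skip subdirectories and files that do not look like images.
        if path.is_file() and path.suffix.lower() in suffixes:
            results[path.name] = client.get_ocr(path.read_bytes())
    return results
```

With the client created above, `ocr_directory(client, 'images/')` would return a dict mapping each file name to its recognized text.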

Acknowledgment

The original tassadar project was developed between 2014 and 2016 at Uda Inc., a start-up that closed in 2016.

I chose the name tassadar because it is similar to tesseract, and because all project names at Uda were taken from StarCraft at the time.

Most of the outdated code has been removed, such as the OpenCV preprocessing and the Caffe single-character classification. I would still like to thank the original contributors:

shenfei
Linusp
iwinux
He Neng
