MindOCR is an open-source toolbox for OCR development and application based on MindSpore. It helps users to train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfill image-text understanding needs.
Major Features
- Modular design: We decouple the OCR task into several configurable modules. Users can set up the training and evaluation pipeline easily for customized data and models with a few lines of modification.
- High-performance: MindOCR provides pretrained weights together with the training recipes used to produce them, which reach competitive performance on OCR tasks.
- Low-cost-to-apply: We provide easy-to-use inference tools to perform text detection and recognition tasks.
To install the dependencies, please run
pip install -r requirements.txt
Additionally, please install MindSpore (>=1.9) following the official installation instructions, choosing the package that matches your machine.
For distributed training, please install openmpi 4.0.3.
Environment | Version |
---|---|
MindSpore | >=1.9 |
Python | >=3.7 |
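The minimum versions in the table can be checked before training. The following is a minimal sketch, not part of MindOCR: the helper names `parse_version` and `meets_minimum` are made up for illustration, and pre-release suffixes such as `rc1` are simply ignored. Versions are compared as integer tuples because plain string comparison would rank `1.10` below `1.9`.

```python
import re
import sys


def parse_version(text):
    """Turn a version string such as '1.10.0' or '2.0.0rc1' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component without leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric components only)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.9' and '1.9.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    python_version = "{}.{}".format(*sys.version_info[:2])
    print("Python", python_version, "ok:", meets_minimum(python_version, "3.7"))
    try:
        import mindspore  # only importable once MindSpore is installed
        print("MindSpore", mindspore.__version__, "ok:",
              meets_minimum(mindspore.__version__, "1.9"))
    except ImportError:
        print("MindSpore is not installed")
```

Running the file prints one line per requirement, so a mismatch is visible before any training script is launched.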
Notes:
- If you use MX Engine for Inference, the version of Python should be 3.9.
- If scikit_image cannot be imported, set the environment variable `$LD_PRELOAD` with the following command, changing `path/to` to your own directory:
  export LD_PRELOAD=path/to/scikit_image.libs/libgomp-d22c30c5.so.1.0.0:$LD_PRELOAD
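When a script is launched from Python rather than from a shell, the same `LD_PRELOAD` setting can be passed to the child process. This is a rough sketch under that assumption; `env_with_preload` is a hypothetical helper, not a MindOCR function. The dynamic linker reads `LD_PRELOAD` when a process starts, so changing it inside an already running interpreter affects only processes started afterwards.

```python
import os


def env_with_preload(lib_path, base_env=None):
    """Return a copy of the environment with lib_path prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_PRELOAD", "")
    # Avoid a trailing ':' when LD_PRELOAD was previously unset or empty.
    env["LD_PRELOAD"] = (lib_path + ":" + current) if current else lib_path
    return env


# Example use (commented out; the library path is the placeholder from the note above):
# import subprocess, sys
# env = env_with_preload("path/to/scikit_image.libs/libgomp-d22c30c5.so.1.0.0")
# subprocess.run([sys.executable, "-c", "import skimage"], env=env)
```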
Coming soon
The latest version of MindOCR can be installed as follows:
pip install git+https://github.com/mindspore-lab/mindocr.git
Note: MindOCR is currently tested only with MindSpore>=1.9 on Linux with GPU/Ascend devices.
We take the DBNet model and the ICDAR2015 dataset as an example to illustrate how to configure the training process with a few lines of modification to the yaml file.
Please refer to the DBNet readme for detailed instructions.
We take the CRNN model and the LMDB dataset as an example to illustrate how to configure and launch the training process easily.
Please refer to the CRNN readme for detailed instructions.
Note:
The training pipeline is fully extensible. To train other text detection/recognition models on a new dataset, configure the model architecture (backbone, neck, head) and the data pipeline in the yaml file, then launch the training script with `python tools/train.py -c /path/to/yaml_config`.
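As a rough illustration of the note above, the sketch below mirrors the backbone/neck/head split as a Python dictionary and builds the launch command. The key layout, the component names (`my_backbone` and so on) and both helper functions are placeholders invented for this example, not MindOCR's documented schema or API. Only `tools/train.py -c` comes from the text.

```python
import sys

# Hypothetical layout of a config; all key and component names are illustrative.
example_config = {
    "model": {
        "backbone": {"name": "my_backbone"},
        "neck": {"name": "my_neck"},
        "head": {"name": "my_head"},
    },
    "train": {"dataset": {"data_dir": "/path/to/train_data"}},
}


def missing_model_parts(config):
    """List which of backbone/neck/head are absent from config['model']."""
    model = config.get("model", {})
    return [part for part in ("backbone", "neck", "head") if part not in model]


def build_train_command(config_path):
    """Build the argument list for the training script named in the text."""
    return [sys.executable, "tools/train.py", "-c", config_path]
```

From the repository root, the command could then be launched with `subprocess.run(build_train_command("/path/to/yaml_config"), check=True)`.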
MX, which is short for MindX, allows efficient model inference and deployment on Ascend devices.
MindOCR supports OCR model inference with MX Engine. Please refer to mx_infer for detailed illustrations.
Coming soon
Coming soon
Text Recognition
For the detailed performance of the trained models, please refer to configs.
For detailed inference performance using the MX Engine, please refer to mx inference performance.
We give instructions on how to download the following datasets.
Text Detection
- ICDAR2015: paper, homepage, download instruction
- Total-Text: paper, homepage, download instruction
- Syntext150k: paper, homepage, download instruction
- MLT2017: paper, homepage, download instruction
After downloading these datasets into the `DATASETS_DIR` folder, you can run `bash tools/convert_datasets.sh` to convert all downloaded datasets into the target format. Here is an example of converting the icdar2015 dataset.
- 2023/04/12
- Support parameter grouping, which can be configured by the `grouping_strategy` or `no_weight_decay_params` arg.
- 2023/03/23
- Add dynamic loss scaler support, compatible with drop overflow update. To enable the dynamic loss scaler, please set `type` of `loss_scale` as `dynamic`. A YAML example can be viewed in `configs/rec/crnn/crnn_icdar15.yaml`.
- 2023/03/20
- Arg names changed: `output_keys` -> `output_columns`, `num_keys_to_net` -> `num_columns_to_net`
- Data pipeline updated
- 2023/03/13
- Add system test and CI workflow.
- Add modelarts adapter to allow training on the OpenI platform. To train on OpenI:
  i) Create a new training task on the OpenI cloud platform.
  ii) Link the dataset (e.g., ic15_mindocr) on the webpage.
  iii) Add the run parameter `config` and enter the yaml file path in the web UI, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'.
  iv) Add the run parameter `enable_modelarts` and set it to True in the web UI.
  v) Fill in the other blanks and launch.
- 2023/03/08
- Add evaluation script with arg `ckpt_load_path`
- Arg `ckpt_save_dir` is moved from `system` to `train` in yaml.
- Add `drop_overflow_update` control
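For readers unfamiliar with the dynamic loss scaler and drop overflow update mentioned in the 2023/03/23 and 2023/03/08 entries, here is a toy sketch of the general technique in plain Python. It is not MindOCR's or MindSpore's implementation. The class, the function and the parameter names (`scale_value`, `scale_factor`, `scale_window`) are assumptions chosen for illustration. The scale shrinks when scaled gradients overflow, that step is skipped, and the scale grows again after a run of clean steps.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler (illustrative only)."""

    def __init__(self, scale_value=1024.0, scale_factor=2.0, scale_window=3):
        self.scale_value = scale_value    # current multiplier applied to the loss
        self.scale_factor = scale_factor  # ratio used to grow or shrink the scale
        self.scale_window = scale_window  # clean steps required before growing
        self.clean_steps = 0

    def update(self, scaled_grads):
        """Adjust the scale; return True if the optimizer step should be applied."""
        overflow = any(not math.isfinite(g) for g in scaled_grads)
        if overflow:
            # Shrink the scale (never below 1) and restart the clean-step count.
            self.scale_value = max(self.scale_value / self.scale_factor, 1.0)
            self.clean_steps = 0
            return False  # drop overflow update: skip this step
        self.clean_steps += 1
        if self.clean_steps >= self.scale_window:
            self.scale_value *= self.scale_factor
            self.clean_steps = 0
        return True


def apply_step(weights, scaled_grads, scaler, lr=0.1):
    """One SGD step on a list of floats; gradients are unscaled before use."""
    scale_used = scaler.scale_value  # read before update() may change it
    if not scaler.update(scaled_grads):
        return weights  # overflow detected, weights left untouched
    return [w - lr * g / scale_used for w, g in zip(weights, scaled_grads)]
```

Skipping the update on overflow keeps inf/NaN gradients out of the weights, while the periodic growth lets the scale recover so that small gradients are not flushed to zero in reduced precision.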
We appreciate all kinds of contributions including issues and PRs to make MindOCR better.
Please refer to CONTRIBUTING.md for the contributing guideline. Please follow the Model Template and Guideline for contributing a model that fits the overall interface :)
This project follows the Apache License 2.0 open-source license.
If you find this project useful in your research, please consider citing:
@misc{MindSpore OCR 2023,
title={{MindSpore OCR }:MindSpore OCR Toolbox},
author={MindSpore Team},
howpublished = {\url{https://github.com/mindspore-lab/mindocr/}},
year={2023}
}