The images are first processed by a CNN to extract features; these features are then fed into an LSTM for character recognition.
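The sketch below shows the CNN-to-LSTM hand-off. It turns an NHWC feature map into a sequence whose time axis is the image width, so each column of the image becomes one time step. It uses NumPy in place of the equivalent `tf.transpose`/`tf.reshape` calls. The function name `feature_map_to_sequence` and all shapes are illustrative assumptions, not taken from this repository.

```python
import numpy as np


def feature_map_to_sequence(feature_map):
    """Turn a CNN feature map into an LSTM input sequence.

    feature_map: array of shape (batch, height, width, channels), i.e. NHWC.
    Returns an array of shape (batch, width, height * channels), so the
    width axis becomes the time axis.
    """
    batch, height, width, channels = feature_map.shape
    # Move width next to batch so it becomes the time axis: (N, W, H, C)
    seq = np.transpose(feature_map, (0, 2, 1, 3))
    # Flatten each column (height x channels) into one feature vector
    return seq.reshape(batch, width, height * channels)


# Hypothetical feature map: 2 images, 4 rows, 25 columns, 64 channels
features = np.random.rand(2, 4, 25, 64).astype(np.float32)
sequence = feature_map_to_sequence(features)
print(sequence.shape)  # (2, 25, 256)
```

Using the width as the time axis matches left-to-right reading order. Each time step sees one vertical slice of the text line.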
CNN+LSTM+CTC based OCR (Optical Character Recognition) implemented using TensorFlow.
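CTC lets the network emit either a label or a "blank" at every time step. The final text is read off by collapsing repeated labels and then removing blanks. The sketch below shows greedy (best-path) decoding in NumPy. It follows TensorFlow's CTC convention that the blank is the last class index (`num_classes - 1`). The function name is hypothetical, and this is not necessarily how this repository decodes its outputs.

```python
import numpy as np


def ctc_greedy_decode(logits, blank=None):
    """Best-path CTC decoding for a single sequence.

    logits: array of shape (time_steps, num_classes).
    blank: index of the blank label; defaults to num_classes - 1.
    """
    if blank is None:
        blank = logits.shape[1] - 1
    # Most likely class at each time step, as plain Python ints
    best_path = np.argmax(logits, axis=1).tolist()
    decoded = []
    previous = None
    for label in best_path:
        # Keep a label only if it differs from the previous step and is not blank
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


num_classes = 4  # classes 0-2 are characters, 3 is the blank
path = [1, 1, 3, 1, 2, 2, 3]
logits = np.eye(num_classes)[path]  # one-hot scores, shape (7, 4)
print(ctc_greedy_decode(logits))  # [1, 1, 2]
```

The blank between the two `1`s is what allows a genuinely repeated character to survive the collapse step.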
I trained a model with 80k images using this code and got 99.98% accuracy on the test dataset (20k images). The images in both datasets:
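An OCR accuracy figure can be measured per whole string or per character. The sketch below computes both. It assumes that word accuracy means an exact match of the entire predicted string, and that character accuracy is one minus the total edit distance divided by the number of ground-truth characters. The function names are hypothetical, and the repository's own TensorFlow metrics may define these quantities differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def word_accuracy(predictions, labels):
    """Fraction of predictions that match their label exactly."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def character_accuracy(predictions, labels):
    """1 - (total edit distance / total ground-truth characters)."""
    errors = sum(edit_distance(p, t) for p, t in zip(predictions, labels))
    total = sum(len(t) for t in labels)
    return 1.0 - errors / total


preds = ["hello", "w0rld"]
truth = ["hello", "world"]
print(word_accuracy(preds, truth))       # 0.5
print(character_accuracy(preds, truth))  # 0.9
```

With this edit-distance definition, character accuracy can drop below zero when predictions contain many spurious extra characters.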
This project is based on the great work from here.
The following improvements have been made:
- Correct the time step direction
  Previously, the time step direction was the channel axis, which is incorrect. It has now been corrected to the width direction. See here for more discussion of this issue.
- Optimize the training scripts
  Previously, all training images were loaded into memory; now a simple image generator is used to produce training batches.
- Metrics implementation
  Implement character and word accuracy in TensorFlow.
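A batch generator like the one below keeps memory use bounded, because only one batch of images is decoded at a time. This is a minimal sketch under stated assumptions. The `(image_path, label_text)` sample format, the `load_image` callable, the 60x180 image size and the fake loader in the usage example are hypothetical stand-ins, not the repository's actual helpers.

```python
import random

import numpy as np


def batch_generator(samples, batch_size, load_image, shuffle=True):
    """Yield (images, labels) batches indefinitely, loading images on demand.

    samples: list of (image_path, label_text) pairs (hypothetical format).
    load_image: callable that maps a path to an array of shape (H, W, C).
    """
    samples = list(samples)
    if len(samples) < batch_size:
        raise ValueError("need at least batch_size samples")
    while True:
        if shuffle:
            random.shuffle(samples)  # new order every epoch
        # Step through the epoch; the last incomplete batch is dropped
        for start in range(0, len(samples) - batch_size + 1, batch_size):
            chunk = samples[start:start + batch_size]
            images = np.stack([load_image(path) for path, _ in chunk])
            labels = [text for _, text in chunk]
            yield images, labels


def fake_load_image(path):
    # Hypothetical stand-in for real image decoding (e.g. with PIL or OpenCV)
    return np.zeros((60, 180, 1), dtype=np.float32)


samples = [("img_%d.png" % i, str(i)) for i in range(10)]
batches = batch_generator(samples, batch_size=4, load_image=fake_load_image)
images, labels = next(batches)
print(images.shape, len(labels))  # (4, 60, 180, 1) 4
```

Dropping the final partial batch keeps every batch the same shape, which simplifies feeding fixed-size placeholders.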
Please see this issue about the dataset. After extracting the .tar.gz file, the label file (a .txt file) is in the same folder as the images.
Dependencies:
- TensorFlow 1.4
- NumPy
To train and then evaluate the model:
python ./train_model.py
python ./eval_model.py