Configuration of dataset to train the ocr model on custom dataset. #13290

RRThivyan · 2024-07-08T03:59:34Z

Hello community,
I am trying to read handwritten text from an image. The base model reads clearly legible text well, but it struggles with cursive writing. I have a dataset to train the model on, but I get an error at the configuration stage. I am not sure whether the dataset I created is incorrect or whether something else is wrong. Kindly help me with this. Also, please let me know where I can find material on fine-tuning, if possible.

The labels I created are as follows:
gs://iam_words_images/images/a01-000u-00-00.png A
gs://iam_words_images/images/a01-000u-00-01.png MOVE

Each line holds the path of the image and the text present in the image. Do I have to include any further details apart from this?
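A quick way to sanity-check a label file before training is sketched below. It rests on assumptions that are not confirmed in this thread: that the SimpleDataSet loader splits each line on a tab character by default, and that it resolves the image name against data_dir on the local filesystem (which would mean gs:// URLs are not opened directly). The function name check_label_file is hypothetical and not part of PaddleOCR. Note also that the label lines quoted above may originally have contained a tab that was lost when the page was copied.

```python
from pathlib import Path


def check_label_file(label_file, data_dir, delimiter="\t"):
    """Return (line_number, problem) tuples for a recognition label file.

    Assumed format per line: <image path relative to data_dir><TAB><text>.
    This helper is illustrative only; it is not a PaddleOCR API.
    """
    problems = []
    with open(label_file, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            parts = line.split(delimiter)
            if len(parts) != 2:
                problems.append((number, "expected '<image path><TAB><text>'"))
                continue
            image_name, text = parts
            if not (Path(data_dir) / image_name).is_file():
                problems.append((number, f"image not found: {image_name}"))
            elif not text:
                problems.append((number, "empty label"))
    return problems


# Example call (paths are placeholders):
# for number, problem in check_label_file("train.txt", "train_set"):
#     print(f"line {number}: {problem}")
```

An empty result means every line has exactly two tab-separated fields and points at an existing local file. It says nothing about whether the text matches the image.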

This is the command I execute to run the train.py script and start the fine-tuning.

python3 /home/thivyan/PaddleOCR/tools/train.py -c /home/thivyan/PaddleOCR/configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.pretrained_model= /home/thivyan/en_PP-OCRv3_rec_train/best_accuracy

The error I get is as follows. It seems to say that some details are missing in the config file, but I don't know which ones.

Traceback (most recent call last):
  File "/home/thivyan/PaddleOCR/tools/train.py", line 252, in <module>
    config, device, logger, vdl_writer = program.preprocess(is_train=True)
  File "/home/thivyan/PaddleOCR/tools/program.py", line 712, in preprocess
    FLAGS = ArgsParser().parse_args()
  File "/home/thivyan/PaddleOCR/tools/program.py", line 58, in parse_args
    args.opt = self._parse_opt(args.opt)
  File "/home/thivyan/PaddleOCR/tools/program.py", line 67, in _parse_opt
    k, v = s.split("=")
ValueError: not enough values to unpack (expected 2, got 1)

Below is the YAML file I created for this purpose.

Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/v3_en_mobile
  save_epoch_step: 3
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: gs://iam_words_images/ocr_output/rec/predicts_ppocrv3_en.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
    last_pool_kernel_size: [2, 2]
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: gs://iam_words_images/train_set
    ext_op_transform_idx: 1
    label_file_list:
    - gs://iam_words_images/train.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
        max_text_length: *max_text_length
    - RecAug:
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 128
    drop_last: true
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: gs://iam_words_images/val_set
    label_file_list:
    - gs://iam_words_images/val.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128
    num_workers: 4

GreatV · 2024-07-08T04:58:56Z

Maintainer

Try removing the space after the = sign:

python3 /home/thivyan/PaddleOCR/tools/train.py -c /home/thivyan/PaddleOCR/configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.pretrained_model=/home/thivyan/en_PP-OCRv3_rec_train/best_accuracy
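Why the stray space triggers this particular ValueError can be reproduced with a few lines of Python. The sketch below is a simplified stand-in for the k, v = s.split("=") step shown in the traceback, not the actual program.py code. The helper names parse_opt and opts_from_command, and the shortened paths, are made up for illustration. It assumes everything after -o is handed to the parser as a list of separate tokens, which is what the traceback suggests.

```python
import shlex


def parse_opt(opts):
    """Simplified stand-in for the `k, v = s.split("=")` step in the traceback."""
    config = {}
    for s in opts:
        k, v = s.strip().split("=")  # needs exactly one "=" inside each token
        config[k] = v
    return config


def opts_from_command(command):
    """Tokenize a command the way a POSIX shell would and return what follows -o."""
    tokens = shlex.split(command)
    return tokens[tokens.index("-o") + 1:]


# Paths shortened for illustration.
bad = "train.py -c rec.yml -o Global.pretrained_model= /path/best_accuracy"
good = "train.py -c rec.yml -o Global.pretrained_model=/path/best_accuracy"

print(opts_from_command(bad))
# ['Global.pretrained_model=', '/path/best_accuracy']

try:
    parse_opt(opts_from_command(bad))
except ValueError as err:
    print("ValueError:", err)
# ValueError: not enough values to unpack (expected 2, got 1)

print(parse_opt(opts_from_command(good)))
# {'Global.pretrained_model': '/path/best_accuracy'}
```

With the space, the shell passes the path as its own token. That token contains no "=", so splitting it yields a single value and the unpacking fails.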

6 replies

GreatV Jul 8, 2024
Maintainer

You may need to update your paddle and paddleocr versions. For paddle, see: https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/develop/install/pip/windows-pip_en.html

RRThivyan Jul 16, 2024
Author

Thanks a lot. With your assistance, I have successfully fine-tuned the model and created an inference model, which contains all 3 pdi files. What is the next step after this? Where should I save this inference model, and how do I install the new model? Is there any documentation for this step?

GreatV Jul 16, 2024
Maintainer

Please refer to: https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/inference_ppocr_en.md#python-inference-for-pp-ocr-model-zoo

RRThivyan Jul 16, 2024
Author

Hi, currently I am working on capturing handwritten text from an image. The following is the text captured by the default PaddleOCR model.

[2024/07/16 12:58:17] ppocr DEBUG: dt_boxes num : 22, elapsed : 0.5964698791503906
[2024/07/16 12:58:18] ppocr DEBUG: cls num : 22, elapsed : 0.1217045783996582
[2024/07/16 12:58:19] ppocr DEBUG: rec_res num : 22, elapsed : 1.6230945587158203
Cequua
Address: Wet Rimto
lmakatieity
Age9
M
Date:12-03
Sex:
Rx
(xomrpl
amoxirillin tomg
fe
dan
Seen
loilaay
Physician's Sig
Lic.No
123457
1234567
PTRNo
S2 No

I followed the instructions in the link provided, but it gives only a one-letter prediction instead of all the text present in the image. The inference_model folder contains all 3 pdi files.

!python3 /content/drive/MyDrive/Verbalyze/IAM_Handwritten_dataset/PaddleOCR/tools/infer/predict_rec.py --image_dir='/content/img1.jpg' --rec_model_dir='/content/drive/MyDrive/Verbalyze/IAM_Handwritten_dataset/PaddleOCR/pretrain_models/inference_model/' --rec_char_dict_path="/content/drive/MyDrive/Verbalyze/IAM_Handwritten_dataset/PaddleOCR/ppocr/utils/en_dict.txt"

[2024/07/16 12:51:53] ppocr WARNING: The first GPU is used for inference by default, GPU ID: 0
[2024/07/16 12:51:55] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2024/07/16 12:51:57] ppocr INFO: Predicts of /content/img1.jpg:('T', 0.9447541236877441)
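One possible explanation, not confirmed by anyone in this thread: tools/infer/predict_rec.py runs only the recognition model and treats each input image as a single cropped text line, so a full-page image may collapse to a very short prediction. For a whole page, tools/infer/predict_system.py chains detection and recognition. The sketch below only assembles such a command. The helper build_system_command is hypothetical, the detection model directory is a placeholder that would have to point at a real detection inference model, and the other paths are shortened stand-ins for the ones in the post.

```python
import shlex


def build_system_command(repo_dir, image, det_model_dir, rec_model_dir, char_dict):
    """Assemble a predict_system.py call (detection + recognition) as an argv list."""
    return [
        "python3",
        f"{repo_dir}/tools/infer/predict_system.py",
        f"--image_dir={image}",
        f"--det_model_dir={det_model_dir}",   # placeholder: a detection inference model
        f"--rec_model_dir={rec_model_dir}",   # the fine-tuned recognition model
        f"--rec_char_dict_path={char_dict}",
    ]


cmd = build_system_command(
    repo_dir="/content/PaddleOCR",                       # shortened placeholder
    image="/content/img1.jpg",
    det_model_dir="/path/to/det_inference_model/",       # placeholder
    rec_model_dir="/content/PaddleOCR/pretrain_models/inference_model/",
    char_dict="/content/PaddleOCR/ppocr/utils/en_dict.txt",
)

# Print a copy-pasteable shell command; each flag is one token with no space after "=".
print(shlex.join(cmd))
```

Whether a recognizer fine-tuned on single-word images performs well on the detected line crops is a separate question that this sketch does not address.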

RRThivyan Jul 23, 2024
Author

Hello, I am trying to use the inference model in the following manner, but the kernel crashes. However, if I use the pretrained inference model, it works fine.

MohamedLahmeri01 · 2025-02-03T20:45:38Z

@RRThivyan, @GreatV, I have the same issue: the image folder is reported as not found even though the images are inside the PaddleOCR folder. Could you show me how you set up your dataset and the multi-language file? I use the following format: "image_name label ".

0 replies
