This project focuses on extracting information from images and saving it in a JSON key-value pair format.
Ensure you have the following dependencies installed:
- PyTorch
- torchvision
This project requires a handwritten dataset. An example dataset is provided in handwritten-layoutlmv3/dataset/. Follow these steps if you want to create and label your own dataset:
- Collect handwritten samples for your dataset.
- Install and set up Label Studio.
- Import your collected samples into Label Studio.
- Label the samples according to your project requirements.
Ensure the dataset is properly labeled and saved in a format compatible with the OCR models used in this project.
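As a rough guide, a labeled sample typically pairs an image path with per-token text, label, and bounding box. The field names below are assumptions for illustration; match them to whatever your Label Studio export actually produces.

```python
import json

# Hypothetical structure of one labeled sample (field names are
# assumptions, not the project's actual schema).
sample = {
    "image": "dataset/images/form_001.png",
    "annotations": [
        {"text": "John", "label": "NAME", "bbox": [120, 40, 210, 70]},
        {"text": "Doe",  "label": "NAME", "bbox": [220, 40, 300, 70]},
    ],
}
print(json.dumps(sample, indent=2))
```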
- Clone this repository.
- Download the model and place it in the appropriate folder (Download Model).
- Run the following command to install the necessary dependencies:
pip install -r requirements.txt
Note: Make sure to install PyTorch and torchvision before running pip install.
First, run python convert_anno.py to convert the previous annotation format to the appropriate format.
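One step such a conversion usually involves is coordinate normalization: LayoutLMv3 expects bounding boxes scaled to a 0–1000 coordinate space. The helper below is a minimal sketch of that step, not the actual contents of convert_anno.py.

```python
def normalize_bbox(bbox, width, height):
    # LayoutLMv3 expects box coordinates rescaled to a 0-1000 range,
    # independent of the original image resolution.
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

print(normalize_bbox([120, 40, 210, 70], width=800, height=600))
# → [150, 66, 262, 116]
```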
Run python src/main.py to train the model. Make sure the number of classes matches both the annotations and the model architecture.
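A simple way to keep the class count consistent is to derive it from a single label list. The label names below are placeholders; if the project uses Hugging Face transformers, num_labels and the id2label/label2id maps would typically be passed when loading the token-classification model (an assumption about the stack, shown here without the model call).

```python
# Hypothetical label set; align it with the labels in your annotations.
labels = ["O", "B-NAME", "I-NAME", "B-DATE", "I-DATE"]
id2label = {i: lab for i, lab in enumerate(labels)}
label2id = {lab: i for i, lab in enumerate(labels)}

# num_labels must match the classifier head configured in src/main.py.
num_labels = len(labels)
print(num_labels, label2id["B-DATE"])  # → 5 3
```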
Run python src/inference.py to perform inference. Adjust the image path and class list before running, and comment out the loss function in the trainer to prevent errors during forward propagation (at inference time no labels are passed, so the loss cannot be computed).
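After the forward pass, inference reduces to taking the argmax over each token's logits and mapping the resulting ids back to label names. The sketch below uses dummy logits; in src/inference.py they would come from the model's output.

```python
# Dummy per-token logits (in practice: model output, one row per token).
id2label = {0: "O", 1: "B-NAME", 2: "I-NAME"}
logits = [
    [0.1, 2.3, 0.2],   # token 0 → B-NAME
    [0.0, 0.4, 1.9],   # token 1 → I-NAME
    [1.5, 0.2, 0.1],   # token 2 → O
]

# Argmax over each row, then map the id back to its label string.
preds = [id2label[max(range(len(row)), key=row.__getitem__)] for row in logits]
print(preds)  # → ['B-NAME', 'I-NAME', 'O']
```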
While this project has demonstrated promising results, there are a few limitations to note:
- The bounding box predictions from the trained model may not always be accurate. This could lead to errors in text detection and subsequently in the recognition and extraction of information.
- The extraction of information into a JSON key-value pair format currently relies on manual logic. This may not be robust to variations in the data and could limit the scalability of the project.
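For concreteness, the manual key-value logic amounts to grouping labeled tokens into fields. The function below is a simplified sketch of that idea (label names and grouping rules are assumptions, not the project's exact implementation), which also illustrates why it is brittle: any deviation from the expected label sequence changes the output.

```python
import json

def tokens_to_json(tokens, labels):
    """Group labeled tokens into a {field: text} dict (simplified sketch)."""
    out = {}
    for tok, lab in zip(tokens, labels):
        if lab == "O":
            continue                      # skip unlabeled tokens
        field = lab.split("-", 1)[-1]     # "B-NAME"/"I-NAME" → "NAME"
        out[field] = f"{out[field]} {tok}" if field in out else tok
    return out

tokens = ["John", "Doe", "12/03/2021"]
labels = ["B-NAME", "I-NAME", "B-DATE"]
print(json.dumps(tokens_to_json(tokens, labels)))
# → {"NAME": "John Doe", "DATE": "12/03/2021"}
```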
These limitations present opportunities for future improvements and refinements to the project.