- 2024.10.15
  - Completed the initial version of the code, comprising three modules: object detection, semantic segmentation, and corner direction recognition.
- 2024.11.2
  - Added new YOLOv11 object detection and edge detection models.
  - Added automatic model downloading and reduced the package size.
  - Added ONNX-GPU inference support and provided benchmark results.
  - Added an online usage example.
💡✨ RapidTableDetection is a powerful and efficient table detection system that supports various types of tables, including those in papers, journals, magazines, invoices, receipts, and sign-in sheets.
🚀 It offers model versions derived from PaddlePaddle and YOLO. The default model combination needs only 1.2 seconds for single-image CPU inference; the smallest ONNX-GPU (V100) combination needs 0.4 seconds, and the PaddlePaddle-GPU version needs 0.2 seconds.
🛠️ The three modules can be freely combined and independently trained and optimized; ONNX conversion scripts and fine-tuning training recipes are provided.
🌟 The whl package is easy to integrate and use, providing strong support for downstream OCR, table recognition, and data collection.
The approach follows the 2nd-place solution in the Baidu Table Detection Competition, retrained on a large amount of real-world scenario data.
The training datasets are credited in the acknowledgements list below. The author maintains open-source projects in spare time; please show your support by giving the project a star.
- Document scenarios: no perspective distortion or rotation, so use only object detection.
- Photography scenarios with small-angle rotation (-90° to 90°): the top-left corner is taken as the default, so do not use corner direction recognition.
- Use the online demo to find the model combination that suits your scenario.
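The recommendations above can be written down as a small lookup of module switches. This is only a sketch: the scenario names are hypothetical labels invented for illustration, while the flag names `use_obj_det`, `use_edge_det`, and `use_cls_det` are the ones this README lists. Enabling edge detection for the small-angle case is an assumption drawn from the recommendation to skip only corner direction recognition.

```python
# Hypothetical scenario names mapped to the module switches named in this README.
SCENARIO_FLAGS = {
    # Scanned documents: no perspective or rotation, object detection only.
    "document": {"use_obj_det": True, "use_edge_det": False, "use_cls_det": False},
    # Photos rotated within -90..90 degrees: skip corner direction recognition.
    "photo_small_angle": {"use_obj_det": True, "use_edge_det": True, "use_cls_det": False},
    # Photos with arbitrary rotation: all three modules (the README defaults).
    "photo_any_angle": {"use_obj_det": True, "use_edge_det": True, "use_cls_det": True},
}


def flags_for(scenario: str) -> dict:
    """Return a copy of the module switches for a named scenario."""
    try:
        return dict(SCENARIO_FLAGS[scenario])
    except KeyError:
        known = ", ".join(sorted(SCENARIO_FLAGS))
        raise ValueError(f"unknown scenario {scenario!r}; expected one of: {known}") from None
```

The resulting dictionary can then be unpacked as keyword arguments wherever your installed version accepts these flags (the PaddlePaddle example in this README passes them to the `TableDetector` constructor).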
Models are downloaded automatically, or you can download them manually from the ModelScope model repository.
pip install rapid-table-det
Default values:
- use_cuda=False: Enable GPU acceleration for inference.
- obj_model_type="yolo_obj_det": Object detection model type.
- edge_model_type="yolo_edge_det": Edge detection model type.
- cls_model_type="paddle_cls_det": Corner direction classification model type.
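A minimal sketch of choosing a non-default model combination safely. The helper `detector_kwargs` is hypothetical and not part of the package; it only checks the chosen names against the `model_type` values listed in this README's benchmark table and returns a dictionary of settings. Whether `TableDetector` accepts all four as constructor keyword arguments is an assumption.

```python
# model_type values as listed in this README's benchmark table.
OBJ_MODEL_TYPES = {"yolo_obj_det", "paddle_obj_det", "paddle_obj_det_s"}
EDGE_MODEL_TYPES = {"yolo_edge_det", "yolo_edge_det_s", "paddle_edge_det", "paddle_edge_det_s"}
CLS_MODEL_TYPES = {"paddle_cls_det"}


def detector_kwargs(
    use_cuda: bool = False,
    obj_model_type: str = "yolo_obj_det",
    edge_model_type: str = "yolo_edge_det",
    cls_model_type: str = "paddle_cls_det",
) -> dict:
    """Validate a model combination and return it as keyword arguments (hypothetical helper)."""
    checks = (
        ("obj_model_type", obj_model_type, OBJ_MODEL_TYPES),
        ("edge_model_type", edge_model_type, EDGE_MODEL_TYPES),
        ("cls_model_type", cls_model_type, CLS_MODEL_TYPES),
    )
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return {
        "use_cuda": use_cuda,
        "obj_model_type": obj_model_type,
        "edge_model_type": edge_model_type,
        "cls_model_type": cls_model_type,
    }
```

For example, `TableDetector(**detector_kwargs(edge_model_type="yolo_edge_det_s"))` would select the small segmentation model, assuming the constructor takes these names as keywords.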
Because GPU acceleration with ONNX is limited, it is still recommended to use YOLOX directly or to install PaddlePaddle for much faster model execution (the author can provide the full workflow on request). The quantized PaddlePaddle S models are actually slower and less accurate than the originals, but they are significantly smaller.
| model_type | Task Type | Training Source | Size | Single Table Inference Time (V100-16G, cuda12, cudnn9, ubuntu) |
|---|---|---|---|---|
| yolo_obj_det | Table Object Detection | yolo11-l | 100 MB | CPU: 570 ms, GPU: 400 ms |
| paddle_obj_det | Table Object Detection | paddle yoloe-plus-x | 380 MB | CPU: 1000 ms, GPU: 300 ms |
| paddle_obj_det_s | Table Object Detection | paddle yoloe-plus-x + quantization | 95 MB | CPU: 1200 ms, GPU: 1000 ms |
| yolo_edge_det | Semantic Segmentation | yolo11-l-segment | 108 MB | CPU: 570 ms, GPU: 200 ms |
| yolo_edge_det_s | Semantic Segmentation | yolo11-s-segment | 11 MB | CPU: 260 ms, GPU: 200 ms |
| paddle_edge_det | Semantic Segmentation | paddle-dbnet | 99 MB | CPU: 1200 ms, GPU: 120 ms |
| paddle_edge_det_s | Semantic Segmentation | paddle-dbnet + quantization | 25 MB | CPU: 860 ms, GPU: 760 ms |
| paddle_cls_det | Direction Classification | paddle pplcnet | 6.5 MB | CPU: 70 ms, GPU: 60 ms |
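The per-module timings in the table can be added up for a rough estimate of how long a model combination takes on one table. This sketch assumes the three modules run one after another and ignores pre- and post-processing, so treat the result as an approximation; `estimate_latency_ms` is a hypothetical helper, not part of the package.

```python
# Single-table timings in milliseconds, copied from the benchmark table in this README.
BENCHMARK_MS = {
    "yolo_obj_det": {"cpu": 570, "gpu": 400},
    "paddle_obj_det": {"cpu": 1000, "gpu": 300},
    "paddle_obj_det_s": {"cpu": 1200, "gpu": 1000},
    "yolo_edge_det": {"cpu": 570, "gpu": 200},
    "yolo_edge_det_s": {"cpu": 260, "gpu": 200},
    "paddle_edge_det": {"cpu": 1200, "gpu": 120},
    "paddle_edge_det_s": {"cpu": 860, "gpu": 760},
    "paddle_cls_det": {"cpu": 70, "gpu": 60},
}


def estimate_latency_ms(*model_types: str, device: str = "cpu") -> int:
    """Sum the benchmark timings of the given models for one device ("cpu" or "gpu")."""
    return sum(BENCHMARK_MS[name][device] for name in model_types)


# Default combination on CPU: 570 + 570 + 70 = 1210 ms.
default_cpu = estimate_latency_ms("yolo_obj_det", "yolo_edge_det", "paddle_cls_det")
```

The default combination sums to 1210 ms on CPU, which is in line with the roughly 1.2 seconds quoted for single-image CPU inference.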
Execution parameters:
- det_accuracy=0.7
- use_obj_det=True
- use_edge_det=True
- use_cls_det=True
from rapid_table_det.inference import TableDetector

img_path = "tests/test_files/chip.jpg"
table_det = TableDetector()
# result holds one entry per detected table; elapse holds the time spent in each module
result, elapse = table_det(img_path)
obj_det_elapse, edge_elapse, rotate_det_elapse = elapse
print(
    f"obj_det_elapse={obj_det_elapse}, edge_elapse={edge_elapse}, "
    f"rotate_det_elapse={rotate_det_elapse}"
)
# Output visualization
# import os
# import cv2
# from rapid_table_det.utils.visuallize import img_loader, visuallize, extract_table_img
#
# img = img_loader(img_path)
# img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# file_name_with_ext = os.path.basename(img_path)
# file_name, file_ext = os.path.splitext(file_name_with_ext)
# out_dir = "rapid_table_det/outputs"
# if not os.path.exists(out_dir):
#     os.makedirs(out_dir)
# extract_img = img.copy()
# for i, res in enumerate(result):
#     box = res["box"]
#     lt, rt, rb, lb = res["lt"], res["rt"], res["rb"], res["lb"]
#     # Draw the detection box and mark the top-left corner
#     img = visuallize(img, box, lt, rt, rb, lb)
#     # Apply a perspective transformation to extract the table image
#     wrapped_img = extract_table_img(extract_img.copy(), lt, rt, rb, lb)
#     cv2.imwrite(f"{out_dir}/{file_name}-extract-{i}.jpg", wrapped_img)
# cv2.imwrite(f"{out_dir}/{file_name}-visualize.jpg", img)
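The perspective transformation step above maps the four detected corners onto an upright rectangle. How `extract_table_img` does this internally is not shown in this README; the following is a generic sketch of one common way to pick the output size and destination corners, assuming each corner is an `(x, y)` pair. The function names are hypothetical.

```python
import math


def warp_target_size(lt, rt, rb, lb):
    """Pick the output (width, height) from the longer of each pair of opposite edges."""
    width = max(math.dist(lt, rt), math.dist(lb, rb))
    height = max(math.dist(lt, lb), math.dist(rt, rb))
    return round(width), round(height)


def destination_corners(width, height):
    """Corners of the upright output rectangle, in lt, rt, rb, lb order."""
    return [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]


# A slightly skewed quadrilateral: the longest horizontal and vertical edges set the size.
size = warp_target_size((10, 10), (110, 10), (110, 60), (10, 70))
```

With OpenCV, the source corners and `destination_corners(*size)` can be converted to `float32` arrays and passed to `cv2.getPerspectiveTransform`, and the resulting matrix to `cv2.warpPerspective` together with `size`.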
You must download the models and specify their locations!
# The default installation is the GPU version; you can override it with the CPU version of paddlepaddle.
pip install rapid-table-det-paddle
from rapid_table_det_paddle.inference import TableDetector

img_path = "tests/test_files/chip.jpg"
# The model paths must point to the models you downloaded
table_det = TableDetector(
    obj_model_path="models/obj_det_paddle",
    edge_model_path="models/edge_det_paddle",
    cls_model_path="models/cls_det_paddle",
    use_obj_det=True,
    use_edge_det=True,
    use_cls_det=True,
)
result, elapse = table_det(img_path)
obj_det_elapse, edge_elapse, rotate_det_elapse = elapse
print(
    f"obj_det_elapse={obj_det_elapse}, edge_elapse={edge_elapse}, "
    f"rotate_det_elapse={rotate_det_elapse}"
)
# Output visualization; one image may contain more than one table
# img = img_loader(img_path)
# file_name_with_ext = os.path.basename(img_path)
# file_name, file_ext = os.path.splitext(file_name_with_ext)
# out_dir = "rapid_table_det_paddle/outputs"
# if not os.path.exists(out_dir):
#     os.makedirs(out_dir)
# extract_img = img.copy()
# for i, res in enumerate(result):
#     box = res["box"]
#     lt, rt, rb, lb = res["lt"], res["rt"], res["rb"], res["lb"]
#     # Draw the detection box and mark the top-left corner
#     img = visuallize(img, box, lt, rt, rb, lb)
#     # Apply a perspective transformation to extract the table image
#     wrapped_img = extract_table_img(extract_img.copy(), lt, rt, rb, lb)
#     cv2.imwrite(f"{out_dir}/{file_name}-extract-{i}.jpg", wrapped_img)
# cv2.imwrite(f"{out_dir}/{file_name}-visualize.jpg", img)
- Q: How to fine-tune the model for specific scenarios?
- A: Refer to this project, which provides detailed visualization steps and datasets. The PaddlePaddle inference model is available from the Baidu Table Detection Competition. For YOLOv11, use the official training script, which is straightforward: convert the data to COCO format and train as described in the official guidelines.
- Q: How to export ONNX?
- A: For PaddlePaddle models, use the onnx_transform.ipynb file in the tools directory of this project. For YOLOv11, follow the official export method, which takes a single line.
- Q: Can distorted images be corrected?
- A: This project only handles rotation and perspective scenarios for table extraction. For distorted images, you need to correct the distortion first.
- 2nd Place Solution in Baidu Table Detection Competition
- WTW Natural Scene Table Dataset
- FinTabNet PDF Document Table Dataset
- TableBank Table Dataset
- TableGeneration Table Auto-Generation Tool
Pull requests are welcome. For major changes, please open an issue to discuss what you would like to change.
If you have other good suggestions and integration scenarios, the author will actively respond and support them.
This project is licensed under the Apache 2.0 open source license.