Commit 7f445fa

add matplotlib
1 parent 6e56d1a commit 7f445fa

3 files changed (+31, -0 lines)

comparison.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@

| Deep | Link | Demo | Notebook | Deep? | Reads image? | Detectron? | OCR included? | Seems to work | get pandas df? | get text? | get image? | throughput (cpu) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| nougat | [github](https://github.com/facebookresearch/nougat) | | [Nougat eval](https://colab.research.google.com/drive/1B4agm6hwR-Ia-5AduEU-y7DteNAOxRhX) | | | | | ✓✓ | latex table (mmd) | | | ~330 s/page |
| gmft | [github](https://github.com/conjuncts/gmft) | | [gmft eval](https://colab.research.google.com/drive/1fEqsTdKcO5RNPV_b2v9cB4Y5We9Kv-hR) | | | | | ✓✓ | | | | ~1.867 s/page |
| img2table | [github](https://github.com/xavctn/img2table) | | [img2table eval](https://colab.research.google.com/drive/1_TD2U0JsaW0SqmuCUv7iSbAyJwvRuq_C) | | | | | ✓✓ | | | | ~1.45 s/page |
| unstructured | [docs.unstructured.io](https://docs.unstructured.io/examplecode/codesamples/apioss/table-extraction-from-pdf) | | [Unstructured eval](https://colab.research.google.com/drive/1k8IpVqyCW8DUZ8psRxHPCQSnE3XZBuOd) | | | | | | ✓ (html -> df) | | ? | ~15.35 s/page |
| open-parse (unitable) | [github](https://github.com/Filimoa/open-parse) | [openparse_quickstart.ipynb](https://colab.research.google.com/drive/1Z5B5gsnmhFKEFL-5yYIcoox7-jQao8Ep) | [open-parse eval](https://colab.research.google.com/drive/18r_0vfxbD-RsCqIcQE3Lo_nF4Lsh_z2s?ouid=110924231912857331758) | | | | | | ✓ (html -> df) | | ✓ (custom) | ~126 s/page |
| open-parse (tatr) | [github](https://github.com/Filimoa/open-parse) | | [open-parse eval](https://colab.research.google.com/drive/18r_0vfxbD-RsCqIcQE3Lo_nF4Lsh_z2s?ouid=110924231912857331758) | | | | | | ✓ (html -> df) | | ✓ (custom) | ~4.992 s/page |
| open-parse (pymupdf) | [github](https://github.com/Filimoa/open-parse) | | [open-parse eval](https://colab.research.google.com/drive/18r_0vfxbD-RsCqIcQE3Lo_nF4Lsh_z2s?ouid=110924231912857331758) | | | | | | | | ✓ (custom) | ~0.67 s/page |
| deepdoctection, tatr | [github](https://github.com/deepdoctection/deepdoctection) | | [deepdoctection tatr eval](https://colab.research.google.com/drive/19c7uMC0Ya2tfZw1r2itstmuX2wxun86L) | | | | | ✗ needs config | | | ? | ~58 s/page |
| surya | [github](https://github.com/VikParuchuri/surya) | | [surya eval](https://colab.research.google.com/drive/1LUqEIiiGt0EDK3jrypWQJKrrXW3nA9ty?usp=drive_link) | | | | | | | | | ~60.679 s/page |
| paddleocr | [github](https://github.com/PaddlePaddle/PaddleOCR/blob/main/README_en.md) | | https://medium.com/@malshanCS/automating-table-data-extraction-tools-and-techniques-for-efficiency-a29df313cbda#629d | | | | | ? | | | | |
| alibaba/omniparser | [github](https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/OmniParser) | | | | | | | ? | | | | |
| alibaba/DocXChain | [github](https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/Applications/DocXChain) | | | | | | | ? | | | | |
| layoutparser (no commit in 2 yrs?) | [github](https://github.com/Layout-Parser/layout-parser/blob/main/examples/OCR%20Tables%20and%20Parse%20the%20Output.ipynb) | https://github.com/Layout-Parser/layout-parser/blob/main/examples/OCR%20Tables%20and%20Parse%20the%20Output.ipynb | | | | | | unmaintained | | | | |
| | | | | | | | | | | | | |
| doctr (not table-focused) | [github](https://github.com/mindee/doctr) | https://huggingface.co/spaces/mindee/doctr | | | | | | N/A | N/A | | | |
| | | | | | | | | | | | | |
| Non-deep | | | | | | | | | | | | |
| camelot | [github](https://github.com/camelot-dev/camelot) | | [camelot eval](https://colab.research.google.com/drive/1ORQPURWJuLvTOeboU0-t4Xg9t6iqTIPO) | | | | | ✓ many false positives, needs config | | | possible | ~1.82 s/page |
| pdfplumber | [github](https://github.com/jsvine/pdfplumber) | | [pdfplumber eval](https://colab.research.google.com/drive/1DUmd_Sjzhp4ZrltxvXV0-F3fiBQhE8a6) | | | | | ✗ or needs config | | | possible | ~0.273 s/page |
| pymupdf | [github](https://github.com/pymupdf/PyMuPDF) | | [pymupdf eval](https://colab.research.google.com/drive/1ZBrAwrfOgDewXhyfDl5xN7mbGUM4idhW) | | | | | ✗ or needs config | | | possible | ~0.250 s/page |
| pdfminer | [github](https://github.com/pdfminer/pdfminer.six) | | | | | | | | | | | |
| Proprietary | | | | | | | | | | | | |
| mathpix | | | | | | | | | | | | |
| Adobe Sensei | [developer.adobe.com](https://developer.adobe.com/document-services/apis/pdf-extract/) | | | | | | | | | | | |
| AWS Textract | | | | | | | | | | | | |
| Azure Document Intelligence | [azure.microsoft.com](https://azure.microsoft.com/en-us/pricing/details/ai-document-intelligence/) | | | | | | | | | | | |
| Google Document AI | [cloud.google.com](https://cloud.google.com/document-ai?hl=en) | | | | | | | | | | | |
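The throughput (cpu) column is reported in seconds per page; converting it to pages per hour (3600 / seconds) makes the spread between tools easier to read. A minimal Python sketch, assuming only the values listed in the table above; the helper names `seconds_per_page`, `pages_per_hour` and `ranked` are hypothetical and not part of this repository. The parser accepts both `~1.867 s/page` and `~58s per page` spellings.

```python
import re

# CPU throughput per tool, taken from the "throughput (cpu)" column of comparison.md.
# Rows without a measured value are omitted.
THROUGHPUT = {
    "nougat": "~330 s/page",
    "gmft": "~1.867 s/page",
    "img2table": "~1.45 s/page",
    "unstructured": "~15.35 s/page",
    "open-parse (unitable)": "~126 s/page",
    "open-parse (tatr)": "~4.992 s/page",
    "open-parse (pymupdf)": "~0.67 s/page",
    "deepdoctection, tatr": "~58 s/page",
    "surya": "~60.679 s/page",
    "camelot": "~1.82 s/page",
    "pdfplumber": "~0.273 s/page",
    "pymupdf": "~0.250 s/page",
}

# First number (integer or decimal) followed by an "s" unit,
# e.g. "~1.867 s/page" or "~58s per page".
_SECONDS = re.compile(r"([0-9]+(?:\.[0-9]+)?)\s*s")


def seconds_per_page(cell: str) -> float:
    """Parse a throughput cell into seconds per page."""
    match = _SECONDS.search(cell)
    if match is None:
        raise ValueError(f"unrecognised throughput cell: {cell!r}")
    return float(match.group(1))


def pages_per_hour(cell: str) -> float:
    """Convert seconds per page into pages per hour (3600 s per hour)."""
    return 3600.0 / seconds_per_page(cell)


def ranked(table: dict[str, str]) -> list[str]:
    """Tool names ordered from fastest to slowest on CPU."""
    return sorted(table, key=lambda name: seconds_per_page(table[name]))


if __name__ == "__main__":
    for name in ranked(THROUGHPUT):
        cell = THROUGHPUT[name]
        print(f"{name:24s} {seconds_per_page(cell):8.3f} s/page "
              f"{pages_per_hour(cell):9.1f} pages/h")
```

Run as a script, it lists pymupdf and pdfplumber first and nougat last. The figures only restate the table; nothing is re-measured.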

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ dependencies = [
 "timm",
 "pillow",
 "pandas",
+"matplotlib",

 ]

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 pypdfium2
 transformers[torch]
 timm
+matplotlib
 pillow
 pandas
