
Reading some numbers outputs symbols #4

Open
Bindslev opened this issue Jan 12, 2019 · 1 comment

Comments

@Bindslev

Hello.
The values I am trying to capture with your tool contain only digits, never letters or symbols. However, the current version sometimes reads some of those digits as symbols. I suppose that if it looked only for digits instead of symbols, it would have a higher success rate in my case.

Is there a way to tell it to look only for digits (or maybe digits and letters) and nothing else?

Super great tool!
Thank you.

@JessicaYeh

I ran into the same problem. Under the hood, this tool uses Tesseract for OCR, so I first tried to modify the command it runs here:

Vis2/lib/Vis2.ahk, lines 2114 to 2116 in 72698e8:

```autohotkey
_cmd .= q this.tesseract q " --tessdata-dir " q fast q " " q in q " " q SubStr(out, 1, -4) q
_cmd .= (this.language) ? " -l " q this.language q : ""
_cmd := ComSpec " /C " q _cmd q
```

I added the option `-c tessedit_char_whitelist=0123456789` so that only those characters could appear in the output. This had no effect. After some searching, it seems to be a common problem that is not fixed yet; the suggested workaround is to add `--oem 0` to use the legacy Tesseract engine (see tesseract-ocr/tesseract#751). Maybe I wasn't putting that option in the correct place in the command, but everything I tried just crashed it.

At the bottom of the comments in that issue, someone posted a link to his repo, which contains a trained data file for digits only, plus variants that also include dots, commas, etc. Just download the file you are interested in and drop it into `bin/tesseract/tessdata_best`. Then, when you call the OCR function, pass the language parameter, making sure the language matches the name of the file you downloaded. I've been using it for a couple of days now and it seems to work fine.
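
To make the options above concrete, here is a minimal Python sketch (Vis2 itself is AutoHotkey, so this is not its code) that builds the same Tesseract argument list as the quoted lines, with optional `--oem` and `-c tessedit_char_whitelist=...` arguments appended after the output base. The helper name `build_tesseract_args`, the file names, and the language name `digits` are hypothetical. One likely reason `--oem 0` crashed: the models in the tessdata_fast and tessdata_best repositories are LSTM-only and lack the data the legacy engine needs, so `--oem 0` only works with a traineddata file that includes the legacy model.

```python
import shlex
from typing import List, Optional


def build_tesseract_args(
    tesseract: str,
    tessdata_dir: str,
    image: str,
    out_txt: str,
    language: Optional[str] = None,
    oem: Optional[int] = None,
    whitelist: Optional[str] = None,
) -> List[str]:
    """Build a Tesseract argument list shaped like the one in Vis2.ahk.

    Hypothetical helper for illustration only; it is not part of Vis2.
    """
    # Tesseract appends ".txt" to the output base itself, so drop it here,
    # just as Vis2 does with SubStr(out, 1, -4).
    out_base = out_txt[:-4] if out_txt.endswith(".txt") else out_txt
    args = [tesseract, "--tessdata-dir", tessdata_dir, image, out_base]
    if language:
        # "-l digits" loads digits.traineddata from tessdata_dir.
        args += ["-l", language]
    if oem is not None:
        # --oem 0 selects the legacy engine; the traineddata must contain it.
        args += ["--oem", str(oem)]
    if whitelist:
        # -c sets a Tesseract config variable for this run only.
        args += ["-c", "tessedit_char_whitelist=" + whitelist]
    return args


if __name__ == "__main__":
    # The workaround from the comment: a digits-only traineddata file via -l.
    args = build_tesseract_args(
        "tesseract", "tessdata_best", "capture.png", "capture.txt", language="digits"
    )
    print(shlex.join(args))
```

Keeping the arguments as a list (rather than one quoted string, as Vis2 builds for `ComSpec /C`) lets you pass them straight to `subprocess.run(args, check=True)` without worrying about quoting paths that contain spaces.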
