Reading some numbers outputs symbols #4

Bindslev · 2019-01-12T11:39:12Z

Hello.
The numbers I am trying to capture with your tool contain only digits, never letters or symbols. However, the current version sometimes reads some digits as symbols instead. I suppose that if it were looking only for digits, and not for symbols, it would have a higher success rate in my case.

Is there a way to specify to only look for numbers (or maybe numbers and letters) and nothing else?

Super great tool!
Thank you.

JessicaYeh · 2019-01-24T21:05:14Z

I ran into this same problem. Under the hood this tool is using Tesseract for OCR, so I first tried to modify the command it's running here

Vis2/lib/Vis2.ahk

Lines 2114 to 2116 in 72698e8

    
    _cmd .= q this.tesseract q " --tessdata-dir " q fast q " " q in q " " q SubStr(out, 1, -4) q
    _cmd .= (this.language) ? " -l " q this.language q : ""
    _cmd := ComSpec " /C " q _cmd q

to include the option -c tessedit_char_whitelist=0123456789, so that only those characters can appear in the output. This had no effect. From what I found, this is a known problem that has not been fixed yet (see tesseract-ocr/tesseract#751), and the workaround is to add --oem 0 to use the legacy Tesseract engine. Maybe I wasn't putting that in the correct place in the command, but everything I tried just crashed it.

At the bottom of the comments in that issue, someone linked to a repo that contains a trained data file for digits only, with optional variants that also include dots, commas, etc. Download the file you are interested in and drop it into bin/tesseract/tessdata_best. Then, when you use the OCR function, pass the language parameter, making sure the language matches the name of the file you downloaded. I've been using it for a couple of days now and it seems to work fine.
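For anyone who wants to see the resulting command lines side by side, here is a rough Python sketch that assembles the same arguments the AHK snippet builds, with the extra options appended. It is only an illustration, not the code Vis2 runs: the function name `build_tesseract_args`, the file names, and the language name `digits` are all hypothetical (the real language name must match whatever traineddata file you downloaded). The flags themselves (`--tessdata-dir`, `-l`, `--oem`, `-c tessedit_char_whitelist=...`) are standard Tesseract options.

```python
from typing import List, Optional


def build_tesseract_args(
    tesseract: str,
    tessdata_dir: str,
    image: str,
    out_base: str,
    language: Optional[str] = None,
    whitelist: Optional[str] = None,
    legacy_engine: bool = False,
) -> List[str]:
    """Assemble a Tesseract command line as an argument list (hypothetical helper)."""
    # Same order as the AHK snippet: executable, tessdata dir, input, output base.
    args = [tesseract, "--tessdata-dir", tessdata_dir, image, out_base]
    if language:
        # Must match the traineddata file name without the extension.
        args += ["-l", language]
    if legacy_engine:
        # --oem 0 selects the legacy (non-LSTM) engine.
        args += ["--oem", "0"]
    if whitelist:
        # Restrict the output to the listed characters.
        args += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return args


if __name__ == "__main__":
    # Approach that worked: a digits-only traineddata file selected via -l.
    # "digits" is an assumed file name, not one confirmed by this thread.
    print(" ".join(build_tesseract_args(
        "tesseract", "tessdata_best", "in.png", "out", language="digits")))

    # Approach from tesseract-ocr/tesseract#751: whitelist plus legacy engine.
    print(" ".join(build_tesseract_args(
        "tesseract", "tessdata", "in.png", "out",
        language="eng", whitelist="0123456789", legacy_engine=True)))
```

Passing the list to `subprocess.run` instead of joining it into one string avoids the quoting that the `q` variables handle in the AHK version. As a guess about the crashes with --oem 0, not a diagnosis: the tessdata_fast and tessdata_best model sets contain only LSTM data, so the legacy engine would need traineddata files that include the legacy model.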
