RFC: Improve positioning of symbol bounding boxes #3787

Open: wants to merge 2 commits into base: main

Commits on Apr 10, 2022

  1. dbb2adb

Commits on May 15, 2022

  1. Improved character position tracking when LSTM models are used

    When LSTM models are used, the accuracy of character bounding boxes is
    low, with many blobs assigned to the wrong characters. This happens
    because the LSTM model output provides only approximate character
    positions without boundary data. As a result, the input blobs cannot be
    accurately mapped to characters, which compromises the accuracy of the
    character bounding boxes.
    
    Currently this problem is handled as follows. The character boundaries
    are computed from the character positions in the LSTM output by placing
    each boundary midway between two adjacent character positions. Each blob
    is then assigned to the character whose interval contains the center of
    the blob. In other words, blobs are assigned to the nearest characters.
    
    Unfortunately, this produces a lot of errors because the character
    positions in the LSTM output tend to drift, so the nearest character is
    often not the right one.
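
The nearest-character assignment described above, and the way drift breaks it, can be illustrated with a minimal sketch. This is not the Tesseract code: the function name `assign_blobs_nearest`, the plain x-coordinate inputs, and the sample numbers are all assumptions made for illustration.

```python
from bisect import bisect_right


def assign_blobs_nearest(char_positions, blob_boxes):
    """Assign each blob to a character using midpoint boundaries.

    char_positions: sorted x positions of characters (a hypothetical
        stand-in for the approximate positions in the LSTM output).
    blob_boxes: list of (left, right) x extents of blobs.
    Returns one character index per blob.
    """
    # Boundaries sit midway between adjacent character positions.
    boundaries = [(a + b) / 2
                  for a, b in zip(char_positions, char_positions[1:])]
    assignment = []
    for left, right in blob_boxes:
        center = (left + right) / 2
        # The count of boundaries at or left of the center is the index
        # of the character whose interval contains the blob center.
        assignment.append(bisect_right(boundaries, center))
    return assignment


blobs = [(5, 15), (25, 35), (45, 55)]  # one blob per character

# Accurate character positions: every blob lands on its own character.
print(assign_blobs_nearest([10, 30, 50], blobs))  # [0, 1, 2]

# Drifted character positions: the second blob is pulled onto the first
# character and the third character receives no blob at all.
print(assign_blobs_nearest([22, 45, 70], blobs))  # [0, 0, 1]
```

In the second call the blobs themselves have not changed; only the assumed character positions have shifted to the right, which is enough to misassign two of the three blobs.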
    
    Fortunately, while the LSTM model produces only approximate positions,
    the blob boundaries produced by the regular segmenter are quite
    accurate. Most of the time a single blob corresponds to a single
    character and vice versa.
    
    This observation is used to build an optimization algorithm that treats
    the output of the regular segmenter as a template to which the LSTM
    model output is matched. The best match is selected by assigning a cost
    to each unwanted property of the outcome and then minimizing the total
    cost of the solution.
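
One way to picture the cost-minimization idea is a small dynamic program over an ordered blob-to-character assignment. The sketch below is an assumption-heavy illustration, not the algorithm in this commit: the commit message does not list its cost terms, so the three used here (a distance term, a penalty for two blobs sharing a character, a penalty for a character with no blob), their weights, and the name `match_blobs_to_chars` are all hypothetical.

```python
def match_blobs_to_chars(char_positions, blob_boxes,
                         dist_weight=0.1, merge_cost=10.0, skip_cost=10.0):
    """Return one character index per blob, minimizing a total cost.

    Hypothetical cost terms (not taken from the commit):
      dist_weight * |blob center - character position| for every blob,
      merge_cost for each blob sharing a character with the previous blob,
      skip_cost for each character that receives no blob.
    Blobs and characters are both assumed to be sorted left to right.
    """
    n, m = len(blob_boxes), len(char_positions)
    if n == 0 or m == 0:
        return []
    centers = [(left + right) / 2 for left, right in blob_boxes]
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]  # cost[i][j]: blob i on char j
    back = [[0] * m for _ in range(n)]    # best char for blob i - 1

    for j in range(m):
        # Characters before j receive no blob if blob 0 starts at j.
        cost[0][j] = (dist_weight * abs(centers[0] - char_positions[j])
                      + skip_cost * j)
    for i in range(1, n):
        for j in range(m):
            local = dist_weight * abs(centers[i] - char_positions[j])
            for k in range(j + 1):  # keep the assignment in reading order
                step = merge_cost if k == j else skip_cost * (j - k - 1)
                total = cost[i - 1][k] + step + local
                if total < cost[i][j]:
                    cost[i][j] = total
                    back[i][j] = k

    # Characters after the last used one also count as skipped.
    j = min(range(m), key=lambda c: cost[n - 1][c] + skip_cost * (m - 1 - c))
    assignment = [0] * n
    for i in range(n - 1, -1, -1):
        assignment[i] = j
        j = back[i][j]
    return assignment


blobs = [(5, 15), (25, 35), (45, 55)]
# Drifted character positions no longer pull blobs onto the wrong character.
print(match_blobs_to_chars([22, 45, 70], blobs))  # [0, 1, 2]
```

Because the penalties for an empty character and for a shared character outweigh the distance term, the one-blob-per-character template wins even when the character positions have drifted. A character split across two blobs is still allowed when that is the cheapest option.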
    
    This reliably fixes the most frequent error in the current solution,
    where blobs are simply assigned to the wrong character. As a result,
    the new algorithm produces up to 20 times fewer errors.
    
    Fixes tesseract-ocr#1712.
    p12tic committed May 15, 2022
    51a3398