Improving TrOCR performance on LaTeX recognition tasks #266

only-yao · 2023-04-21T06:03:55Z


I have experimented with the trocr-base model using around 5 million LaTeX samples, but its performance still lags behind Mathpix. I am looking for suggestions and potential improvements to the model. Here are the optimization approaches I have already tried:

Rendering with different fonts
Adding noise to the LaTeX images
Modifying the convolutional kernel of TrOCR's patch embedding layer

The current model still produces some bad cases, and I would like to know whether there are other strategies that could improve its performance on LaTeX recognition tasks.

Any suggestions or insights would be greatly appreciated.

lukas-blecher · 2023-04-21T12:19:12Z

Maintainer

TrOCR resizes all images to the same size; from a quick search it looks like a square with a side length of 384.
That's fairly low resolution, and a square input doesn't match most formulas, which are usually much wider than they are tall.
I suspect that's why the model outputs a dot instead of a minus sign.
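To put rough numbers on this, here is a small pure-Python sketch comparing a plain square resize with an aspect-preserving fit-and-pad. The 1200x100 formula, the 30x2 px minus sign, the 768x192 canvas and both function names are hypothetical, chosen only for illustration; the 384 side length is the value mentioned above.

```python
def square_resize_scales(width, height, side=384):
    """Per-axis scale factors when an image is resized straight to side x side."""
    return side / width, side / height


def fit_and_pad(width, height, target_w, target_h):
    """Aspect-preserving resize: one uniform scale so the image fits inside
    target_w x target_h, plus the padding (x, y) needed to fill the canvas."""
    scale = min(target_w / width, target_h / height)
    new_w, new_h = round(width * scale), round(height * scale)
    return scale, (new_w, new_h), (target_w - new_w, target_h - new_h)


# Hypothetical wide formula: 1200 px wide, 100 px tall.
sx, sy = square_resize_scales(1200, 100)
print(f"square resize: x scale {sx:.2f}, y scale {sy:.2f}")
# Hypothetical 30x2 px minus sign after the square resize.
print(f"minus sign after square resize: {30 * sx:.1f} x {2 * sy:.1f} px")

scale, size, pad = fit_and_pad(1200, 100, 768, 192)
print("fit-and-pad to 768x192:", scale, size, pad)
```

Under these assumptions the square resize compresses the formula horizontally by about 3x while stretching it vertically by almost 4x, which could plausibly turn a thin dash into a dot-like blob. Fit-and-pad uses one uniform scale, so glyph proportions are kept and the leftover area is filled with padding instead.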

If you're looking for more augmentations that could help you, see pix2tex/dataset/transforms.py

LaTeX-OCR/pix2tex/dataset/transforms.py

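Lines 4 to 21 in dd847d2

```python
import albumentations as alb
from albumentations.pytorch import ToTensorV2

# Written against an older albumentations release; some arguments used here
# (e.g. value=, always_apply=) may have been renamed or deprecated since.
train_transform = alb.Compose(
    [
        # Occasional (p=.15) small geometric distortions, padded with white.
        # border_mode=0 is cv2.BORDER_CONSTANT, interpolation=3 is cv2.INTER_AREA.
        alb.Compose(
            [alb.ShiftScaleRotate(shift_limit=0, scale_limit=(-.15, 0), rotate_limit=1, border_mode=0, interpolation=3,
                                  value=[255, 255, 255], p=1),
             alb.GridDistortion(distort_limit=0.1, border_mode=0, interpolation=3, value=[255, 255, 255], p=.5)], p=.15),
        # alb.InvertImg(p=.15),
        alb.RGBShift(r_shift_limit=15, g_shift_limit=15,
                     b_shift_limit=15, p=0.3),
        alb.GaussNoise(10, p=.2),
        alb.RandomBrightnessContrast(.05, (-.2, 0), True, p=0.2),
        alb.ImageCompression(95, p=.3),
        alb.ToGray(always_apply=True),
        alb.Normalize((0.7931, 0.7931, 0.7931), (0.1738, 0.1738, 0.1738)),
        # alb.Sharpen()
        ToTensorV2(),
    ]
)
```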
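For reference, a rough sketch of what the Normalize step does numerically. Assuming albumentations' default `max_pixel_value=255`, `Normalize` computes `(pixel - mean * 255) / (std * 255)` per channel; since `ToGray` runs first, all three channels share the same constants. The helper name is hypothetical, and the mean of 0.7931 presumably reflects that rendered formulas are mostly white background.

```python
MEAN, STD = 0.7931, 0.1738  # constants from the train_transform above


def normalize_pixel(value, mean=MEAN, std=STD, max_pixel_value=255.0):
    """Per-pixel arithmetic of alb.Normalize (sketch, default max_pixel_value)."""
    return (value - mean * max_pixel_value) / (std * max_pixel_value)


# White background, mid gray, black ink.
for v in (255, 128, 0):
    print(v, round(normalize_pixel(v), 3))
```

With these constants a white background pixel maps to roughly +1.19 and black ink to roughly -4.56.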

Also I'm going to convert this to a discussion, since it's not really an issue with this repo.


only-yao · 2023-04-23T01:17:13Z

Author

Thank you for your reply. Before opening the issue, I had already modified the patch embedding layer (patch_embeddings). The updated code is as follows:

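```python
from torch import nn
from transformers import VisionEncoderDecoderModel

class CustomViTModel(VisionEncoderDecoderModel):
    def __init__(self, config):
        super().__init__(config)
        # Non-overlapping 8x32 patches for 192x768 (H x W) inputs; this new layer is randomly initialised.
        self.encoder.embeddings.patch_embeddings.projection = nn.Conv2d(3, 768, kernel_size=(8, 32), stride=(8, 32))
```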

With this modification, the input image size is now 192x768 (height x width). However, performance still lags well behind Mathpix.
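One way to see why the (8, 32) kernel is presumably a convenient choice, assuming the stock trocr-base encoder uses 16x16 patches on 384x384 inputs: both configurations produce the same 24x24 grid of patches, so the pretrained position-embedding table keeps its shape. The helper below is a hypothetical illustration of that arithmetic, not part of transformers.

```python
def patch_grid(image_hw, patch_hw):
    """Patch grid (rows, cols, total) for a conv whose kernel equals its stride."""
    (h, w), (ph, pw) = image_hw, patch_hw
    if h % ph or w % pw:
        raise ValueError("image size must be divisible by the patch size")
    rows, cols = h // ph, w // pw
    return rows, cols, rows * cols


print(patch_grid((384, 384), (16, 16)))  # (24, 24, 576)
print(patch_grid((192, 768), (8, 32)))   # (24, 24, 576)
```

Each 8x32 patch also covers the same 256 pixels as a 16x16 one, but the replacement `nn.Conv2d` is randomly initialised, so the pretrained patch filters are not reused. Depending on the transformers version, the ViT patch-embedding module may also check input sizes against the encoder config's `image_size`, in which case that value would need updating as well.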


only-yao · 2023-04-23T02:30:12Z

Author

Regarding noise, I initially tried several noise-injection methods and found that they had little impact on the results. I therefore settled on blurring, dilation, and erosion, which are the closest to real-world degradations. The code is as follows:

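```python
import random
import cv2
import numpy as np

def apply_random_noise(image, max_kernel_size=5):
    """Apply one random degradation (dilate, erode or blur); max_kernel_size is currently unused."""
    operation = random.choice(["dilate", "erode", "blur"])
    kernel = np.ones((2, 2), np.uint8)  # fixed 2x2 morphological kernel
    if operation == "dilate":
        # The original called an undefined helper apply_dilate(); cv2.dilate is assumed here.
        # On dark-text-on-white images, dilation grows the background and thins the strokes.
        result = cv2.dilate(image, kernel, iterations=1)
    elif operation == "erode":
        # Erosion shrinks the white background, thickening the strokes.
        result = cv2.erode(image, kernel, iterations=1)
    else:
        # Light 3x3 Gaussian blur; sigma 0 lets OpenCV derive it from the kernel size.
        result = cv2.GaussianBlur(image, (3, 3), 0)
    return result
```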
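Because rendered LaTeX is dark ink on a white background, these morphological operations act on the background rather than on the ink: dilation grows the white area (thinning strokes) and erosion shrinks it (thickening strokes). The sketch below uses a simplified 1D min/max filter as a stand-in for `cv2.erode`/`cv2.dilate` on one row crossing a 1-pixel-thick stroke; it is not OpenCV's implementation, and `morph_row` is a hypothetical name.

```python
def morph_row(row, op, radius=1):
    """1D grey-level erosion (min) or dilation (max) over a window of 2*radius+1.

    Out-of-range neighbours are simply ignored at the borders.
    """
    reduce_fn = {"erode": min, "dilate": max}[op]
    n = len(row)
    return [reduce_fn(row[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


# One row through a 1-pixel-thick dark stroke on a white background.
row = [255, 255, 0, 255, 255]
print(morph_row(row, "erode"))   # [255, 0, 0, 0, 255]: the stroke gets thicker
print(morph_row(row, "dilate"))  # [255, 255, 255, 255, 255]: the stroke disappears
```

In this toy case erosion thickens the stroke and dilation erases it completely, so thin features such as minus signs or fraction bars may be the first casualties if the dilation kernel is large relative to the stroke width.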
