-
TrOCR resizes all images to the same size; from a quick search, it looks like a square with a side length of 384. See LaTeX-OCR/pix2tex/dataset/transforms.py, lines 4 to 21 in dd847d2. Also, I'm going to convert this to a discussion, since it's not really an issue with this repo.
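A fixed 384x384 resize distorts wide formulas, because the horizontal and vertical axes are scaled by different factors. The sketch below is not taken from transforms.py; it compares that plain square resize with an aspect-ratio-preserving "letterbox" resize (scale to fit, then pad). `letterbox_size` and `stretch_factors` are hypothetical helpers, and the 384 default simply mirrors the size mentioned above.

```python
def stretch_factors(width: int, height: int, target: int = 384) -> tuple[float, float]:
    """Per-axis scale factors when an image is resized directly to target x target."""
    return target / width, target / height


def letterbox_size(width: int, height: int, target: int = 384) -> tuple[tuple[int, int], tuple[int, int]]:
    """Fit (width, height) inside a target x target square without distortion.

    Returns the resized (width, height) and the (horizontal, vertical) padding
    needed to fill the square, e.g. with the white page background.
    """
    scale = min(target / width, target / height)  # same factor on both axes
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    return (new_w, new_h), (target - new_w, target - new_h)


if __name__ == "__main__":
    # A typical wide, short formula crop.
    print(stretch_factors(800, 100))  # vertical axis is stretched 8x more than horizontal
    print(letterbox_size(800, 100))   # fits as 384x48, with 336 rows of padding
```

For an 800x100 crop, the plain resize stretches the vertical axis eight times more than the horizontal one, while the letterbox keeps glyph shapes intact at the cost of spending most of the square on padding.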
-
Thank you for your reply. Before opening the issue, I had already modified the patch_embeddings section. The updated code is as follows:
With this modification, the input image size is now 192x768. However, performance still lags significantly behind Mathpix.
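The modified code itself is not shown, but the arithmetic behind a ViT-style patch embedding can be sketched. A Conv2d with kernel_size == stride == patch size is equivalent to cutting the image into non-overlapping tiles and applying one linear projection per tile (up to the ordering of the flattened weights). The sketch below assumes 16x16 patches, which is an assumption about the encoder configuration, and uses hypothetical helpers `patchify` and `embed` written with NumPy.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tiles.

    Returns shape (num_patches, patch * patch * C), tiles in row-major order.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError(f"image {h}x{w} is not divisible by patch size {patch}")
    gh, gw = h // patch, w // patch
    tiles = image.reshape(gh, patch, gw, patch, c)  # (grid_y, py, grid_x, px, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)          # (grid_y, grid_x, py, px, c)
    return tiles.reshape(gh * gw, patch * patch * c)


def embed(image: np.ndarray, weight: np.ndarray, patch: int = 16) -> np.ndarray:
    """Linear projection of every flattened tile: the Conv2d patch embedding."""
    return patchify(image, patch) @ weight


if __name__ == "__main__":
    wide = np.zeros((192, 768, 3), dtype=np.float32)
    square = np.zeros((384, 384, 3), dtype=np.float32)
    print(patchify(wide).shape)    # 12 x 48 grid -> 576 patches
    print(patchify(square).shape)  # 24 x 24 grid -> 576 patches
```

With 16x16 patches, both 192x768 and 384x384 yield 576 patches, so a pretrained position-embedding table keeps the same shape. If the encoder uses learned absolute position embeddings laid out on a 24x24 grid, however, each index now points at a different spatial location (a 12x48 grid), which the model has to relearn during fine-tuning.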
-
Regarding noise, I initially tried several noise-adding methods and found that they had little effect on the results. I therefore settled on blurring, dilation, and erosion, which are the closest to real-world degradations. The code is as follows:
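The poster's augmentation code is not included, so here is a minimal NumPy sketch of the same three degradations on a grayscale uint8 image with dark ink on a white background. The function names and the odd 3x3 default kernel are assumptions for illustration. Which operation you call "erosion" or "dilation" depends on whether the ink or the background is treated as foreground; here the min filter thickens strokes and the max filter thins them.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(img: np.ndarray, k: int) -> np.ndarray:
    """Return an (H, W, k, k) view of each pixel's k x k neighbourhood (edge-padded)."""
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    padded = np.pad(img, k // 2, mode="edge")
    return sliding_window_view(padded, (k, k))


def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    # Mean over each neighbourhood: softens stroke edges like a slightly defocused scan.
    return np.rint(_windows(img, k).mean(axis=(-2, -1))).astype(np.uint8)


def thicken_strokes(img: np.ndarray, k: int = 3) -> np.ndarray:
    # Grayscale erosion (min filter): dark ink spreads, so strokes get bolder.
    return _windows(img, k).min(axis=(-2, -1))


def thin_strokes(img: np.ndarray, k: int = 3) -> np.ndarray:
    # Grayscale dilation (max filter): white background spreads, so strokes get
    # thinner; one-pixel strokes can disappear, so keep k small.
    return _windows(img, k).max(axis=(-2, -1))


AUGMENTATIONS = (box_blur, thicken_strokes, thin_strokes)


def random_degrade(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply one randomly chosen degradation to a rendered formula image."""
    op = AUGMENTATIONS[rng.integers(len(AUGMENTATIONS))]
    return op(img)


if __name__ == "__main__":
    page = np.full((7, 7), 255, dtype=np.uint8)
    page[3, 3] = 0  # a single ink pixel
    print(thicken_strokes(page))  # the dot grows to a 3x3 block
    print(box_blur(page))         # the dot becomes a faint gray 3x3 patch
```

Thin strokes (fraction bars, subscripts) are the most fragile under the max filter, so it may help to apply it with a lower probability than the other two.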
-
I have experimented with the trocr-base model using around 5 million LaTeX samples, but its performance still lags behind Mathpix. I am looking for suggestions and potential improvements to the model. Here are the optimization approaches I have already tried:
Rendering with different fonts
Adding noise to the LaTeX images
Modifying the convolutional kernel in TrOCR's image preprocessing
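For the first item, one way to vary fonts is to wrap each formula in a small standalone LaTeX document whose preamble loads a randomly chosen math font package. The sketch below only builds the .tex source; compiling it (e.g. with pdflatex) and rasterizing the result are left out. The helper names and the particular font list are assumptions for illustration, not what was used in the experiments above.

```python
import random

# Hypothetical selection of math font setups (all are standard CTAN packages).
FONT_PREAMBLES = {
    "computer-modern": "",  # LaTeX default, no extra package
    "latin-modern": r"\usepackage{lmodern}",
    "palatino": r"\usepackage{mathpazo}",
    "times": r"\usepackage{newtxtext,newtxmath}",
    "euler": r"\usepackage{eulervm}",
}

TEMPLATE = r"""\documentclass[border=2pt]{standalone}
\usepackage{amsmath,amssymb}
<<PREAMBLE>>
\begin{document}
$\displaystyle <<FORMULA>>$
\end{document}
"""


def build_tex(formula: str, font: str) -> str:
    """Return a standalone document that renders `formula` in the chosen font setup."""
    return TEMPLATE.replace("<<PREAMBLE>>", FONT_PREAMBLES[font]).replace("<<FORMULA>>", formula)


def random_font_document(formula: str, rng: random.Random) -> tuple[str, str]:
    """Pick a font setup at random; return its name and the document source."""
    font = rng.choice(sorted(FONT_PREAMBLES))
    return font, build_tex(formula, font)


if __name__ == "__main__":
    name, tex = random_font_document(r"\frac{a}{b} + \sum_{i=1}^{n} x_i^2", random.Random(0))
    print(name)
    print(tex)
```

Keeping the font name alongside each rendered sample makes it easy to check later whether the remaining failure cases cluster in particular fonts.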
I have run into some failure cases with the current model, and I would like to know whether there are other strategies that could improve its performance on LaTeX recognition.
Any suggestions or insights would be greatly appreciated.