OCR performs badly on Chinese content #365

hetailang · 2024-08-27T05:57:53Z

Describe the bug
I used the LlamaParse UI. When it receives Chinese content, LlamaParse outputs meaningless content.
Files

Job ID
d873ef1c-3fb0-42df-b5e7-1482d0ee25f3

Screenshots
The output is shown below.

It only recognizes English characters, such as https://movie.douban.com/subject/35351365

Options
I set the OCR language field to ch_sim and left the other fields at their defaults.

Additional context
Even without setting gpt4o_mode=True, the result still makes no sense when parsing from Python, so I guess the problem comes from the OCR used in LlamaParse.
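
For anyone trying to reproduce this from Python, here is a rough sketch, not the exact script used for the job above. It assumes the `llama_parse` package exposes `LlamaParse` with `api_key`, `result_type`, `language`, and `gpt4o_mode` arguments and a `load_data` method. The helper names (`build_parser_options`, `cjk_ratio`, `parse_to_text`) and the file name `sample.pdf` are hypothetical. `cjk_ratio` is only a quick sanity check for whether any Chinese characters survived OCR.

```python
def build_parser_options(use_gpt4o: bool = False) -> dict:
    """Options mirroring the report: OCR language ch_sim, everything else default."""
    return {
        "result_type": "markdown",  # assumption: markdown output, as in the UI
        "language": "ch_sim",       # simplified Chinese, as set in the report
        "gpt4o_mode": use_gpt4o,    # the report says the output is bad with this off
    }


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def parse_to_text(path: str, api_key: str, use_gpt4o: bool = False) -> str:
    """Parse one file and join the returned documents into a single string."""
    # Imported lazily so the helpers above work without the package installed.
    from llama_parse import LlamaParse  # third-party; assumed API, see note above

    parser = LlamaParse(api_key=api_key, **build_parser_options(use_gpt4o))
    documents = parser.load_data(path)
    return "\n".join(doc.text for doc in documents)
```

Calling `cjk_ratio(parse_to_text("sample.pdf", api_key))` on a mostly Chinese page and getting a value near zero would match the behaviour described here, where only Latin text such as the URL comes through.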

hetailang added the bug (Something isn't working) label Aug 27, 2024

BinaryBrain added the ocr label Aug 27, 2024
