Add a new OCR model #753

Ivachnenco · 2025-01-30T13:28:01Z

Google Lens stopped working again, which is a shame since it was the best model for capturing English text in manga in the program. Therefore, is it possible to add more sources of text recognition? The same PaddleOCR for example, which is used in luna translator and recognizes English letters perfectly. And also add Ocrspace which can be used thanks to a free API key. It also works very well and can be an alternative to google lens as an online service

elfcute · 2025-01-30T17:59:18Z

Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added?

abysmli · 2025-01-31T01:10:43Z

could you please add google vision (paid version) plugin?

bropines · 2025-01-31T13:28:57Z

Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added?

I have a module ready, but I'm too lazy to release it.

elfcute · 2025-02-01T02:44:51Z

Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added?

I have a module ready, but I'm too lazy to release it.

You're too good

elfcute · 2025-02-01T02:45:13Z

Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added?

I have a module ready, but I'm too lazy to release it.

Hi, I have another question. Do you remember that I contacted you on Telegram previously? One of the issues I brought up was that some large language models, like Gemini (gemini-2.0-flash-exp, gemini-exp-1206, gemini-2.0-flash-thinking-exp), sometimes encounter errors when translating text from a single image dialog box, especially when there is a large amount of text within that single dialog box.

Is this something that can be optimized and resolved?

Thanks.

bropines · 2025-02-01T22:45:33Z

Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added?

I have a module ready, but I'm too lazy to release it.

Hi, I have another question. Do you remember that I contacted you on Telegram previously? One of the issues I brought up was that some large language models, like Gemini (gemini-2.0-flash-exp, gemini-exp-1206, gemini-2.0-flash-thinking-exp), sometimes encounter errors when translating text from a single image dialog box, especially when there is a large amount of text within that single dialog box.

Is this something that can be optimized and resolved?

Thanks.

From a part. I plan to use structured responses, but unfortunately not all apis are able to work with them.
The previous logic worked based on the fact that we stupidly order the neural network to follow the rules, which it constantly violated. Google, openai, and claude have structured response functions, but the problem is that this needs to be integrated into the current translation method + a new ocr engine needs to be added. I'm not really lazy. It's just that I'm a student, and I don't have time to finish the modules during the session. After I get some rest, I'll make a new ocr module and update the translator.

elfcute · 2025-02-02T13:24:59Z

Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added?

I have a module ready, but I'm too lazy to release it.

Hi, I have another question. Do you remember that I contacted you on Telegram previously? One of the issues I brought up was that some large language models, like Gemini (gemini-2.0-flash-exp, gemini-exp-1206, gemini-2.0-flash-thinking-exp), sometimes encounter errors when translating text from a single image dialog box, especially when there is a large amount of text within that single dialog box.
Is this something that can be optimized and resolved?
Thanks.

From a part. I plan to use structured responses, but unfortunately not all apis are able to work with them. The previous logic worked based on the fact that we stupidly order the neural network to follow the rules, which it constantly violated. Google, openai, and claude have structured response functions, but the problem is that this needs to be integrated into the current translation method + a new ocr engine needs to be added. I'm not really lazy. It's just that I'm a student, and I don't have time to finish the modules during the session. After I get some rest, I'll make a new ocr module and update the translator.

Oh, you're a student. You're amazing. You've been working so hard recently. Once you're done with your studies, make sure to rest well and recharge. We are all really looking forward to your new OCR engine.

bropines · 2025-02-22T10:15:44Z

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Bubucenter · 2025-02-26T08:33:01Z

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much

bropines · 2025-02-26T09:23:40Z

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much

Ummm. He's there. if your third-party API supports OpenAI SCHEMES, then everything will work, the main thing is to insert the link to the endpoint.

Bubucenter · 2025-02-26T11:03:50Z

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much

Ummm. He's there. if your third-party API supports OpenAI SCHEMES, then everything will work, the main thing is to insert the link to the endpoint.

I added your OpenAI translator. It works great. Hopefully the OCR part will do the same.

bropines · 2025-02-26T11:09:21Z

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much

Ummm. He's there. if your third-party API supports OpenAI SCHEMES, then everything will work, the main thing is to insert the link to the endpoint.

I added your OpenAI translator. It works great. Hopefully the OCR part will do the same.

It's basically no different from the translator, except that it sends pictures and promt. When you test it, write back, we'll close the issue

Bubucenter · 2025-02-27T03:44:32Z

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much

Ummm. He's there. if your third-party API supports OpenAI SCHEMES, then everything will work, the main thing is to insert the link to the endpoint.

I added your OpenAI translator. It works great. Hopefully the OCR part will do the same.

It's basically no different from the translator, except that it sends pictures and promt. When you test it, write back, we'll close the issue
Can you create an ocr module that has components like this translate item?

bropines · 2025-02-27T04:04:39Z

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much

Ummm. He's there. if your third-party API supports OpenAI SCHEMES, then everything will work, the main thing is to insert the link to the endpoint.

I added your OpenAI translator. It works great. Hopefully the OCR part will do the same.

It's basically no different from the translator, except that it sends pictures and promt. When you test it, write back, we'll close the issue
Can you create an ocr module that has components like this translate item?

Apparently you didn't really want to figure it out.

Bubucenter · 2025-02-27T06:23:48Z

Nhân tiện. Tôi đã thêm một mô-đun ocd vài ngày trước, dựa trên Gemini và OpenAI. Hãy thử nghiệm nhé

Xin chào, cảm ơn vì đã thêm ocr openai. Tôi hy vọng bạn có thể thêm url api của bên thứ 3. Vì vậy, tôi có thể thêm API của bên thứ 3. Cảm ơn bạn rất nhiều

Ừm. Anh ấy ở đó. Nếu API của bên thứ ba của bạn hỗ trợ OpenAI SCHEMES thì mọi thứ sẽ hoạt động, điều quan trọng nhất là chèn liên kết đến điểm cuối.

Tôi đã thêm trình dịch OpenAI của bạn. Nó hoạt động rất tốt. Hy vọng phần OCR cũng sẽ hoạt động như vậy.

Về cơ bản thì nó không khác gì trình dịch, ngoại trừ việc nó gửi hình ảnh và promt. Khi bạn kiểm tra, hãy viết lại, chúng tôi sẽ đóng vấn đề
Bạn có thể tạo một mô-đun ocr có các thành phần như mục dịch này không?

Có vẻ như bạn không thực sự muốn tìm hiểu vấn đề này.

I did it. Thank you for your guidance. Thank you very much.

bropines · 2025-02-27T06:37:48Z

I think it's time to close the issue.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a new OCR model #753

Add a new OCR model #753

Ivachnenco commented Jan 30, 2025

elfcute commented Jan 30, 2025

abysmli commented Jan 31, 2025

bropines commented Jan 31, 2025

elfcute commented Feb 1, 2025

elfcute commented Feb 1, 2025

bropines commented Feb 1, 2025

elfcute commented Feb 2, 2025

bropines commented Feb 22, 2025

Bubucenter commented Feb 26, 2025

bropines commented Feb 26, 2025

Bubucenter commented Feb 26, 2025

bropines commented Feb 26, 2025

Bubucenter commented Feb 27, 2025

bropines commented Feb 27, 2025

Bubucenter commented Feb 27, 2025

bropines commented Feb 27, 2025

Add a new OCR model #753

Add a new OCR model #753

Comments

Ivachnenco commented Jan 30, 2025

elfcute commented Jan 30, 2025

abysmli commented Jan 31, 2025

bropines commented Jan 31, 2025

elfcute commented Feb 1, 2025

elfcute commented Feb 1, 2025

bropines commented Feb 1, 2025

elfcute commented Feb 2, 2025

bropines commented Feb 22, 2025

Bubucenter commented Feb 26, 2025

bropines commented Feb 26, 2025

Bubucenter commented Feb 26, 2025

bropines commented Feb 26, 2025

Bubucenter commented Feb 27, 2025

bropines commented Feb 27, 2025

Bubucenter commented Feb 27, 2025

bropines commented Feb 27, 2025