Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new OCR model #753

Open
Ivachnenco opened this issue Jan 30, 2025 · 16 comments
Open

Add a new OCR model #753

Ivachnenco opened this issue Jan 30, 2025 · 16 comments

Comments

@Ivachnenco
Copy link

Google Lens stopped working again, which is a shame since it was the best model for capturing English text in manga in the program. Therefore, is it possible to add more sources of text recognition? The same PaddleOCR for example, which is used in luna translator and recognizes English letters perfectly. And also add Ocrspace which can be used thanks to a free API key. It also works very well and can be an alternative to google lens as an online service

@elfcute
Copy link

elfcute commented Jan 30, 2025

Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added?

@abysmli
Copy link

abysmli commented Jan 31, 2025

could you please add google vision (paid version) plugin?

@bropines
Copy link
Collaborator

Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added?

I have a module ready, but I'm too lazy to release it.

@elfcute
Copy link

elfcute commented Feb 1, 2025

Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added?

I have a module ready, but I'm too lazy to release it.

You're too good

@elfcute
Copy link

elfcute commented Feb 1, 2025

Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added?

I have a module ready, but I'm too lazy to release it.

Hi, I have another question. Do you remember that I contacted you on Telegram previously? One of the issues I brought up was that some large language models, like Gemini (gemini-2.0-flash-exp, gemini-exp-1206, gemini-2.0-flash-thinking-exp), sometimes encounter errors when translating text from a single image dialog box, especially when there is a large amount of text within that single dialog box.

Is this something that can be optimized and resolved?

Thanks.

@bropines
Copy link
Collaborator

bropines commented Feb 1, 2025

Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added?

I have a module ready, but I'm too lazy to release it.

Hi, I have another question. Do you remember that I contacted you on Telegram previously? One of the issues I brought up was that some large language models, like Gemini (gemini-2.0-flash-exp, gemini-exp-1206, gemini-2.0-flash-thinking-exp), sometimes encounter errors when translating text from a single image dialog box, especially when there is a large amount of text within that single dialog box.

Is this something that can be optimized and resolved?

Thanks.

From a part. I plan to use structured responses, but unfortunately not all apis are able to work with them.
The previous logic worked based on the fact that we stupidly order the neural network to follow the rules, which it constantly violated. Google, openai, and claude have structured response functions, but the problem is that this needs to be integrated into the current translation method + a new ocr engine needs to be added. I'm not really lazy. It's just that I'm a student, and I don't have time to finish the modules during the session. After I get some rest, I'll make a new ocr module and update the translator.

@elfcute
Copy link

elfcute commented Feb 2, 2025

Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added?

I have a module ready, but I'm too lazy to release it.

Hi, I have another question. Do you remember that I contacted you on Telegram previously? One of the issues I brought up was that some large language models, like Gemini (gemini-2.0-flash-exp, gemini-exp-1206, gemini-2.0-flash-thinking-exp), sometimes encounter errors when translating text from a single image dialog box, especially when there is a large amount of text within that single dialog box.
Is this something that can be optimized and resolved?
Thanks.

From a part. I plan to use structured responses, but unfortunately not all apis are able to work with them. The previous logic worked based on the fact that we stupidly order the neural network to follow the rules, which it constantly violated. Google, openai, and claude have structured response functions, but the problem is that this needs to be integrated into the current translation method + a new ocr engine needs to be added. I'm not really lazy. It's just that I'm a student, and I don't have time to finish the modules during the session. After I get some rest, I'll make a new ocr module and update the translator.

Oh, you're a student. You're amazing. You've been working so hard recently. Once you're done with your studies, make sure to rest well and recharge. We are all really looking forward to your new OCR engine.

@bropines
Copy link
Collaborator

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

@Bubucenter
Copy link

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much

@bropines
Copy link
Collaborator

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much

Ummm. He's there. if your third-party API supports OpenAI SCHEMES, then everything will work, the main thing is to insert the link to the endpoint.

@Bubucenter
Copy link

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much

Ummm. He's there. if your third-party API supports OpenAI SCHEMES, then everything will work, the main thing is to insert the link to the endpoint.

I added your OpenAI translator. It works great. Hopefully the OCR part will do the same.

@bropines
Copy link
Collaborator

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much

Ummm. He's there. if your third-party API supports OpenAI SCHEMES, then everything will work, the main thing is to insert the link to the endpoint.

I added your OpenAI translator. It works great. Hopefully the OCR part will do the same.

It's basically no different from the translator, except that it sends pictures and promt. When you test it, write back, we'll close the issue

@Bubucenter
Copy link

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much

Ummm. He's there. if your third-party API supports OpenAI SCHEMES, then everything will work, the main thing is to insert the link to the endpoint.

I added your OpenAI translator. It works great. Hopefully the OCR part will do the same.

It's basically no different from the translator, except that it sends pictures and promt. When you test it, write back, we'll close the issue
Can you create an ocr module that has components like this translate item?
Image

@bropines
Copy link
Collaborator

By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out

Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much

Ummm. He's there. if your third-party API supports OpenAI SCHEMES, then everything will work, the main thing is to insert the link to the endpoint.

I added your OpenAI translator. It works great. Hopefully the OCR part will do the same.

It's basically no different from the translator, except that it sends pictures and promt. When you test it, write back, we'll close the issue
Can you create an ocr module that has components like this translate item?
Image

Apparently you didn't really want to figure it out.
Image

@Bubucenter
Copy link

Nhân tiện. Tôi đã thêm một mô-đun ocd vài ngày trước, dựa trên Gemini và OpenAI. Hãy thử nghiệm nhé

Xin chào, cảm ơn vì đã thêm ocr openai. Tôi hy vọng bạn có thể thêm url api của bên thứ 3. Vì vậy, tôi có thể thêm API của bên thứ 3. Cảm ơn bạn rất nhiều

Ừm. Anh ấy ở đó. Nếu API của bên thứ ba của bạn hỗ trợ OpenAI SCHEMES thì mọi thứ sẽ hoạt động, điều quan trọng nhất là chèn liên kết đến điểm cuối.

Tôi đã thêm trình dịch OpenAI của bạn. Nó hoạt động rất tốt. Hy vọng phần OCR cũng sẽ hoạt động như vậy.

Về cơ bản thì nó không khác gì trình dịch, ngoại trừ việc nó gửi hình ảnh và promt. Khi bạn kiểm tra, hãy viết lại, chúng tôi sẽ đóng vấn đề
Bạn có thể tạo một mô-đun ocr có các thành phần như mục dịch này không?
Image

Có vẻ như bạn không thực sự muốn tìm hiểu vấn đề này. Image

I did it. Thank you for your guidance. Thank you very much.

@bropines
Copy link
Collaborator

I think it's time to close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants