-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new OCR model #753
Comments
Can large model OCRs like Gemini 2.0 flash-exp, DeepSeek be added? |
could you please add google vision (paid version) plugin? |
I have a module ready, but I'm too lazy to release it. |
You're too good |
Hi, I have another question. Do you remember that I contacted you on Telegram previously? One of the issues I brought up was that some large language models, like Gemini (gemini-2.0-flash-exp, gemini-exp-1206, gemini-2.0-flash-thinking-exp), sometimes encounter errors when translating text from a single image dialog box, especially when there is a large amount of text within that single dialog box. Is this something that can be optimized and resolved? Thanks. |
From a part. I plan to use structured responses, but unfortunately not all apis are able to work with them. |
Oh, you're a student. You're amazing. You've been working so hard recently. Once you're done with your studies, make sure to rest well and recharge. We are all really looking forward to your new OCR engine. |
By the way. I already added an ocd module a couple of days ago, based on Gemini and OpenAI. Test it out |
Hi thanks for adding ocr openai. I hope you can add 3rd party api url. So i can add 3rd party API. Thanks you very much |
Ummm. He's there. if your third-party API supports OpenAI SCHEMES, then everything will work, the main thing is to insert the link to the endpoint. |
I added your OpenAI translator. It works great. Hopefully the OCR part will do the same. |
It's basically no different from the translator, except that it sends pictures and promt. When you test it, write back, we'll close the issue |
I think it's time to close the issue. |
Google Lens stopped working again, which is a shame since it was the best model for capturing English text in manga in the program. Therefore, is it possible to add more sources of text recognition? The same PaddleOCR for example, which is used in luna translator and recognizes English letters perfectly. And also add Ocrspace which can be used thanks to a free API key. It also works very well and can be an alternative to google lens as an online service
The text was updated successfully, but these errors were encountered: