Machine translation - long sentences cause incomplete translation #32
Comments
@gregorybrooks sorry for the delayed response! 👋
I see the same problem when I try it on Google Colab.
Result:
You can see that the tokenizer is doing a good job, but the model is limiting the output length. A workaround is to add
Result:
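For reference, here is a minimal sketch of one possible workaround. It assumes the truncation comes from the default output length limit of `generate` in Hugging Face Transformers (20 tokens when nothing else is configured), which would match translations stopping after roughly a dozen words. The Hub id `persiannlp/mt5-base-parsinlu-translation_en_fa`, the helper names `load_model` and `translate`, and the value 256 are assumptions for illustration, not taken from this thread.

```python
# Assumed Hub id for the model named in this issue; adjust if yours differs.
MODEL_NAME = "persiannlp/mt5-base-parsinlu-translation_en_fa"


def load_model(model_name=MODEL_NAME):
    """Load the mT5 model and tokenizer (hypothetical helper).

    The import lives inside the function so that the rest of this
    sketch can be read and tested without downloading anything.
    """
    from transformers import MT5ForConditionalGeneration, MT5Tokenizer

    tokenizer = MT5Tokenizer.from_pretrained(model_name)
    model = MT5ForConditionalGeneration.from_pretrained(model_name)
    return model, tokenizer


def translate(text, model, tokenizer, max_new_tokens=256):
    """Translate `text`, raising the cap on generated tokens.

    Without an explicit limit, generate() may stop at the library
    default, which can cut long translations short.
    """
    inputs = tokenizer(text, return_tensors="pt")
    # max_new_tokens bounds only the generated output, not the input.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


# Example usage (downloads the model on first run):
# model, tokenizer = load_model()
# print(translate("Mark Woods is a writer and broadcaster who has covered "
#                 "the NBA, and British basketball, for over a decade.",
#                 model, tokenizer))
```

On older Transformers releases that lack `max_new_tokens`, passing `max_length` to `generate` should have a similar effect for this kind of encoder-decoder model.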
I'm translating English sentences into Farsi with mt5-base-parsinlu-translation_en_fa (from Hugging Face). For sentences longer than about 8 words, only the first part of the sentence is translated and the rest is ignored. For example:
English sentences:
Terry's side fell to their second Premier League loss of the season at Loftus Road
Following a four-day hiatus, UN envoy Ismail Ould Cheikh Ahmed on Thursday will resume mediation efforts in the second round of Kuwait-hosted peace talks between Yemen’s warring rivals.
Mark Woods is a writer and broadcaster who has covered the NBA, and British basketball, for over a decade.
Translations:
طرفدار تری در فوتبال دوم فصل در لئوپوس رود به
پس از چهار روز توقف، سفیر سازمان ملل، ایمیل اولد شیخ
مارک ولز نویسنده و پخش کننده ای است که بیش از یک دهه
which according to Google Translate translates back to this:
More fans in the second football season in Leopard
After a four-day hiatus, the ambassador to the United Nations, Old Sheikh Sheikh
Mark Wells has been a writer and broadcaster for over a decade
I can't find any configuration setting that would limit the number of tokens being translated.
Here is my code: