youtubetranscript.com cc selection option #179
Comments
Same error here. Maybe adding an option to select the language solves the problem :)
Yeah, same here, an option to select would be good.
Hi @pasdesinfos, I actually thought about introducing this as an optional feature before, but an implementation detail stopped me from doing so: if we want to automatically translate to the user-requested language, which transcript do we translate from when there are several? The quality of the output will vary depending on the source transcript. A few things to consider:
So which heuristic for choosing the source transcript is most likely to yield the highest-quality result? Any thoughts on this?
@jdepoix First of all, I don't know what translating transcripts involves, but the auto-generated (ASR) transcripts in Turkish were understandable, if not completely accurate.
Hi, IMHO the problem arises when the main language of the video is not English. @toprak, @pasdesinfos and I are asking for an option (or automatic behaviour) to fetch the video's original auto-generated subtitles, not to translate them. If you try any Spanish video, such as https://youtubetranscript.com/?v=Dby0_0vdr30, you will see the error. In the CLI tool you have to set the language to Spanish to get the correct transcript. Hope this explains the use case, best regards!
Hi @toprak and @erseco,
I trust everything is well. That's right, @jdepoix. For instance, for the video https://youtu.be/BOKqyl0VT7A, the page https://youtubetranscript.com/?v=BOKqyl0VT7A reports "No transcripts were found for any of the requested language codes: ('en',)", yet it also says "transcripts are available in the following languages: (MANUALLY CREATED) None (GENERATED) - fr ("French (auto-generated)")[TRANSLATABLE] ". Could the heuristic be to fetch the auto-translated English version by default whenever a GENERATED transcript exists and is TRANSLATABLE? The output ":( Unknown error" would then appear only when no transcripts exist at all. Kind regards to everyone!
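The heuristic proposed above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `FakeTranscript` and `pick_source_transcript` are hypothetical names, and the sketch only assumes that each transcript object exposes `language_code`, `is_generated` and `is_translatable`. Preferring manually created transcripts over generated ones is likewise an assumption about which source tends to translate better.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class FakeTranscript:
    """Hypothetical stand-in exposing only the attributes the heuristic reads."""
    language_code: str
    is_generated: bool
    is_translatable: bool


def pick_source_transcript(
    transcripts: Iterable[FakeTranscript], target: str = "en"
) -> Tuple[Optional[FakeTranscript], bool]:
    """Return (transcript, needs_translation) for the requested target language.

    1. If a transcript already exists in the target language, use it as-is.
    2. Otherwise fall back to a translatable transcript.
    In both steps a manually created transcript wins over a generated one
    (assumption: manual captions are the more reliable source).
    """
    candidates = list(transcripts)

    exact = [t for t in candidates if t.language_code == target]
    if exact:
        # False sorts before True, so is_generated=False (manual) is preferred.
        return min(exact, key=lambda t: t.is_generated), False

    translatable = [t for t in candidates if t.is_translatable]
    if translatable:
        return min(translatable, key=lambda t: t.is_generated), True

    # Nothing usable: only now should an error be reported to the user.
    return None, False
```

For the French video mentioned above, the only candidate is a generated, translatable `fr` transcript, so the function returns it with `needs_translation=True`; the caller would then request the `en` translation of that transcript instead of raising an error.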
Hi all. Same here, but only when the YT source isn't in EN. As mentioned, a simple selector would handle it.
Hi @jdepoix, @toprak, @erseco, @toniseldr, I wanted to take a moment to express my heartfelt gratitude to each of you for your invaluable contributions, unwavering dedication, commitment, and hard work. Your efforts have truly made a significant impact in making lives more wonderful. 🙏🎉 I mean, let's be honest here, without your brilliance, I'd probably be lost in a sea of confusion and chaos. 🌊😅 With self-deprecating humor and sincere appreciation,
Hi @pasdesinfos, thank you very much for the kind words! 😊 However, this hasn't been implemented yet, so I think it is fine for the ticket to stay open. Although I am not actively working on this, it might be something that someone wants to contribute!
Hi, from llama_index.readers.youtube_transcript import YoutubeTranscriptReader; loader = YoutubeTranscriptReader(). Do you have any idea how this can be solved?
Hi @MarouaneZhani,
Hi @jdepoix, I saw somewhere in the error message that the available language code was something like "de-DE", and it worked after I tried that. Thanks!
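The "de-DE" case above suggests that a requested code such as "de" may simply not match a regional variant. A small helper could fall back to the base language when no exact match exists. This is only a sketch: `match_language_code` is a hypothetical function, not part of youtube_transcript_api, and the list of available codes is assumed to come from the error message or a transcript listing.

```python
from typing import Iterable, Optional, Sequence


def match_language_code(
    requested: Sequence[str], available: Iterable[str]
) -> Optional[str]:
    """Pick the available code that best matches the requested codes.

    Exact (case-insensitive) matches are tried first, in request order.
    If none is found, codes are compared by their base language, i.e. the
    part before the first hyphen, so "de" can match "de-DE".
    """
    available = list(available)
    by_lower = {code.lower(): code for code in available}

    # Pass 1: exact matches, honouring the caller's priority order.
    for want in requested:
        hit = by_lower.get(want.lower())
        if hit is not None:
            return hit

    # Pass 2: base-language fallback ("de" vs "de-DE").
    for want in requested:
        base = want.lower().split("-")[0]
        for code in available:
            if code.lower().split("-")[0] == base:
                return code

    return None
```

Note that the fallback is deliberately coarse: with both "zh-Hans" and "zh-Hant" available, a request for "zh" returns whichever is listed first, so a selector in the UI would still be the safer option.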
Is your feature request related to a problem? Please describe.
:( Unknown error: Could not retrieve a transcript for the video http://www.youtube.com/watch?v=oBfDbucxPU4! This is most likely caused by: No transcripts were found for any of the requested language codes: ('en',) For this video (oBfDbucxPU4) transcripts are available in the following languages: (MANUALLY CREATED) None (GENERATED) - es ("Spanish (auto-generated)")[TRANSLATABLE] (TRANSLATION LANGUAGES) - af ("Afrikaans") - ak ("Akan") - sq ("Albanian") - am ("Amharic") - ar ("Arabic") - hy ("Armenian") - as ("Assamese") - ay ("Aymara") - az ("Azerbaijani") - bn ("Bangla") - eu ("Basque") - be ("Belarusian") - bho ("Bhojpuri") - bs ("Bosnian") - bg ("Bulgarian") - my ("Burmese") - ca ("Catalan") - ceb ("Cebuano") - zh-Hans ("Chinese (Simplified)") - zh-Hant ("Chinese (Traditional)") - co ("Corsican") - hr ("Croatian") - cs ("Czech") - da ("Danish") - dv ("Divehi") - nl ("Dutch") - en ("English") - eo ("Esperanto") - et ("Estonian") - ee ("Ewe") - fil ("Filipino") - fi ("Finnish") - fr ("French") - gl ("Galician") - lg ("Ganda") - ka ("Georgian") - de ("German") - el ("Greek") - gn ("Guarani") - gu ("Gujarati") - ht ("Haitian Creole") - ha ("Hausa") - haw ("Hawaiian") - iw ("Hebrew") - hi ("Hindi") - hmn ("Hmong") - hu ("Hungarian") - is ("Icelandic") - ig ("Igbo") - id ("Indonesian") - ga ("Irish") - it ("Italian") - ja ("Japanese") - jv ("Javanese") - kn ("Kannada") - kk ("Kazakh") - km ("Khmer") - rw ("Kinyarwanda") - ko ("Korean") - kri ("Krio") - ku ("Kurdish") - ky ("Kyrgyz") - lo ("Lao") - la ("Latin") - lv ("Latvian") - ln ("Lingala") - lt ("Lithuanian") - lb ("Luxembourgish") - mk ("Macedonian") - mg ("Malagasy") - ms ("Malay") - ml ("Malayalam") - mt ("Maltese") - mi ("Māori") - mr ("Marathi") - mn ("Mongolian") - ne ("Nepali") - nso ("Northern Sotho") - no ("Norwegian") - ny ("Nyanja") - or ("Odia") - om ("Oromo") - ps ("Pashto") - fa ("Persian") - pl ("Polish") - pt ("Portuguese") - pa ("Punjabi") - qu ("Quechua") - ro ("Romanian") - ru ("Russian") - sm 
("Samoan") - sa ("Sanskrit") - gd ("Scottish Gaelic") - sr ("Serbian") - sn ("Shona") - sd ("Sindhi") - si ("Sinhala") - sk ("Slovak") - sl ("Slovenian") - so ("Somali") - st ("Southern Sotho") - es ("Spanish") - su ("Sundanese") - sw ("Swahili") - sv ("Swedish") - tg ("Tajik") - ta ("Tamil") - tt ("Tatar") - te ("Telugu") - th ("Thai") - ti ("Tigrinya") - ts ("Tsonga") - tr ("Turkish") - tk ("Turkmen") - uk ("Ukrainian") - und ("Unknown Language") - ur ("Urdu") - ug ("Uyghur") - uz ("Uzbek") - vi ("Vietnamese") - cy ("Welsh") - fy ("Western Frisian") - xh ("Xhosa") - yi ("Yiddish") - yo ("Yoruba") - zu ("Zulu") If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
Describe the solution you'd like
When an auto-generated subtitle track is available, translate it to en and return that transcript by default.
Describe alternatives you've considered
cc selection option
Additional context
n/a