Increasing Transcription Delays Over Time #152
In this case, poor quality and high latency are expected. You can check whether the Whisper model you're using transcribes/translates your audio correctly in offline mode. If it doesn't, you need a better model. If it does, you can try a bigger min-chunk-size.
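whisper-streaming waits until at least min-chunk-size seconds of new audio have arrived before each processing step, so a bigger value gives Whisper more context per call at the cost of latency. The sketch below is a simplified illustration of that waiting behaviour, not the project's code; `ChunkAccumulator` and its methods are hypothetical names, and 16 kHz mono input is assumed.

```python
from typing import List, Optional

SAMPLING_RATE = 16000  # Whisper models expect 16 kHz mono audio

class ChunkAccumulator:
    """Hypothetical helper (not part of whisper-streaming): collect incoming
    audio until at least min_chunk_size seconds are available, then release it."""

    def __init__(self, min_chunk_size: float) -> None:
        self.min_samples = int(min_chunk_size * SAMPLING_RATE)
        self.pending: List[float] = []

    def push(self, samples: List[float]) -> Optional[List[float]]:
        """Add samples; return a chunk once enough audio is buffered, else None."""
        self.pending.extend(samples)
        if len(self.pending) < self.min_samples:
            return None  # keep waiting: not enough audio for one update yet
        chunk, self.pending = self.pending, []
        return chunk
```

With `min_chunk_size=1.0`, two pushes of 0.5 s each yield `None` and then a single 1 s chunk.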
This is challenging for Whisper, especially for ASR. The language is detected automatically from every incoming chunk. If a chunk is unclear or ambiguous, the detected language can be wrong, and then the transcript is wrong too. A bigger min-chunk-size could help. Another option is to update the whisper-streaming code to make language detection more robust, e.g. select only from a preselected set of languages, or use a bigger chunk to detect the language and then keep it until the next silence or change of speaker. Diarization could be added for that. Good luck! Reopen if you need to continue the discussion.
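The "preselected languages, keep until the next silence/speaker change" idea could look roughly like the sketch below. It assumes some detector already returns a mapping from language code to probability for each chunk, and that a separate component (e.g. a VAD or diarization step) signals boundaries; `LanguageLock`, `update` and the `boundary` flag are hypothetical names, not whisper-streaming APIs.

```python
from typing import Dict, Iterable, Optional

class LanguageLock:
    """Hypothetical sketch: choose the language only from a preselected set
    and keep it until a silence or speaker change is signalled."""

    def __init__(self, allowed: Iterable[str]) -> None:
        self.allowed = set(allowed)
        self.current: Optional[str] = None

    def update(self, lang_probs: Dict[str, float], boundary: bool = False) -> str:
        # A silence or speaker change (assumed to come from a VAD or
        # diarization step) releases the lock so the language is re-detected.
        if boundary:
            self.current = None
        if self.current is None:
            # Ignore languages outside the allowed set, then take the most probable one.
            candidates = {lang: p for lang, p in lang_probs.items() if lang in self.allowed}
            if not candidates:
                raise ValueError("no allowed language in detection output")
            self.current = max(candidates, key=candidates.get)
        return self.current
```

Between boundaries the locked language is returned regardless of what the per-chunk detector says, so one ambiguous chunk cannot flip the transcript into another language.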
Thank you for the response! The issue seems to be that it's sending 40 minutes of audio to Faster Whisper every second, which I don't think helps the transcription and makes it pretty much unusable. What's worse, the length of that audio keeps growing over time. Maybe there could be an option to reset the process when this happens, to ignore the prefix, or something else that forces the audio buffer to stay under a limit? That might help keep things running smoothly. What do you think?
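One way to "force the audio buffer under a limit" is a hard cap that discards the oldest audio once the buffer exceeds a maximum length. This is a minimal sketch of that idea only, not how whisper-streaming trims its buffer; `CappedAudioBuffer`, `max_seconds` and `dropped` are hypothetical names, and 16 kHz mono input is assumed.

```python
from collections import deque
from typing import Deque, List

SAMPLING_RATE = 16000  # samples per second expected by Whisper

class CappedAudioBuffer:
    """Hypothetical sketch: keep at most max_seconds of the most recent audio,
    dropping the oldest samples once the cap is exceeded."""

    def __init__(self, max_seconds: float) -> None:
        self.max_samples = int(max_seconds * SAMPLING_RATE)
        # deque with maxlen silently discards items from the left when full
        self.samples: Deque[float] = deque(maxlen=self.max_samples)
        self.dropped = 0  # total samples discarded; usable as a timestamp offset

    def append(self, chunk: List[float]) -> None:
        overflow = len(self.samples) + len(chunk) - self.max_samples
        self.dropped += max(0, overflow)
        self.samples.extend(chunk)

    def duration(self) -> float:
        """Seconds of audio currently held in the buffer."""
        return len(self.samples) / SAMPLING_RATE
```

The trade-off is that any audio dropped before its text was confirmed is lost from the transcript, and a real implementation would also need to reset any prompt/prefix that refers to the discarded audio; `dropped` keeps later timestamps aligned with the original stream.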
Yes, I have noticed it too, but it's rare and I have never found a reproducible input to debug it. Can you share the options, model, and audio input with which the bug can be reproduced?
Processing audio with duration 39:49.882
This log line comes from the faster-whisper backend: it is processing a full 39 minutes of audio.

Code to Reproduce: