Partially garbled audio on Huggingface online demo, for a short English input #709
Open
6 tasks done
Labels: bug
Self Checks
Cloud or Self Hosted
Cloud
Environment Details
Huggingface online demo (V1.5 medium)
Steps to Reproduce
Tried to synthesize the text "We are not responsible for any misuse of the model, please consider your local laws and regulations before using it."
All parameters were left at their defaults.
✔️ Expected Behavior
Normal speech.
❌ Actual Behavior
The audio starts normally with "We are not", followed by garbled audio that sounds sped up. The total audio duration is only 3 seconds.
2024-12-05.07-38-05.mp4
audio.zip
I got this after trying the model with only 2 test inputs, so it does not seem to be rare. When I synthesize the same text again several times, I get other voices, and they don't seem to have this issue (as far as I've tested).
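As a rough sanity check on the duration reported above, the speaking rate implied by a 3-second clip can be computed from the input text. This is only a minimal sketch: the function name and the 4 words/second threshold are assumptions made for illustration, not part of the project, and the threshold is just a loose upper bound on normal conversational English.

```python
def words_per_second(text: str, duration_s: float) -> float:
    """Return the average speaking rate implied by a clip of the given length."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Whitespace splitting is a crude word count, good enough for a sanity check.
    return len(text.split()) / duration_s


TEXT = (
    "We are not responsible for any misuse of the model, "
    "please consider your local laws and regulations before using it."
)

# Assumed threshold (hypothetical): anything faster is treated as suspicious.
MAX_PLAUSIBLE_WPS = 4.0

rate = words_per_second(TEXT, 3.0)  # 3.0 s is the duration reported above
looks_truncated = rate > MAX_PLAUSIBLE_WPS
print(f"{rate:.2f} words/s, suspicious: {looks_truncated}")
```

A check like this could be run over a batch of generated clips to estimate how often the problem occurs, provided each clip's duration is read from the audio file.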
Related issues
Seems closely related to issue #632, but I decided to open a new issue because: