Partially garbled audio on Huggingface online demo, for a short English input #709

Open
6 tasks done
rotemdan opened this issue Dec 5, 2024 · 0 comments
Labels
bug Something isn't working

rotemdan commented Dec 5, 2024

Self Checks

  • This template is only for bug reports. For questions, please visit Discussions.
  • I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template and fill in all required fields.

Cloud or Self Hosted

Cloud

Environment Details

Huggingface online demo (V1.5 medium)

Steps to Reproduce

Tried to synthesize the text "We are not responsible for any misuse of the model, please consider your local laws and regulations before using it."

All parameters are left as default:

Screenshot_1

✔️ Expected Behavior

Normal speech.

❌ Actual Behavior

The audio starts normally with "We are not", but is then followed by garbled audio that sounds sped up. The total audio duration is only 3 seconds.
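
As a rough sanity check on that duration, here is a minimal sketch that estimates how long the input text would take to speak and flags output that is much shorter. The speaking rate of 150 words per minute and the 0.6 threshold are assumptions chosen for illustration, and the function names are hypothetical; none of this is part of the project.

```python
def expected_duration_seconds(text: str, words_per_minute: float = 150.0) -> float:
    """Estimate speech duration from a whitespace word count.

    The default rate of 150 words per minute is an assumed, typical
    conversational pace, not a value taken from the model.
    """
    word_count = len(text.split())
    return word_count * 60.0 / words_per_minute


def looks_truncated(
    text: str,
    actual_seconds: float,
    min_ratio: float = 0.6,
    words_per_minute: float = 150.0,
) -> bool:
    """Return True when the audio is much shorter than the estimate.

    min_ratio is an arbitrary illustrative threshold.
    """
    estimate = expected_duration_seconds(text, words_per_minute)
    return actual_seconds < min_ratio * estimate


TEXT = (
    "We are not responsible for any misuse of the model, please consider "
    "your local laws and regulations before using it."
)

# The input has 20 words, so the estimate at the assumed rate is 8 seconds.
print(expected_duration_seconds(TEXT))
# The reported output was about 3 seconds long.
print(looks_truncated(TEXT, 3.0))
```

Under these assumptions, the 3-second result is well under half of the estimated duration, which is consistent with most of the sentence being compressed or skipped.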

2024-12-05.07-38-05.mp4

audio.zip

I got this after trying the model with only 2 test inputs, which suggests it's not that rare. If I synthesize the same text several more times, I get other voices, and they don't seem to have this issue (as far as I've tested).

Related issues

Seems closely related to issue #632, but I decided to open a new issue because:

  • My input was in English
  • Issue #632 is described as "Swallowing words, reading normally at first, then speeding up, and then not reading the last word of the sentence completely", but that's not exactly what is seen here. Here the audio starts normally but continues as completely garbled audio
  • A comment on that issue says the issue is resolved
  • It was produced by the online demo, and the latest model (1.5 medium)

rotemdan added the "bug" label on Dec 5, 2024
rotemdan changed the title from "Garbled speech on Huggingface demo, for a short English prompt" to "Partially garbled audio on Huggingface demo, for a short English prompt" on Dec 5, 2024
rotemdan changed the title from "Partially garbled audio on Huggingface demo, for a short English prompt" to "Partially garbled audio on Huggingface online demo, for a short English input" on Dec 5, 2024