about diffVC on Mandarin datasets #24

Open
Theweekfoolish229 opened this issue Nov 11, 2022 · 4 comments

Comments

@Theweekfoolish229

Hello, I adapted the DiffVC code to Mandarin datasets. However, the audio after VC has problems with tone sandhi. Is this level of performance normal?

@Theweekfoolish229
Author

Here are DiffVC's results on Mandarin datasets: https://www.yuque.com/qinjitao/te3orn/zfgbp9

@yaoxunji

Hello, I adapted the diffvc code on Mandarin datasets. However, the audio after VC has the problem of tone sandhi. I want to ask the performance is normal ?

Hi, I have the same problem. Did you solve it?

@Theweekfoolish229
Author

Theweekfoolish229 commented Mar 28, 2023 via email

@li1jkdaw

Hi, @Theweekfoolish229 !

Sorry to hear about that. Actually, we tested our voice conversion model only on English data, and even for English, despite the overall good quality, there were some problems with preserving the source prosody (including the mispronunciation issues discussed in Section 4.2 of our paper). This is because our DiffVC model receives only the so-called "average mel-spectrograms" as information about the content of the source utterance. For tone languages like Mandarin, where the pitch contour distinguishes otherwise identical syllables, this may be insufficient.
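To illustrate why averaging can lose tone information, here is a toy sketch. It assumes (our reading, not the repo's code) that an "average mel-spectrogram" is built by replacing every frame with the corpus-wide mean mel vector of the phoneme aligned to that frame. The helper names `phoneme_means` and `average_mel` are hypothetical, and the 2-bin "mel" frames are made-up toy vectors, not real features.

```python
from collections import defaultdict


def phoneme_means(mels, alignments):
    """Hypothetical helper: mean mel vector per phoneme over a corpus.

    mels: list of utterances, each a list of frames (lists of floats).
    alignments: list of utterances, each a list of phoneme labels, one per frame.
    """
    sums = {}
    counts = defaultdict(int)
    for utt_mel, utt_phones in zip(mels, alignments):
        for frame, phone in zip(utt_mel, utt_phones):
            if phone not in sums:
                sums[phone] = [0.0] * len(frame)
            # Accumulate this frame into the running sum for its phoneme.
            sums[phone] = [s + x for s, x in zip(sums[phone], frame)]
            counts[phone] += 1
    return {p: [s / counts[p] for s in sums[p]] for p in sums}


def average_mel(phones, means):
    """Hypothetical helper: replace each frame by the mean of its aligned phoneme."""
    return [list(means[p]) for p in phones]


# Toy one-frame utterances of the same toneless phoneme "a",
# pretending the two mel bins encode a high vs. a falling tone.
high = [[1.0, 0.0]]
falling = [[0.0, 1.0]]
means = phoneme_means([high, falling], [["a"], ["a"]])
print(average_mel(["a"], means))  # both utterances map to the same frame
```

In this sketch both tones collapse to the same averaged frame, so the content representation alone no longer says which tone was spoken. If the phoneme labels carried tones (e.g. `a1` vs `a4`), the sketch would average them separately; whether that would help DiffVC on Mandarin has not been tested.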

However, in our recent paper we proposed a sampling method based on the optimal transport property that preserves source prosody better than vanilla sampling from a diffusion model. Although we didn't test it on Mandarin, we expect it to work better there as well and to help reduce the problem with tones. I hope we'll add the sampling method described in that paper to our repo in the near future.
