about diffVC on Mandarin datasets #24

Theweekfoolish229 · 2022-11-11T02:17:32Z

Hello, I adapted the DiffVC code to Mandarin datasets. However, the converted audio has tone sandhi problems. Is this expected?

Theweekfoolish229 · 2022-11-11T02:49:13Z

Here is DiffVC's performance on Mandarin datasets: https://www.yuque.com/qinjitao/te3orn/zfgbp9

yaoxunji · 2023-03-22T08:06:57Z

> Hello, I adapted the DiffVC code to Mandarin datasets. However, the converted audio has tone sandhi problems. Is this expected?

Hi, I ran into the same problem. Did you solve it?

Theweekfoolish229 · 2023-03-28T00:53:43Z

I didn't solve the problem, and I lost my job because of it. In the end, I chose this solution: PPG + speaker_id + decoder.
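For readers unfamiliar with this kind of setup, here is a minimal sketch of how a "PPG + speaker_id + decoder" pipeline is often wired: each frame of phonetic posteriorgram (PPG) features is concatenated with an embedding looked up by speaker ID, and the result is fed to a decoder. This is not the poster's actual code. The function name `build_decoder_inputs`, the toy `speaker_table`, and all numbers are hypothetical, and the decoder itself is left out.

```python
# Illustrative sketch only; names and values are assumptions, not from the thread.

def build_decoder_inputs(ppg_frames, speaker_id, speaker_table):
    """Concatenate every PPG frame with the embedding of the target speaker."""
    speaker_vec = speaker_table[speaker_id]  # fixed vector per speaker
    return [list(frame) + list(speaker_vec) for frame in ppg_frames]

# Toy PPG: 2 frames, 3 phoneme classes (each row is a posterior distribution).
ppg_frames = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]]
# Toy speaker embedding table with 2 speakers and 2-dim embeddings.
speaker_table = {0: [0.5, -0.5], 1: [-1.0, 1.0]}

inputs = build_decoder_inputs(ppg_frames, speaker_id=1, speaker_table=speaker_table)
# Each decoder input now carries content (PPG) and target identity (embedding).
```

In a real system the embedding table would be learned and the decoder would map these per-frame inputs to mel-spectrogram frames.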

li1jkdaw · 2023-05-14T16:32:17Z

Hi, @Theweekfoolish229!

Sorry to hear about that. We actually tested our voice conversion model only on English data, and even for that language, despite the overall good quality, there were some problems with preserving the source prosody (including mispronunciation issues, as discussed in Section 4.2 of our paper). This is because our DiffVC model uses only the so-called "average mel-spectrograms" as information about the content of the source utterance. For tonal languages like Mandarin, this may be insufficient.

However, in our recent paper we proposed a sampling method based on the optimal transport property that preserves source prosody better than vanilla sampling from a diffusion model. Although we didn't test it on Mandarin, we expect it to work better there as well and to help reduce the problem with tones. I hope that in the near future we'll add the sampling method described in that paper to our repo.
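As an illustrative aside on the "average mel-spectrogram" limitation mentioned above, the toy sketch below shows why a per-phoneme average can lose tone information. It is a simplification and not the DiffVC code: the 2-dim "mel" frames, the phoneme labels, and the helper `phoneme_average_mels` are all hypothetical, and the averaging here runs over a handful of frames rather than over a large multi-speaker dataset.

```python
# Illustrative sketch only; toy data, not real spectrograms or DiffVC internals.
from collections import defaultdict

def phoneme_average_mels(frames, phonemes):
    """Replace every frame with the mean of all frames sharing its phoneme label."""
    sums = {}
    counts = defaultdict(int)
    for frame, ph in zip(frames, phonemes):
        if ph not in sums:
            sums[ph] = [0.0] * len(frame)
        sums[ph] = [s + x for s, x in zip(sums[ph], frame)]
        counts[ph] += 1
    means = {ph: [s / counts[ph] for s in sums[ph]] for ph in sums}
    return [means[ph] for ph in phonemes]

# Same phoneme "a" spoken twice; the second dimension stands in for a
# pitch-related cue that stays flat in one case and falls in the other.
flat_tone = [[1.0, 5.0], [1.0, 5.0]]
falling_tone = [[1.0, 5.0], [1.0, 1.0]]
frames = flat_tone + falling_tone
phonemes = ["a", "a", "a", "a"]

averaged = phoneme_average_mels(frames, phonemes)
# Every frame maps to the same vector, so the two contours become indistinguishable.
assert averaged[0] == averaged[3]
```

If the content representation looks like `averaged`, anything that distinguishes the two tone contours has to be recovered elsewhere in the model, which is consistent with the tone problems reported in this thread.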
