Releases: yxlllc/DDSP-SVC
5.0: Improved DDSP Cascade Diffusion Model
4.0: DDSP Cascade Diffusion Model
Unzip the demo model into the exp directory and unzip the sample audio files into the main directory.
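From a terminal, the setup might look like the following; the archive names below are placeholders for the files attached to this release:
# hypothetical archive names; substitute the attachments from this release
unzip diffusion-new-demo.zip -d exp/
unzip samples.zip
Then run the demo samples: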
# opencpop (1st speaker)
python main_diff.py -i samples/source.wav -diff exp/diffusion-new-demo/model_200000.pt -o samples/svc-opencpop+12key.wav -id 1 -k 12 -kstep 100
# kiritan (2nd speaker)
python main_diff.py -i samples/source.wav -diff exp/diffusion-new-demo/model_200000.pt -o samples/svc-kiritan+12key.wav -id 2 -k 12 -kstep 100
# mix the timbres of opencpop and kiritan at a 0.5:0.5 ratio
python main_diff.py -i samples/source.wav -diff exp/diffusion-new-demo/model_200000.pt -o samples/svc-opencpop_kiritan_mix+12key.wav -mix "{1:0.5,2:0.5}" -k 12 -kstep 100
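The -mix flag appears to take a dictionary of speaker IDs and weights, so blends other than 0.5:0.5 should also work. For example (an untested variation; the output filename is a placeholder):
# hypothetical blend: 25% opencpop, 75% kiritan
python main_diff.py -i samples/source.wav -diff exp/diffusion-new-demo/model_200000.pt -o samples/svc-opencpop_kiritan_mix25-75+12key.wav -mix "{1:0.25,2:0.75}" -k 12 -kstep 100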
The training data for this 2-speaker model comes from opencpop and kiritan.
Thanks to CN_ChiTu for helping to train this model.
3.0: Dramatically improved audio quality with a shallow diffusion model
Unzip the two demo models into the exp directory, then run the demo samples:
# opencpop (1st speaker)
python main_diff.py -i samples/source.wav -ddsp exp/ddsp-demo/model_300000.pt -diff exp/diffusion-demo/model_400000.pt -o samples/svc-opencpop+12key.wav -id 1 -k 12 -kstep 300
# kiritan (2nd speaker)
python main_diff.py -i samples/source.wav -ddsp exp/ddsp-demo/model_300000.pt -diff exp/diffusion-demo/model_400000.pt -o samples/svc-kiritan+12key.wav -id 2 -k 12 -kstep 300
# mix the timbres of opencpop and kiritan at a 0.5:0.5 ratio
python main_diff.py -i samples/source.wav -ddsp exp/ddsp-demo/model_300000.pt -diff exp/diffusion-demo/model_400000.pt -o samples/svc-opencpop_kiritan_mix+12key.wav -mix "{1:0.5,2:0.5}" -k 12 -kstep 300
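The -kstep value appears to set how many diffusion steps run on top of the DDSP output, so a smaller value should render faster, possibly at some cost in quality. For example (untested; the output filename is a placeholder):
# hypothetical faster render with fewer diffusion steps
python main_diff.py -i samples/source.wav -ddsp exp/ddsp-demo/model_300000.pt -diff exp/diffusion-demo/model_400000.pt -o samples/svc-opencpop+12key-fast.wav -id 1 -k 12 -kstep 100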
The training data for this 2-speaker model comes from opencpop and kiritan.
Thanks to lafi2333 for helping to train the demo models.
2.0: Greatly optimized training speed
Unzip the pretrained model into the exp directory, then run the demo samples:
# opencpop (1st speaker)
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-opencpop+12key.wav -k 12 -id 1
# kiritan (2nd speaker)
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-kiritan+12key.wav -k 12 -id 2
# mix the timbres of opencpop and kiritan at a 0.5:0.5 ratio
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-opencpop_kiritan_mix+12key.wav -k 12 -mix "{1:0.5, 2:0.5}"
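To render every speaker in one pass, a small shell loop over the -id values works; the output names below are placeholders:
# hypothetical batch render over both speaker IDs
for id in 1 2; do
  python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-speaker${id}+12key.wav -k 12 -id ${id}
done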
The training data for this 2-speaker model comes from opencpop and kiritan.
Thanks to CN_ChiTu for helping to train this model.
Multi-speaker support and timbre mixing
Unzip the pretrained model into the exp directory, then run the demo samples:
# opencpop (1st speaker)
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-opencpop+12key.wav -k 12 -pe crepe -e true -id 1
# kiritan (2nd speaker)
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-kiritan+12key.wav -k 12 -pe crepe -e true -id 2
# mix the timbres of opencpop and kiritan at a 0.5:0.5 ratio
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-opencpop_kiritan_mix+12key.wav -k 12 -pe crepe -e true -mix "{1:0.5, 2:0.5}"
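The -k flag is the pitch shift in semitones; if the source is already in a comfortable range for the target voice, it can presumably be set to 0 (untested; the output filename is a placeholder):
# hypothetical conversion with no transposition
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-opencpop+0key.wav -k 0 -pe crepe -e true -id 1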
The training data for this 2-speaker model comes from opencpop and kiritan.
Thanks to CN_ChiTu for helping to train this model.
1.0
Unzip the pretrained model into the exp directory, then run the demo samples:
# origin output
python main.py -i samples/source.wav -m exp/opencpop/model_300000.pt -o samples/svc-opencpop+10key-origin.wav -k 10 -pe crepe
# enhanced output
python main.py -i samples/source.wav -m exp/opencpop/model_300000.pt -o samples/svc-opencpop+10key-enhance.wav -k 10 -pe crepe -e true
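To audition several transpositions side by side, a loop like this may help (untested; the output names are placeholders):
# hypothetical sweep over pitch shifts
for key in 0 5 10; do
  python main.py -i samples/source.wav -m exp/opencpop/model_300000.pt -o samples/svc-opencpop+${key}key-enhance.wav -k ${key} -pe crepe -e true
done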
The training data comes from opencpop.