The goal of this repository is to make it easy to experiment with audio datasets and diffusion. It is powered by Hydra, score-based generative models (Song et al.), and Grad-TTS (Popov et al.).
At the moment it is primarily a revamp of Grad-TTS.
This repository requires Python 3.9.17.
pip install cython
pip install -r requirements.txt
cd model/monotonic_align; python setup.py build_ext --inplace; cd ../..
Create a filelist where each line has the form f'{audio_path}|{transcription}|{speaker_id}', where speaker_id is an integer. Edit config/data/data.yaml to suit your dataset.
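A minimal sketch of building such a filelist (the paths, transcriptions, and speaker IDs below are illustrative, not part of this repository):

```python
# Example entries: (audio_path, transcription, speaker_id).
# Replace these with your own dataset's metadata.
entries = [
    ("wavs/0001.wav", "hello world", 0),
    ("wavs/0002.wav", "goodbye", 1),
]

# Write one pipe-separated line per utterance.
with open("filelist.txt", "w") as f:
    for audio_path, transcription, speaker_id in entries:
        f.write(f"{audio_path}|{transcription}|{speaker_id}\n")
```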
Run:
python train_multi_speaker.py --config-name=config +data=data
Edit the config as desired, or make use of Hydra's multirun utility.
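Hydra overrides use dotted key paths on the command line (e.g. `data.some_key=value`). The sketch below illustrates how such an override maps into a nested config, using a plain dictionary rather than Hydra itself; the key names are hypothetical, and real Hydra additionally handles typing, the `+`/`~` prefixes, and multirun sweeps:

```python
def apply_override(cfg: dict, override: str) -> dict:
    """Apply one Hydra-style dotted override to a nested dict.
    Illustration only: values are kept as strings."""
    key_path, value = override.split("=", 1)
    keys = key_path.split(".")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return cfg

# Hypothetical config and override, for illustration.
cfg = {"data": {"sample_rate": "16000"}}
apply_override(cfg, "data.sample_rate=22050")
```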
First, generate predictions for your dataset:
python generate_tts_preds.py --config-name=config +data=delete_this +eval=eval
Then calculate the log-F0 RMSE:
python evaluate_tts.py --config-name=config +data=data +eval=eval
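For reference, log-F0 RMSE is typically the root-mean-square error between the log fundamental-frequency contours of the reference and predicted speech, computed over frames where both are voiced. A simplified sketch (real evaluation pipelines, including this repository's, may also time-align the contours, e.g. with DTW):

```python
import math

def log_f0_rmse(f0_ref, f0_pred):
    """RMSE between log-F0 contours over frames where both
    contours are voiced (F0 > 0). Simplified sketch: assumes the
    two contours are already frame-aligned."""
    pairs = [(r, p) for r, p in zip(f0_ref, f0_pred) if r > 0 and p > 0]
    if not pairs:
        return float("nan")
    sq_err = [(math.log(r) - math.log(p)) ** 2 for r, p in pairs]
    return math.sqrt(sum(sq_err) / len(sq_err))

# Unvoiced frames (F0 == 0) are skipped; identical voiced frames give 0.0.
log_f0_rmse([220.0, 0.0, 230.0], [220.0, 110.0, 230.0])  # → 0.0
```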
Coming soon
@misc{Cross2023SpeechDiff,
  author       = {Mattias Cross},
  title        = {Speech diff, a framework for diffusion applied to speech},
  howpublished = {GitHub},
  year         = {2023},
  url          = {https://github.com/Mattias421/speech-diff}
}