
deepspeed for XTTS #569

Open
hslr4 opened this issue Jul 2, 2024 · 3 comments

hslr4 commented Jul 2, 2024

Since the coqui docs recommend the use of deepspeed to speed up their XTTS model, I wanted to give it a try.

To make it work I did the following (a quick sanity check for the resulting stack is sketched below):

  • I had to rebuild pytorch with USE_NCCL=1 because deepspeed requires NCCL.
  • I also installed libaio-dev, since a warning during the installation of deepspeed recommends it (though I am not sure whether it is actually required).
  • I also had to downgrade setuptools to 69.5.1.
  • Since I built torch 2.1 (I don't know why 😅) I also had to rebuild torchaudio, which might not have been necessary had I rebuilt the same version of torch that was installed in the first place.
  • Finally, I installed deepspeed with TORCH_CUDA_ARCH_LIST="7.2;8.7" pip3 install deepspeed.
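For reference, this is roughly how I verified the rebuilt stack afterwards (just a sketch, not part of the original steps):

```python
# sanity check for the rebuilt stack: confirms the NCCL-enabled torch build
# that deepspeed needs, plus the matching torchaudio/deepspeed installs
import torch
import torchaudio
import deepspeed

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("NCCL available:", torch.distributed.is_nccl_available())  # True only with USE_NCCL=1
print("torchaudio:", torchaudio.__version__)
print("deepspeed:", deepspeed.__version__)
```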

Indeed, inference with XTTS is about twice as fast as without deepspeed (though still slower than other TTS models that do not support voice cloning or multilinguality).
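For anyone wanting to reproduce this: enabling deepspeed is a single flag in the standard coqui loading code (a sketch following their docs; the checkpoint paths and reference clip are placeholders):

```python
# loading XTTS with deepspeed enabled, following the coqui API;
# checkpoint paths and the speaker reference clip are placeholders
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/path/to/xtts/config.json")

model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", use_deepspeed=True)
model.cuda()

outputs = model.synthesize(
    "This sentence is synthesized with the deepspeed-accelerated GPT.",
    config,
    speaker_wav="/path/to/reference.wav",  # voice cloning reference
    gpt_cond_len=3,
    language="en",
)
```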

Would it make sense to provide pre-built pytorch versions with NCCL for such use cases?

dusty-nv (Owner) commented Jul 2, 2024

Thanks @hslr4, those are interesting findings! I will try rebuilding PyTorch here with USE_NCCL=1 and perhaps make a container component for DeepSpeed. Do you know if it is faster than the use_tensorrt mode I added to XTTS? https://github.com/dusty-nv/TTS/commits/dev/

dusty-nv (Owner) commented Jul 2, 2024

OK, pytorch passed with USE_NCCL=1! The pytorch 2.2 wheel for JP6 / Python 3.10 is up here: http://jetson.webredirect.org/jp6/cu122/+f/3a2/9b5771b21e4e1/torch-2.2.0-cp310-cp310-linux_aarch64.whl

I don't believe you should need to recompile the torchvision or torchaudio wheels.

hslr4 (Author) commented Jul 3, 2024

Thank you for the quick reply and help, @dusty-nv!

When I wanted to do some testing regarding inference speed, I noticed that tensorrt is currently only used in inference_stream but not in inference (https://github.com/dusty-nv/TTS/blob/b452e628316e0fe33b8842e5d6ec1eb01fc46ef3/TTS/tts/models/xtts.py#L584).
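The addition I made was roughly the following (a sketch; hifigan_decoder_trt is the TensorRT-wrapped decoder attribute from the dev branch, and the fallback call matches upstream xtts.py — the exact integration in the fork may differ):

```python
# inside Xtts.inference(), mirroring what inference_stream already does:
# use the TensorRT decoder when it is available, otherwise fall back to
# the regular hifigan decoder
if getattr(self, "hifigan_decoder_trt", None) is not None:
    wav = self.hifigan_decoder_trt(gpt_latents, g=speaker_embedding)
else:
    wav = self.hifigan_decoder(gpt_latents, g=speaker_embedding)
```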

After I added it to inference as well, I used a (customized) tts server to generate a set of sentences of different lengths (45 sentences, 65 characters on average, 2,927 characters in total). These are the mean times for generating the speech for these sentences:

| mean time (s) | trt   | deepspeed |
| ------------- | ----- | --------- |
| 5.04          | false | false     |
| 4.94          | fp16  | false     |
| 2.11          | false | true      |
| 1.94          | fp16  | true      |

Seems like using deepspeed on xtts's gpt is more helpful than tensorrt on the hifigan_decoder. Actually, the speedup from trt appears comparatively low, so I'm still not sure whether I'm actually using it right 🙈 Did you test how effective the hifigan_decoder_trt is before?
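For reference, the measurement was essentially equivalent to the following (a sketch; it times model.synthesize directly rather than going through the server, and reuses the model and config objects from the loading snippet above):

```python
# rough per-sentence timing loop; `model` and `config` come from the
# loading snippet above, and `sentences` stands in for the 45 test sentences
import time
import statistics
import torch

sentences = ["..."]  # placeholder for the 45 test sentences

times = []
for text in sentences:
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.synthesize(text, config, speaker_wav="/path/to/reference.wav", language="en")
    torch.cuda.synchronize()
    times.append(time.perf_counter() - start)

print(f"mean time: {statistics.mean(times):.2f} s over {len(times)} sentences")
```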
