Tiny generation slower and yields "IndexError: tuple out of range" #144

Open
ScottMcMac opened this issue Oct 3, 2024 · 0 comments
ScottMcMac commented Oct 3, 2024

Using the same code that worked for me with parler-tts/parler-tts-mini-expresso yields much slower generations (roughly 4-5x) and an error with parler-tts/parler-tts-tiny-v1.

I then tried the "specific voice" and "random voice" example scripts from the Hugging Face repo for tiny. Specifically, the error I get is:

sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scott/miniconda3/envs/parler/lib/python3.11/site-packages/soundfile.py", line 342, in write
    channels = data.shape[1]
               ~~~~~~~~~~^^^
IndexError: tuple index out of range

I'm running it on an RTX 3090, Ubuntu 24.04. I just confirmed this problem in a brand new conda env:

conda create -n parler-bare python=3.11
conda activate parler-bare
pip install git+https://github.com/huggingface/parler-tts.git
python

# Run example code blocks from Hugging Face, e.g.
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-tiny-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-tiny-v1")

prompt = "Hey, how are you doing today?"
description = "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)

(I also got the warning "The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results." after the generation = ... line. I was getting (different) attention-mask warnings with my working mini-expresso code, though.)
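For context on the traceback: soundfile.write treats 1-D data as mono, but for any other ndim it reads data.shape[1] as the channel count. So I suspect generate() is returning only a single sample here (which might also tie in with the odd generation behavior), and .squeeze() then collapses the output to a 0-d array, making the shape[1] lookup fail. A minimal sketch of that failure mode, plus an np.atleast_1d guard as a workaround, using plain NumPy with no model needed (the arrays are illustrative, not actual model output):

```python
import numpy as np

# What I suspect is happening: a 1-sample generation, e.g. shape (1, 1),
# squeezes down to a 0-d array, so data.shape[1] inside soundfile.write
# raises "IndexError: tuple index out of range".
single_sample = np.array([[0.1]])        # illustrative stand-in for generate() output
audio_arr = single_sample.squeeze()      # 0-d array: shape == ()
print(audio_arr.ndim)                    # 0 -> shape[1] would raise IndexError

# Workaround sketch: force the array to be at least 1-D before writing.
audio_arr = np.atleast_1d(audio_arr)
print(audio_arr.shape)                   # (1,)
```

This only avoids the crash when writing the file, of course; it does not explain why tiny generates so little (or so slowly) compared to mini-expresso.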

P.S. Thanks for these great models!
