Tiny generation slower and yields "IndexError: tuple out of range" #144

Open
ScottMcMac opened this issue Oct 3, 2024 · 0 comments
ScottMcMac commented Oct 3, 2024

Using the same code that worked for me with parler-tts/parler-tts-mini-expresso yields much slower generations (roughly 4-5x) and an error with parler-tts/parler-tts-tiny-v1.

I then tried the "specific voice" and "random voice" example scripts from the Hugging Face repo for tiny. Specifically, the error I get is:

sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scott/miniconda3/envs/parler/lib/python3.11/site-packages/soundfile.py", line 342, in write
    channels = data.shape[1]
               ~~~~~~~~~~^^^
IndexError: tuple index out of range

I'm running it on an RTX 3090, Ubuntu 24.04. I just confirmed this problem in a brand new conda env:

conda create -n parler-bare python=3.11
conda activate parler-bare
pip install git+https://github.com/huggingface/parler-tts.git
python

# Run example code blocks from Hugging Face, e.g.
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-tiny-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-tiny-v1")

prompt = "Hey, how are you doing today?"
description = "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)

(I also got the warning "The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results." after the generation = ... line. I was getting (different) attention-mask warnings with my working mini-expresso code, though.)
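For context on the traceback: soundfile.write treats 1-D data as mono, but for any other ndim it reads data.shape[1] as the channel count. So I suspect generate() is returning only a single sample here (which might also tie in with the odd generation behavior), and .squeeze() then collapses the output to a 0-d array, making the shape[1] lookup fail. A minimal sketch of that failure mode, plus an np.atleast_1d guard as a workaround, using plain NumPy with no model needed (the arrays are illustrative, not actual model output):

```python
import numpy as np

# What I suspect is happening: a 1-sample generation, e.g. shape (1, 1),
# squeezes down to a 0-d array, so data.shape[1] inside soundfile.write
# raises "IndexError: tuple index out of range".
single_sample = np.array([[0.1]])        # illustrative stand-in for generate() output
audio_arr = single_sample.squeeze()      # 0-d array: shape == ()
print(audio_arr.ndim)                    # 0 -> shape[1] would raise IndexError

# Workaround sketch: force the array to be at least 1-D before writing.
audio_arr = np.atleast_1d(audio_arr)
print(audio_arr.shape)                   # (1,)
```

This only avoids the crash when writing the file, of course; it does not explain why tiny generates so little (or so slowly) compared to mini-expresso.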

P.S. Thanks for these great models!
