Attempting! #20

Open
binarythinktank opened this issue Sep 11, 2020 · 11 comments

@binarythinktank

Hi

I'm trying to get this to work with a custom sample set, and I'm stuck at the end of this section: https://github.com/ivanvovk/DurIAN#6-how-to-align-your-own-data

I don't see how to go from the many .TextGrid files to the 2 filelists that the config needs.
You mentioned using textgrid in Python, but I'm not a Python dev. Do you have a script to convert all of the TextGrid files?

Cheers

@binarythinktank
Author

binarythinktank commented Sep 12, 2020

OK, I have figured out how to read the data files:

import textgrid
import sys

f = './alligned/wavs3/{}.TextGrid'.format(sys.argv[1])
tg = textgrid.TextGrid.fromFile(f)

for i in tg.tiers[1].intervals:
    print(i)

which gives me a bunch of these: Interval(0.0, 0.12, sil)

I don't entirely see how this converts to your filelist format, though...

It looks something like:
[original text]|[the first numbers from all of the intervals??]|[the second numbers from all of the intervals??]|[the text from all of the intervals]|[filename]

The doc says to "convert to frames". Is that just multiplying the times by 100?
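
On the frames question: a time in seconds maps to a frame index through the sampling rate and the hop length of the spectrogram, so multiplying by 100 only holds when one frame covers exactly 10 ms. Below is a minimal sketch, assuming a 22050 Hz sampling rate and a hop of 256 samples (both are placeholders, so check the values in your own config). The helper name `intervals_to_durations` is made up for illustration, and the exact column layout of the filelist is not confirmed here.

```python
def intervals_to_durations(intervals, sampling_rate=22050, hop_length=256):
    """Turn (start_sec, end_sec, label) intervals into per-label frame counts.

    sampling_rate and hop_length are assumed defaults, not values taken
    from the repository's config.
    """
    frames_per_second = sampling_rate / hop_length
    labels, durations = [], []
    for start, end, label in intervals:
        # Round the boundaries rather than the lengths, so the durations
        # add up to the rounded end time of the last interval.
        start_frame = round(start * frames_per_second)
        end_frame = round(end * frames_per_second)
        labels.append(label)
        durations.append(end_frame - start_frame)
    return labels, durations


# With the textgrid package, the tuples could be built with something like:
#   [(i.minTime, i.maxTime, i.mark) for i in tg.tiers[1].intervals]
example = [(0.0, 0.12, "sil"), (0.12, 0.25, "HH")]
labels, durations = intervals_to_durations(example)
print(" ".join(labels), "|", " ".join(map(str, durations)))
```

Rounding the interval boundaries instead of each interval length keeps the per-phoneme durations consistent with the total number of frames, which matters when they are later matched against a mel spectrogram.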

@carankt
Collaborator

carankt commented Sep 13, 2020

On https://github.com/carankt/FastSpeech2.git
Check the Generate_FileList.ipynb
Hope it Helps!

@binarythinktank
Author

many thanks!

@binarythinktank
Author

@carankt I tried running the notebook, but I'm getting:

ModuleNotFoundError Traceback (most recent call last)
in
3 import librosa
4 import textgrid
----> 5 from utils.files import get_files

ModuleNotFoundError: No module named 'utils.files'

@carankt
Collaborator

carankt commented Sep 14, 2020

@binarythinktank clone the repo and then run. The notebook requires a module from the repo located in utils/files

@binarythinktank
Author

binarythinktank commented Sep 14, 2020

@carankt same...
(to be clear, i cloned your repo, https://github.com/carankt/FastSpeech2)

@carankt
Collaborator

carankt commented Sep 14, 2020

Take the latest version of the repo. Added the missing module.

@binarythinktank
Author

binarythinktank commented Sep 14, 2020

Thanks, now it works.
This approach is definitely different from my attempt to pull the relevant data out.
Your train.txt file is only 4 MB; in my attempt, it somehow ended up being 10 GB...

python3 .\train_fastspeech.py is failing though, I'll need to debug again.

@binarythinktank
Author

Nope, I'm stuck again. If anyone has any idea...

2020-09-14 16:28:53,162 - INFO - ngpu: 1
2020-09-14 16:28:53,162 - INFO - random seed = 1
Using cache found in C:\Users\ml/.cache\torch\hub\seungwonpark_melgan_master
Model is loaded ...
New Training
Batch Size : 16
Trainable Parameters: 26.691M
Loading train data: 0%| | 0/160 [00:06<?, ?it/s]
Traceback (most recent call last):
  File ".\train_fastspeech.py", line 452, in <module>
    main(sys.argv[1:])
  File ".\train_fastspeech.py", line 448, in main
    train(args, hp, hp_str, logger, vocoder)
  File ".\train_fastspeech.py", line 93, in train
    for data in pbar:
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tqdm\std.py", line 1130, in __iter__
    for obj in iterable:
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\dataloader.py", line 354, in __next__
    data = self._next_data()
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\dataloader.py", line 980, in _next_data
    return self._process_data(data)
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\dataloader.py", line 1005, in _process_data
    data.reraise()
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\_utils\worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\home\Downloads\FastSpeech2\dataset\dataloader.py", line 51, in __getitem__
    x = phonemes_to_sequence(x)
  File "D:\home\Downloads\FastSpeech2\dataset\texts\__init__.py", line 175, in phonemes_to_sequence
    sequence = [_phoneme_to_id[s] for s in string]
  File "D:\home\Downloads\FastSpeech2\dataset\texts\__init__.py", line 175, in <listcomp>
    sequence = [_phoneme_to_id[s] for s in string]
KeyError: 'ER0'

@carankt
Collaborator

carankt commented Sep 15, 2020

The filelist you generated using my repo uses different symbols than this one. You must make adjustments to the filelist or the type of symbols in this repo to move forward.
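
For context, 'ER0' is an ARPAbet phoneme with a stress digit attached (0, 1 or 2), as found in CMUdict-style dictionaries. One way to gauge the mismatch is to list the phonemes in the filelist that are missing from the symbol table, then check whether dropping the stress digits closes the gap. This is a rough sketch only: `known_symbols` is a made-up stand-in for this repo's phoneme table, and whether stripping stress is the right adjustment depends on what that table actually contains.

```python
import re


def strip_stress(phoneme):
    """Remove a trailing ARPAbet stress digit, e.g. 'ER0' -> 'ER'."""
    return re.sub(r"[0-2]$", "", phoneme)


def unknown_symbols(phonemes, known_symbols):
    """Return the sorted phonemes that are missing from the symbol table."""
    return sorted(set(phonemes) - set(known_symbols))


# Hypothetical symbol table; replace it with the one the training code uses.
known_symbols = {"sil", "HH", "ER", "L", "OW"}
phonemes = "sil HH ER0 L OW1 sil".split()

# Symbols that would raise KeyError as they are.
print(unknown_symbols(phonemes, known_symbols))
# Symbols still missing after the stress digits are removed.
print(unknown_symbols([strip_stress(p) for p in phonemes], known_symbols))
```

If the second list comes back empty, rewriting the phoneme column of the filelist with `strip_stress` may be enough; otherwise the symbol table itself would need the extra entries.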

@binarythinktank
Author

binarythinktank commented Sep 15, 2020

Which symbols? Surely not the separator, since both are using the pipe | character?

I will, in parallel, also attempt to use your version.

Have I put the correct values into your config?

wav_path = lab_path = '../melgan/wavs3' (contains the .wav and .lab files)
csv_path = '../melgan/texts.csv' (not sure what this should be; I used the csv file from which I created the .lab files, where each line is [filename without .wav]|[spoken text]\n )
dict_path = 'cmudict.txt' (not sure what this is; is it the dictionary file from MFA? mine is called english.dict)
output_directory = 'training_log' (created this folder)
log_directory = 'fastspeech2-wavs3' (created this folder)
data_path = data_dir = '../melgan/data/' (output folder of preprocess from this repo, will it work?)
filelist_alignment_dir=teacher_dir = '../melgan/alligned/wavs3' (raw output from mfa, will it work?)
training_files='../melgan/filelists/train.txt' (converted from mfa dir by your repo)
validation_files='../melgan/filelists/valid.txt' (converted from mfa dir by your repo)

Is my csv_path file in the wrong format?

Traceback (most recent call last):
  File "train.py", line 144, in <module>
    main()
  File "train.py", line 48, in main
    train_loader, val_loader, collate_fn = prepare_dataloaders(hparams)
  File "FastSpeech2\utils\utils.py", line 13, in prepare_dataloaders
    trainset = TMDPESet(hparams.training_files, hparams)
  File "FastSpeech2\utils\data_utils.py", line 20, in __init__
    self.audiopaths_and_text = load_filepaths_and_text(audiopaths_and_text, hparams.data_path)
  File "FastSpeech2\utils\data_utils.py", line 12, in load_filepaths_and_text
    file_name, text1, text2 = line.strip().split('|')
ValueError: too many values to unpack (expected 3)
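
Reading the traceback, the failing call is on hparams.training_files rather than csv_path: at least one line of that filelist splits into more than three fields on `|`, while the loader unpacks exactly three. A small sketch for locating such lines follows. The helper name `lines_with_wrong_field_count` and the sample lines are invented for illustration; the expected count of 3 comes from the unpacking line in the traceback.

```python
def lines_with_wrong_field_count(lines, expected=3, sep="|"):
    """Return (line_number, field_count) for lines that do not split
    into the expected number of fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        count = len(line.strip().split(sep))
        if count != expected:
            bad.append((number, count))
    return bad


# Invented sample: the first line has three fields, the second has five.
sample = [
    "file_0001|some text|S AH1 M",
    "some text|0 12|12 25|sil HH|file_0002",
]
print(lines_with_wrong_field_count(sample))
```

To check a real filelist, pass an open file instead of the sample list, for example `lines_with_wrong_field_count(open(path, encoding="utf-8"))` with `path` set to your train.txt.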
