Tips on training #75
Comments
Sending some feedback on my experiments. Around epoch 2000 the output starts to become a little intelligible, though it sounds like the speaker is talking from very far away. I also noticed that other models train for about 190k epochs, so I feel that making it better is just a matter of letting it train longer.
It depends on your dataset size and whether you use the pre-trained model. Generally, training HiFiGAN (vocoder) from scratch will take about 300k to 1 million steps to get a good result.
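Since the thread mixes epochs and steps, here is a minimal sketch of how the two relate. It assumes one optimizer step per batch and that the last partial batch is kept; the clip counts and batch sizes below are hypothetical, not values taken from this repository's configs.

```python
import math


def steps_per_epoch(num_clips: int, batch_size: int) -> int:
    """Optimizer steps in one pass over the data (last partial batch kept)."""
    return math.ceil(num_clips / batch_size)


def epochs_for_steps(target_steps: int, num_clips: int, batch_size: int) -> int:
    """Epochs needed to reach at least target_steps optimizer steps."""
    return math.ceil(target_steps / steps_per_epoch(num_clips, batch_size))


# Hypothetical numbers, for illustration only.
# A tiny dataset: 5 clips, batch size 4 -> 2 steps per epoch.
print(steps_per_epoch(5, 4))
# Epochs needed to reach 300k steps with that tiny dataset.
print(epochs_for_steps(300_000, 5, 4))
# A larger dataset: 10,000 clips, batch size 16 -> 625 steps per epoch.
print(epochs_for_steps(300_000, 10_000, 16))
```

With very few clips, an epoch is only a handful of steps, so an epoch count on a tiny dataset is not comparable to the same count on a large one.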
Would the pre-trained model be the files G_0.pth and D_0.pth? For testing I'm using 5 clips of 10 seconds each, the same ones I used with Tortoise TTS, which gave very good results there. Thanks a lot for the clarifications.
Yes, those .pth files are the pre-trained models. 50 seconds of data is not useful for training from scratch.
How much data do you think is useful?
I have been testing this code, and in my experience training for 300 epochs gives terrible results: incomprehensible sounds.
Is it possible that the audio I'm using as input is the problem? Or is it normal to get only noise from inference at 300 epochs?
And what is the minimum needed to get something comprehensible out of inference?
Thanks a lot for the help.