Tensorboard ? :D #3

Open
dathudeptrai opened this issue Jul 5, 2020 · 7 comments
Labels
documentation Improvements or additions to documentation

Comments

@dathudeptrai

Do you have any TensorBoard logs? :D The audio samples sound good except for some background noise; maybe training longer will solve this problem, haha :D. Great job :D.

@dathudeptrai dathudeptrai changed the title from "Tensorboard :D" to "Tensorboard ? :D" on Jul 5, 2020
@rishikksh20
Owner

[Image: Tensorboard]

@rishikksh20
Owner

[Image: Tensorboard]

@rishikksh20
Owner

rishikksh20 commented Jul 5, 2020

@dathudeptrai Currently I am using raw pitch and energy values with an MSE loss, which is why the error looks so high; if required, I will standardize or normalize them in the future. Both the pitch and energy losses seem to stay flat after 10k steps and are heading towards over-fitting.
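
To illustrate why raw pitch values make the MSE look large, here is a minimal, framework-free sketch. The pitch numbers and the helper names `standardize` and `mse` are hypothetical and are not taken from the repository; in practice the mean and standard deviation would normally be computed over the whole training set rather than a single utterance.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of floats; also return the mean and std that were used."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant input: nothing to scale, return zeros.
        return [0.0 for _ in values], mu, sigma
    return [(v - mu) / sigma for v in values], mu, sigma


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return mean((p - t) ** 2 for p, t in zip(pred, target))


# Hypothetical pitch values in Hz (assumption, for illustration only).
target_hz = [110.0, 220.0, 180.0, 150.0]
pred_hz = [120.0, 200.0, 190.0, 140.0]

raw_error = mse(pred_hz, target_hz)

# Standardize the targets, then apply the same statistics to the predictions.
target_z, mu, sigma = standardize(target_hz)
pred_z = [(p - mu) / sigma for p in pred_hz]
norm_error = mse(pred_z, target_z)

print(raw_error, norm_error)
```

The standardized error is the raw error divided by the variance of the targets, so the two numbers describe the same fit on different scales; only the magnitude shown on the TensorBoard curve changes.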

@rishikksh20 rishikksh20 self-assigned this Jul 5, 2020
@rishikksh20 rishikksh20 added the documentation Improvements or additions to documentation label Jul 5, 2020
@dathudeptrai
Author

@rishikksh20 is the L1 loss masked? I use the same preprocessing as ESPnet, but my validation L1 loss is below about 0.3.

@rishikksh20
Owner

rishikksh20 commented Jul 6, 2020

@dathudeptrai
I am using a masked L1 loss. The logged l1_loss is actually the combined loss of the before-Postnet and after-Postnet L1 losses, which is why it is very high. That said, the individual before and after L1 losses are also high, around 0.47, when they should be below about 0.3. I don't know why this is happening, since the quality of the generated audio and spectrograms seems fine. Also, I am not using ESPnet's pre-processing; I am using NVIDIA's pre-processing for simplicity and easy integration with WaveGlow and my own implementation of MelGAN. You can look at the code, especially fastspeech.py; if you find any irregularities in the implementation, please let me know.
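
For readers following along, a masked L1 loss and the combined before/after-Postnet value can be sketched as below. This is a framework-free illustration, not the code in fastspeech.py: the function name `masked_l1`, the toy tensors, and the choice to combine the two terms by summing them are all assumptions.

```python
def masked_l1(pred, target, lengths):
    """Mean absolute error over valid (non-padded) frames only.

    pred, target: nested lists shaped [batch][time][n_mels].
    lengths: number of valid frames for each utterance in the batch.
    """
    total = 0.0
    count = 0
    for p_utt, t_utt, n in zip(pred, target, lengths):
        # Slice off padded frames so they do not contribute to the loss.
        for p_frame, t_frame in zip(p_utt[:n], t_utt[:n]):
            for p, t in zip(p_frame, t_frame):
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


# Toy batch: 2 utterances, 3 frames, 2 mel bins; the second has 1 valid frame.
lengths = [3, 1]
target = [
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [0.0, 0.0], [0.0, 0.0]],  # last two frames are padding
]
before_postnet = [
    [[1.5, 1.5], [1.5, 1.5], [1.5, 1.5]],
    [[2.5, 2.5], [9.0, 9.0], [9.0, 9.0]],  # arbitrary values in padded frames
]
after_postnet = [
    [[1.25, 1.25], [1.25, 1.25], [1.25, 1.25]],
    [[2.25, 2.25], [9.0, 9.0], [9.0, 9.0]],
]

l1_before = masked_l1(before_postnet, target, lengths)
l1_after = masked_l1(after_postnet, target, lengths)
combined = l1_before + l1_after  # assumed: "combined" means the sum

# Same loss without masking, for comparison.
unmasked_before = masked_l1(before_postnet, target, [3, 3])

print(l1_before, l1_after, combined, unmasked_before)
```

If the logged value is a sum of two terms, it will sit at roughly twice the per-term loss, so it is not directly comparable with a single validation L1 figure from another toolkit.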

@dathudeptrai
Author

@rishikksh20 is the NVIDIA normalization in the range 0 to 4?

@rishikksh20
Owner

@dathudeptrai I don't think so

def mel_spectrogram(self, y):

they are using spectral normalization:
def spectral_normalize(self, magnitudes):
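
As a rough illustration of what this kind of spectral normalization does, the sketch below applies log dynamic range compression to a few magnitudes. It assumes the NVIDIA-style form `log(clamp(x, min=clip_val) * C)` with `C = 1` and `clip_val = 1e-5`; these constants and the pure-Python helper names are assumptions for illustration, so check them against the actual pre-processing code.

```python
import math


def dynamic_range_compression(magnitudes, C=1.0, clip_val=1e-5):
    """Log-compress magnitudes, clamping small values to avoid log(0)."""
    return [math.log(max(m, clip_val) * C) for m in magnitudes]


def dynamic_range_decompression(compressed, C=1.0):
    """Invert the compression (exact only for values above clip_val)."""
    return [math.exp(c) / C for c in compressed]


# Hypothetical mel magnitudes, including silence (0.0).
magnitudes = [0.0, 1.0, 10.0]
compressed = dynamic_range_compression(magnitudes)
restored = dynamic_range_decompression(compressed)

print(compressed)
print(restored)
```

Under these assumptions the compressed values are bounded below by log(1e-5), which is about -11.5, and are not rescaled into a fixed interval such as 0 to 4.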

2 participants