Why doesn't the loss decrease during training? #11

Open
taoumb opened this issue Nov 22, 2024 · 5 comments

Comments

@taoumb

taoumb commented Nov 22, 2024

Can this code be trained correctly?

@xy-lin77

I have run 50,000 iterations, and the loss seems to stay between 1.2 and 1.4 the whole time.

@EdoardoBotta
Owner

Are you using the default hyperparameters for the model size? If so, they produce a very small model, which I would not expect to perform well. I am planning to experiment with a bigger model size to make sure the model can be trained.

@xy-lin77

Thank you for your reply. I will try the other hyperparameters.

@TobinShaw

@EdoardoBotta @xy-lin77 @taoumb I also encountered this issue. Have you made any further attempts, and did you find anything?

@EdoardoBotta
Owner

EdoardoBotta commented Dec 13, 2024

I have updated the model size and the dataset in a3812ff: the model now has the same size as the original one from the paper, and a textual embedding of the movie title is added as an additional feature. The loss shown in the progress bar is now a moving average, and it decreases steadily while the training loop runs.
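A per-batch loss fluctuates from step to step, so a running average makes the overall trend easier to see than raw values such as the 1.2 to 1.4 range reported above. The comment does not say how the average is computed; the following is a minimal sketch assuming a simple windowed mean (the repository may instead use an exponential moving average or tqdm's built-in smoothing). `MovingAverage` and `window` are hypothetical names, not the project's code.

```python
from collections import deque


class MovingAverage:
    """Mean of the most recent `window` loss values (hypothetical helper, not the repo's code)."""

    def __init__(self, window: int = 100) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # deque with maxlen drops the oldest value automatically once full.
        self.values = deque(maxlen=window)
        self.total = 0.0

    def update(self, value: float) -> float:
        """Add one loss value and return the mean over the current window."""
        # Subtract the value that is about to be evicted so the running sum stays in sync.
        if len(self.values) == self.values.maxlen:
            self.total -= self.values[0]
        self.values.append(value)
        self.total += value
        return self.total / len(self.values)


if __name__ == "__main__":
    # Illustrative, made-up per-batch losses: noisy step to step, drifting down overall.
    smoother = MovingAverage(window=3)
    for step, loss in enumerate([1.4, 1.2, 1.3, 1.1, 1.0]):
        print(step, loss, round(smoother.update(loss), 3))
```

In a training loop, the returned value could be shown with a tqdm progress bar via `pbar.set_postfix(loss=smoothed)`. A smaller `window` reacts faster to real changes in the loss but displays more noise.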


4 participants