Why doesn't the loss decrease during training? #11

taoumb · 2024-11-22T20:37:23Z

Can this code be trained correctly?

xy-lin77 · 2024-11-27T08:33:35Z

I have run 50,000 iterations, and the loss always stays between 1.2 and 1.4.

EdoardoBotta · 2024-11-27T14:06:06Z

Are you using the default hyperparameters for the model size? If so, those produce a very small model, and I do not expect it to do well. I am planning to experiment with a bigger model size to confirm that the model can be trained.

xy-lin77 · 2024-11-27T14:08:46Z

Thank you for your reply. I will try other hyperparameters.

TobinShaw · 2024-12-03T03:17:36Z

@EdoardoBotta @xy-lin77 @taoumb I also encountered this issue. Have you made any new attempts or found anything?

EdoardoBotta · 2024-12-13T13:26:14Z

I have updated the model size and dataset in a3812ff: the model is now the same size as the original one from the paper, and a textual embedding of the movie title is added as an additional feature. The loss shown in the progress bar is now a moving average, and it steadily decreases as the training loop runs.
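For anyone unsure what a moving-average loss means in practice, here is a minimal sketch. It is not the code from a3812ff: the class name `EmaLoss`, the smoothing factor `beta`, and the synthetic loss values are all assumptions made for illustration, and the repository may well use a different kind of average (for example a fixed window).

```python
import random


class EmaLoss:
    """Exponential moving average of a scalar loss, with bias correction.

    Hypothetical helper for illustration only; not taken from the repository.
    """

    def __init__(self, beta: float = 0.98):
        self.beta = beta   # smoothing factor: closer to 1 means smoother
        self.avg = 0.0     # uncorrected running average
        self.steps = 0     # number of updates seen so far

    def update(self, loss: float) -> float:
        """Fold in one raw loss value and return the smoothed loss."""
        self.steps += 1
        self.avg = self.beta * self.avg + (1.0 - self.beta) * loss
        # Bias correction keeps early values from being pulled toward zero.
        return self.avg / (1.0 - self.beta ** self.steps)


if __name__ == "__main__":
    random.seed(0)
    ema = EmaLoss(beta=0.98)
    for step in range(1, 501):
        # Synthetic loss: a slow downward trend buried in per-batch noise.
        raw = 1.4 - 0.0005 * step + random.uniform(-0.1, 0.1)
        smoothed = ema.update(raw)
        if step % 100 == 0:
            print(f"step {step:4d}  raw {raw:.3f}  smoothed {smoothed:.3f}")
```

The point of smoothing is that a per-batch loss can bounce around enough to hide a slow downward trend, while the averaged value makes that trend visible in the progress bar.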
