Hello! Thanks for your code! Have you observed the loss? I downloaded the code and ran it, but the loss doesn't seem to converge. It drops rapidly at first, but as the epochs go on it starts to look like a cosine function, and its amplitude keeps growing.
However, the loss doesn't seem to converge. It drops rapidly at first, but as the epochs go on it starts to look like a cosine function, and its amplitude keeps growing.
The fluctuation in the loss is probably caused by the corresponding variation in the learning rate. The code uses a cosine annealing learning rate schedule, in which the learning rate is lowered from its starting value to zero over the course of each epoch and then reset at the start of the next one. This has been found to work well in certain contexts: the small learning rate near the end of an epoch lets the optimizer converge toward a local minimum, while the periodic reset gives it a chance to escape that minimum later.
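To make the schedule concrete, here is a minimal sketch of that behaviour (the learning rate, steps per epoch, and epoch count below are placeholder values, not the repo's actual settings):

```python
import math

lr_max = 0.1           # assumed starting learning rate
steps_per_epoch = 500  # assumed number of batches per epoch

def cosine_annealed_lr(step_in_epoch: int) -> float:
    """Learning rate for a given batch index within the current epoch:
    follows a cosine curve from lr_max down to ~0, then resets next epoch."""
    return 0.5 * lr_max * (1.0 + math.cos(math.pi * step_in_epoch / steps_per_epoch))

for epoch in range(3):
    for step in range(steps_per_epoch):
        lr = cosine_annealed_lr(step)  # back up to lr_max at step 0 of every epoch
        # ... optimizer update with this lr would go here ...
```

Plotting `lr` over training steps gives exactly the repeated cosine shape you're seeing mirrored in the loss curve.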
Personally, I've had more success with plain SGD and a learning rate that is simply decayed exponentially, i.e. multiplied by a fixed factor at the end of every epoch.
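For comparison, a rough sketch of what I mean by per-epoch exponential decay (the initial rate and decay factor here are just example values):

```python
lr = 0.1      # assumed starting learning rate
decay = 0.9   # assumed per-epoch decay factor

for epoch in range(10):
    # ... run one full epoch of SGD updates at this fixed lr ...
    print(f"epoch {epoch}: lr = {lr:.5f}")
    lr *= decay  # shrink the learning rate once per epoch, never reset it
```

Because the rate only ever decreases, the loss curve tends to be much smoother than with the cosine restarts.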