
Alternative Optimizers #1164

Closed · KGrewal1 wants to merge 4 commits

Conversation

KGrewal1 (Contributor)

Example of what a Nesterov momentum enhanced SGD may look like. I will try to split this into a separate crate and will almost certainly modify the params struct in order to replicate the optionality of pytorch's SGD.
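For reference, a minimal sketch of the Nesterov-momentum update under discussion, written over plain slices with illustrative names; it is not the PR's actual code and does not use candle's Var/GradStore machinery:

```rust
/// One SGD step with Nesterov momentum (dampening = 0), following PyTorch's formulation.
fn sgd_nesterov_step(
    params: &mut [f32],   // parameters theta
    grads: &[f32],        // gradients g_t
    velocity: &mut [f32], // momentum buffer b_t, zero-initialised
    lr: f32,              // learning rate
    momentum: f32,        // mu
) {
    for ((p, &g), v) in params.iter_mut().zip(grads).zip(velocity.iter_mut()) {
        // b_t = mu * b_{t-1} + g_t
        *v = momentum * *v + g;
        // Nesterov: step along g_t + mu * b_t rather than b_t alone.
        *p -= lr * (g + momentum * *v);
    }
}
```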

LaurentMazare (Collaborator) commented Oct 24, 2023

Thanks for the PR. This actually looks fairly simple, and it's nice to have a test against pytorch too. If you're happy with it, I think it's worth merging already (just maybe rebase to get rid of the gelu grad diff and rerun your new test, as it fails on the CI with some numerical precision issue).

KGrewal1 (Contributor, Author)

Thanks: I'm definitely curious about the differences in the floating point results on Mac as opposed to Linux. However, I don't think this should be upstreamed as currently implemented, since it only adds Nesterov momentum to the current implementation rather than providing the full optionality of the PyTorch SGD.

Regarding an implementation of the full SGD: PyTorch doesn't currently allow negative momentum values, a zero momentum value when Nesterov is enabled, or dampening combined with Nesterov momentum, despite those being algorithmically possible. Negative momentum in particular does seem to have possible uses in optimisation (https://www.sciencedirect.com/science/article/pii/S0165168421002395).

I can upstream a PR with the full PyTorch SGD hopefully this evening or tomorrow, though this may be a further indication that additional optimisers are better suited to a separate crate, to avoid significant API changes in candle-nn.
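For context, here is a sketch of the per-element update rule that PyTorch's SGD documents, showing how momentum, dampening and the Nesterov flag interact; the names and signature are illustrative and are not candle-nn's API:

```rust
/// One parameter update following the SGD pseudocode in PyTorch's documentation.
fn sgd_update(
    p: &mut f32,           // parameter
    g: f32,                // gradient g_t
    buf: &mut Option<f32>, // momentum buffer b_t (None before the first step)
    lr: f32,
    momentum: f32,  // PyTorch requires momentum >= 0
    dampening: f32, // PyTorch requires dampening == 0 when nesterov is set
    nesterov: bool, // PyTorch requires momentum > 0 when nesterov is set
) {
    let mut d = g;
    if momentum != 0.0 {
        let b = match *buf {
            // b_t = mu * b_{t-1} + (1 - dampening) * g_t
            Some(prev) => momentum * prev + (1.0 - dampening) * g,
            // first step: the buffer is initialised with the raw gradient
            None => g,
        };
        *buf = Some(b);
        d = if nesterov { g + momentum * b } else { b };
    }
    *p -= lr * d;
}
```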

LaurentMazare (Collaborator)

Sounds great, yes, let's see how it goes with the more feature-complete optimizers.
On one hand, having basic optimizers in the main candle repo is good, but on the other hand, having them close to more exotic optimizers such as lbfgs also makes sense. And whatever we decide, we can also change it in the future, so no pressure at all.

* add bce with logit loss
* add bce with logit loss
* remove imports
* fix tiny bug
* add test documentation and refactor function
* fix test cases and formatting

KGrewal1 (Contributor, Author)

I've created a repo at https://github.com/KGrewal1/optimisers/tree/master with momentum-enhanced SGD, AdaGrad, and AdaDelta. I'm going to close this PR and create a new one specifically suggesting a candle-optim crate within the candle project, similar to PyTorch's optim module (and try not to accidentally create diffs where there shouldn't be any this time).
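As a rough illustration of the kind of update the linked crate covers, here is a generic AdaGrad step over plain slices; the function name and signature are invented for this sketch and do not reflect that crate's actual API:

```rust
/// One AdaGrad step: scale each coordinate's step by its accumulated gradient magnitude.
fn adagrad_step(
    params: &mut [f32],
    grads: &[f32],
    sum_sq: &mut [f32], // running sum of squared gradients, zero-initialised
    lr: f32,
    eps: f32, // small constant for numerical stability, e.g. 1e-10
) {
    for ((p, &g), s) in params.iter_mut().zip(grads).zip(sum_sq.iter_mut()) {
        *s += g * g;                     // accumulate squared gradient
        *p -= lr * g / (s.sqrt() + eps); // per-coordinate adaptive step
    }
}
```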

KGrewal1 closed this Oct 24, 2023