
Alternative Optimizers #1164

Closed · KGrewal1 wants to merge 4 commits

Conversation

KGrewal1 (Contributor)

Example of what a Nesterov momentum enhanced SGD may look like. I will try to split this into a separate crate and will almost certainly modify the params struct in order to replicate the optionality of pytorch's SGD.
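For reference, a minimal sketch of the Nesterov-momentum update under discussion, written over plain slices with illustrative names; it is not the PR's actual code and does not use candle's Var/GradStore machinery:

```rust
/// One SGD step with Nesterov momentum (dampening = 0), following PyTorch's formulation.
fn sgd_nesterov_step(
    params: &mut [f32],   // parameters theta
    grads: &[f32],        // gradients g_t
    velocity: &mut [f32], // momentum buffer b_t, zero-initialised
    lr: f32,              // learning rate
    momentum: f32,        // mu
) {
    for ((p, &g), v) in params.iter_mut().zip(grads).zip(velocity.iter_mut()) {
        // b_t = mu * b_{t-1} + g_t
        *v = momentum * *v + g;
        // Nesterov: step along g_t + mu * b_t rather than b_t alone.
        *p -= lr * (g + momentum * *v);
    }
}
```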

LaurentMazare (Collaborator) commented Oct 24, 2023

Thanks for the PR. This actually looks fairly simple, and it's nice to have a test against pytorch too. If you're happy with it, I think it's worth merging already (just maybe rebase to get rid of the gelu grad diff and rerun your new test, as it fails on the CI with some numerical precision issue).

KGrewal1 (Contributor, Author)

Thanks: I'm definitely curious about the differences in the floating point results on Mac as opposed to Linux. However, I don't think this should be upstreamed as currently implemented, since it only adds Nesterov momentum to the current implementation rather than providing the full optionality of the PyTorch SGD.

Regarding an implementation of the full SGD: PyTorch doesn't currently allow negative momentum values, a zero momentum value when Nesterov is enabled, or dampening combined with Nesterov momentum, despite those being algorithmically possible. Negative momentum in particular does seem to have possible uses in optimisation (https://www.sciencedirect.com/science/article/pii/S0165168421002395).

I can upstream a PR with the full PyTorch SGD hopefully this evening or tomorrow, though this may be a further indication that additional optimisers are better suited to a separate crate, to avoid significant API changes in candle-nn.
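For context, here is a sketch of the per-element update rule that PyTorch's SGD documents, showing how momentum, dampening and the Nesterov flag interact; the names and signature are illustrative and are not candle-nn's API:

```rust
/// One parameter update following the SGD pseudocode in PyTorch's documentation.
fn sgd_update(
    p: &mut f32,           // parameter
    g: f32,                // gradient g_t
    buf: &mut Option<f32>, // momentum buffer b_t (None before the first step)
    lr: f32,
    momentum: f32,  // PyTorch requires momentum >= 0
    dampening: f32, // PyTorch requires dampening == 0 when nesterov is set
    nesterov: bool, // PyTorch requires momentum > 0 when nesterov is set
) {
    let mut d = g;
    if momentum != 0.0 {
        let b = match *buf {
            // b_t = mu * b_{t-1} + (1 - dampening) * g_t
            Some(prev) => momentum * prev + (1.0 - dampening) * g,
            // first step: the buffer is initialised with the raw gradient
            None => g,
        };
        *buf = Some(b);
        d = if nesterov { g + momentum * b } else { b };
    }
    *p -= lr * d;
}
```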

LaurentMazare (Collaborator)

Sounds great, yes, let's see how it goes with the more feature-complete optimizers.
On one hand, having basic optimizers in the main candle repo is good, but on the other hand, having them close to more exotic optimizers such as lbfgs also makes sense. And whatever we decide, we can also change it in the future, so no pressure at all.

* add bce with logit loss
* add bce with logit loss
* remove imports
* fix tiny bug
* add test documentation and refactor function
* fix test cases and formatting

KGrewal1 (Contributor, Author)

I've created a repo at https://github.com/KGrewal1/optimisers/tree/master with momentum-enhanced SGD, AdaGrad, and AdaDelta. I'm going to close this PR and create a new one specifically suggesting a candle-optim crate within the candle project, similar to PyTorch's optim module (and try not to accidentally create diffs where there shouldn't be any this time).
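As a rough illustration of the kind of update the linked crate covers, here is a generic AdaGrad step over plain slices; the function name and signature are invented for this sketch and do not reflect that crate's actual API:

```rust
/// One AdaGrad step: scale each coordinate's step by its accumulated gradient magnitude.
fn adagrad_step(
    params: &mut [f32],
    grads: &[f32],
    sum_sq: &mut [f32], // running sum of squared gradients, zero-initialised
    lr: f32,
    eps: f32, // small constant for numerical stability, e.g. 1e-10
) {
    for ((p, &g), s) in params.iter_mut().zip(grads).zip(sum_sq.iter_mut()) {
        *s += g * g;                     // accumulate squared gradient
        *p -= lr * g / (s.sqrt() + eps); // per-coordinate adaptive step
    }
}
```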

KGrewal1 closed this Oct 24, 2023