
derivative for GELU #1160

Merged — 2 commits, Oct 23, 2023
Conversation

KGrewal1
Contributor

Implementation of the backprop derivative for GELU, using the analytical derivative of the GELU tanh approximation
$$0.5v\left(1+\tanh\left(\sqrt{\frac{2}{\pi}} v(1+0.044715v^{2})\right)\right)$$
as
$$0.5 + (0.398942 v + 0.0535161 v^3) (1 - \tanh^{2}(0.797885 v + 0.0356774 v^{3})) + 0.5 \tanh(0.797885 v + 0.0356774 v^{3})$$
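The two expressions can be checked against each other numerically. Below is a minimal standalone Rust sketch (not the actual candle kernel, which operates on tensors) of the approximation and its analytical derivative, validated with a central finite difference; `gelu` and `gelu_grad` are hypothetical names for illustration:

```rust
// Scalar sketch of the tanh-based GELU approximation and the analytical
// derivative quoted in the PR description. Assumes f64 scalars rather than
// candle tensors.
fn gelu(v: f64) -> f64 {
    let inner = (2.0 / std::f64::consts::PI).sqrt() * v * (1.0 + 0.044715 * v * v);
    0.5 * v * (1.0 + inner.tanh())
}

fn gelu_grad(v: f64) -> f64 {
    // Compute the cube only once, mirroring the review suggestion about
    // evaluating arg.powf(3.) a single time.
    let v3 = v * v * v;
    let t = (0.797885 * v + 0.0356774 * v3).tanh();
    0.5 + (0.398942 * v + 0.0535161 * v3) * (1.0 - t * t) + 0.5 * t
}

fn main() {
    // Cross-check the closed form against a central finite difference.
    let h = 1e-5;
    for &v in &[-2.0, -0.5, 0.0, 0.5, 2.0] {
        let numeric = (gelu(v + h) - gelu(v - h)) / (2.0 * h);
        assert!((gelu_grad(v) - numeric).abs() < 1e-4);
    }
    println!("ok");
}
```

Note that `gelu_grad(0.0)` reduces to `0.5`, as expected from the formula.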

@LaurentMazare
Collaborator

Looks nice. Could you also add a test to unary_grad and ensure that it matches the PyTorch output?
Also, it's probably better to compute arg.powf(3.) only once rather than twice.

@KGrewal1
Contributor Author

Tests added (using the same input tensor as the other tests), and changed to compute the cube of the tensor only once.

LaurentMazare merged commit 807e3f9 into huggingface:main on Oct 23, 2023
10 of 12 checks passed
@LaurentMazare
Collaborator

Great, thanks for the PR!

EricLBuehler pushed a commit to EricLBuehler/candle that referenced this pull request Oct 25, 2023
* derivative for GELU

* add tests