Fix Elu gradient NaN on large input #2328

agerasev · 2024-07-11T06:14:02Z

The current Elu backward pass implementation produces NaN when the argument is a large positive number.

Let the variable x have the value:

[ -3.0092e2,  -2.7229e2,  -2.4637e2,  -2.2293e2,  -2.0171e2,  -1.8252e2,
  -1.6515e2,  -1.4943e2,  -1.3521e2,  -1.2234e2,  -1.1070e2,  -1.0017e2,
  -9.0633e1,  -8.2008e1,  -7.4203e1,  -6.7141e1,  -6.0751e1,  -5.4969e1,
  -4.9737e1,  -4.5003e1,  -4.0719e1,  -3.6843e1,  -3.3336e1,  -3.0162e1,
  -2.7290e1,  -2.4691e1,  -2.2339e1,  -2.0211e1,  -1.8286e1,  -1.6543e1,
  -1.4965e1,  -1.3538e1,  -1.2246e1,  -1.1076e1,  -1.0018e1,  -9.0596e0,
  -8.1919e0,  -7.4063e0,  -6.6948e0,  -6.0502e0,  -5.4663e0,  -4.9370e0,
  -4.4571e0,  -4.0219e0,  -3.6269e0,  -3.2682e0,  -2.9422e0,  -2.6456e0,
  -2.3756e0,  -2.1293e0,  -1.9043e0,  -1.6984e0,  -1.5095e0,  -1.3357e0,
  -1.1752e0,  -1.0265e0, -8.8811e-1, -7.5859e-1, -6.3666e-1, -5.2110e-1,
 -4.1076e-1, -3.0452e-1, -2.0134e-1, -1.0017e-1, -3.9339e-6,  1.0016e-1,
  2.0133e-1,  3.0452e-1,  4.1075e-1,  5.2109e-1,  6.3665e-1,  7.5858e-1,
  8.8810e-1,   1.0265e0,   1.1752e0,   1.3356e0,   1.5095e0,   1.6984e0,
   1.9043e0,   2.1293e0,   2.3756e0,   2.6456e0,   2.9422e0,   3.2682e0,
   3.6268e0,   4.0218e0,   4.4571e0,   4.9369e0,   5.4662e0,   6.0502e0,
   6.6947e0,   7.4062e0,   8.1919e0,   9.0595e0,   1.0018e1,   1.1076e1,
   1.2246e1,   1.3538e1,   1.4965e1,   1.6543e1,   1.8285e1,   2.0211e1,
   2.2339e1,   2.4691e1,   2.7290e1,   3.0162e1,   3.3335e1,   3.6843e1,
   4.0719e1,   4.5003e1,   4.9737e1,   5.4969e1,   6.0751e1,   6.7141e1,
   7.4203e1,   8.2007e1,   9.0633e1,   1.0017e2,   1.1070e2,   1.2234e2,
   1.3521e2,   1.4943e2,   1.6515e2,   1.8252e2,   2.0171e2,   2.2293e2,
   2.4637e2,   2.7228e2,   3.0092e2,   3.3257e2]

then this code:

x.elu(1.0)?.backward()?.get(&x).unwrap().clone()

will produce this result:

[  0.0000e0,   0.0000e0,   0.0000e0,   0.0000e0,   0.0000e0,   0.0000e0,
   0.0000e0,   0.0000e0,   0.0000e0,   0.0000e0,   0.0000e0, 3.0829e-44,
 4.3490e-40, 2.4231e-36, 5.9417e-33, 6.9330e-30, 4.1314e-27, 1.3403e-24,
 2.5084e-22, 2.8537e-20, 2.0692e-18, 9.9817e-17, 3.3302e-15, 7.9587e-14,
 1.4064e-12, 1.8913e-11, 1.9865e-10,  1.6685e-9,  1.1447e-8,  6.5404e-8,
  3.1667e-7,  1.3199e-6,  4.8047e-6,  1.5472e-5,  4.4594e-5,  1.1627e-4,
  2.7687e-4,  6.0742e-4,  1.2374e-3,  2.3573e-3,  4.2271e-3,  7.1762e-3,
  1.1596e-2,  1.7919e-2,  2.6599e-2,  3.8076e-2,  5.2750e-2,  7.0960e-2,
  9.2961e-2,  1.1892e-1,  1.4893e-1,  1.8298e-1,  2.2103e-1,  2.6299e-1,
  3.0875e-1,  3.5825e-1,  4.1143e-1,  4.6833e-1,  5.2906e-1,  5.9387e-1,
  6.6315e-1,  7.3747e-1,  8.1763e-1,  9.0468e-1,   1.0000e0,   1.0000e0,
   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,
   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,
   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,
   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,
   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,
   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,
   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,
   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,   1.0000e0,
   1.0000e0,   1.0000e0,        NaN,        NaN,        NaN,        NaN,
        NaN,        NaN,        NaN,        NaN,        NaN,        NaN,
        NaN,        NaN,        NaN,        NaN]

The NaNs appear when zeros in negative_mask are multiplied by the infinity that exp returns for large positive inputs.

This PR mitigates the problem by limiting the exponent value.
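To make the failure mode concrete, here is a minimal scalar sketch of the mask-based gradient and of one way to limit the exponent. It is not candle's tensor code: the function names are hypothetical, and clamping with `min(0.0)` is an assumption about what "limiting" could look like.

```rust
/// Mask-based ELU gradient (hypothetical scalar stand-in for the tensor code).
/// `exp(x)` is evaluated for every element, including large positive ones.
fn elu_grad_masked(x: f32, alpha: f32) -> f32 {
    let positive_mask: f32 = if x > 0.0 { 1.0 } else { 0.0 };
    let negative_mask: f32 = if x <= 0.0 { 1.0 } else { 0.0 };
    // For f32, exp(x) overflows to +inf once x exceeds roughly 88.7,
    // and 0.0 * inf is NaN under IEEE 754.
    positive_mask + negative_mask * x.exp() * alpha
}

/// Same gradient with the exponent clamped to be non-positive
/// (assumed mitigation), so exp can never overflow.
fn elu_grad_clamped(x: f32, alpha: f32) -> f32 {
    let positive_mask: f32 = if x > 0.0 { 1.0 } else { 0.0 };
    let negative_mask: f32 = if x <= 0.0 { 1.0 } else { 0.0 };
    positive_mask + negative_mask * x.min(0.0).exp() * alpha
}
```

This matches the output above, where the NaNs start at the first input above the f32 overflow threshold of `exp` (9.0633e1).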

MilkFather · 2024-07-11T09:44:40Z

Good to check whether there would be Inf in your forward pass. I checked the forward implementation and see no safeguard against inf.

candle/candle-core/src/cpu_backend/mod.rs

Lines 1532 to 1538 in a226a97

    fn elu<T: num_traits::Float>(v: T, alpha: T) -> T {
        if v.is_sign_positive() {
            v
        } else {
            (v.exp() - T::one()) * alpha
        }
    }

Furthermore, since $$\frac{\mathrm{d}}{\mathrm{d}x}\alpha\cdot\left(e^x-1\right) = \alpha\cdot e^x = \left[\alpha\cdot\left(e^x-1\right)\right] + \alpha,$$ we might directly utilize the results of the forward pass and add $\alpha$ to it. This reduces the operation count a little.
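A minimal scalar sketch of that identity, assuming `f32` values: `elu` specializes the function quoted above, and `elu_grad_from_output` is a hypothetical name, not the PR's code. The gradient on the negative branch is the forward output plus `alpha`, so the backward pass never calls `exp`.

```rust
/// Scalar f32 specialization of the forward function quoted above.
fn elu(v: f32, alpha: f32) -> f32 {
    if v.is_sign_positive() {
        v
    } else {
        (v.exp() - 1.0) * alpha
    }
}

/// Gradient of ELU at `x`, given the forward output `y = elu(x, alpha)`.
/// Uses d/dx [alpha * (e^x - 1)] = alpha * e^x = y + alpha for x <= 0.
fn elu_grad_from_output(x: f32, y: f32, alpha: f32) -> f32 {
    if x > 0.0 {
        1.0
    } else {
        y + alpha
    }
}
```

Because `y` is finite for every finite `x`, this form cannot produce the `0 * inf` product that caused the NaNs.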

agerasev · 2024-07-11T10:21:40Z

> we might directly utilize the results of the forward pass and add α to it. This reduces the operation count a little.

Nice idea, thanks!

> Good to check whether there would be Inf in your forward pass. I checked the forward implementation and see no safeguard against inf.

This function returns inf only when v is +inf or alpha is inf (and likewise for NaN), but this seems to be the expected behavior. What other cases should be checked?

Update: this function also returns NaN when alpha is inf and v is -0.0, but using alpha = inf doesn't make sense.
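The edge cases listed here can be checked with a small scalar sketch. `elu_f32` is an `f32` copy of the forward function quoted earlier in the thread, and `check_elu_edge_cases` is a hypothetical helper written only for this illustration.

```rust
/// Scalar f32 copy of the forward function quoted earlier in this thread.
fn elu_f32(v: f32, alpha: f32) -> f32 {
    if v.is_sign_positive() {
        v
    } else {
        (v.exp() - 1.0) * alpha
    }
}

/// Hypothetical helper asserting the edge cases discussed above.
fn check_elu_edge_cases() {
    // +inf passes straight through the positive branch.
    assert_eq!(elu_f32(f32::INFINITY, 1.0), f32::INFINITY);
    // Large finite inputs stay finite: exp is never called on them.
    assert_eq!(elu_f32(300.0, 1.0), 300.0);
    // -inf saturates at -alpha because exp(-inf) is 0.
    assert_eq!(elu_f32(f32::NEG_INFINITY, 1.0), -1.0);
    // NaN stays NaN whichever branch is taken.
    assert!(elu_f32(f32::NAN, 1.0).is_nan());
    // An infinite alpha gives an infinite result for negative inputs...
    assert!(elu_f32(-1.0, f32::INFINITY).is_infinite());
    // ...and NaN at -0.0, where (exp(-0.0) - 1) * inf is 0 * inf.
    assert!(elu_f32(-0.0, f32::INFINITY).is_nan());
}
```

Calling `check_elu_edge_cases()` runs all of the assertions. The `-0.0` case arises because `is_sign_positive` is false for negative zero, so that input takes the `exp` branch.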

MilkFather · 2024-07-11T12:55:47Z

Looks fine to me. My test confirms that in the current pull request, the NaN issue is solved and the performance is slightly better.

It would be good to add some tests and benchmark code.

LaurentMazare · 2024-07-16T12:41:21Z

Thanks!

Fix Elu gradient NaN on large input

f352248

Reuse previously computed exp in Elu

8a3e657

LaurentMazare approved these changes Jul 16, 2024


LaurentMazare merged commit 6a4741b into huggingface:main Jul 16, 2024
10 checks passed
