
Differentiating mvnormal #1554

Open · wants to merge 31 commits into master
Conversation

matbesancon (Member)
No description provided.

codecov-commenter commented May 23, 2022

Codecov Report

Base: 85.95% // Head: 86.02% // This PR increases project coverage by +0.06% 🎉

Coverage data is based on head (8ebf419) compared to base (a31ebc4).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1554      +/-   ##
==========================================
+ Coverage   85.95%   86.02%   +0.06%     
==========================================
  Files         129      129              
  Lines        8105     8144      +39     
==========================================
+ Hits         6967     7006      +39     
  Misses       1138     1138              
Impacted Files Coverage Δ
src/multivariate/mvnormal.jl 80.93% <100.00%> (+3.41%) ⬆️


@devmotion (Member) left a comment:

Are the rules for sqmahal etc. needed? Would it maybe be sufficient to just make sure that PDMats works and is optimized for ChainRules?

Comment on lines 522 to 523
c0, Δc0 = ChainRulesCore.frule((ChainRulesCore.NoTangent(), Δd), mvnormal_c0, d)
sq, Δsq = ChainRulesCore.frule((ChainRulesCore.NoTangent(), Δd, Δx), sqmahal, d, x)
devmotion (Member):

I think we should rather use frule_via_ad here to call back into the AD system, in case it wants to define its own, possibly improved derivatives.
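
A rough sketch of what this could look like, assuming a forwards-capable RuleConfig (the plumbing below is illustrative, not the PR's final code):

```julia
using ChainRulesCore
using Distributions
using Distributions: AbstractMvNormal

# Sketch only: defer the inner derivatives to the AD system via
# frule_via_ad instead of hard-coded frule calls, so an AD backend
# can substitute its own, possibly improved derivatives.
function ChainRulesCore.frule(
    config::RuleConfig{>:HasForwardsMode},
    (_, Δd, Δx),
    ::typeof(Distributions._logpdf),
    d::AbstractMvNormal,
    x::AbstractVector,
)
    c0, Δc0 = frule_via_ad(config, (NoTangent(), Δd), Distributions.mvnormal_c0, d)
    sq, Δsq = frule_via_ad(config, (NoTangent(), Δd, Δx), Distributions.sqmahal, d, x)
    return c0 - sq / 2, unthunk(Δc0) - unthunk(Δsq) / 2
end
```

Only AD systems whose RuleConfig advertises HasForwardsMode would pick up such a rule.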

Comment on lines 524 to 528
return c0 - sq/2, ChainRulesCore.@thunk(begin
Δc0 = ChainRulesCore.unthunk(Δc0)
Δsq = ChainRulesCore.unthunk(Δsq)
Δc0 - Δsq/2
end)
devmotion (Member):

Derivatives should not be thunked if there's only one of them.

Comment on lines 532 to 533
c0, c0_pullback = ChainRulesCore.rrule(mvnormal_c0, d)
sq, sq_pullback = ChainRulesCore.rrule(sqmahal, d, x)
devmotion (Member):

Same here, this should probably be rrule_via_ad.
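
A sketch of the reverse-mode analogue, assuming a reverse-capable RuleConfig (tangent handling simplified for illustration):

```julia
using ChainRulesCore
using Distributions
using Distributions: AbstractMvNormal

# Sketch only: obtain the inner pullbacks from the AD system via
# rrule_via_ad rather than hard-coded rrule calls.
function ChainRulesCore.rrule(
    config::RuleConfig{>:HasReverseMode},
    ::typeof(Distributions._logpdf),
    d::AbstractMvNormal,
    x::AbstractVector,
)
    c0, c0_pullback = rrule_via_ad(config, Distributions.mvnormal_c0, d)
    sq, sq_pullback = rrule_via_ad(config, Distributions.sqmahal, d, x)
    function logpdf_pullback(dy)
        ∂d_c0 = unthunk(c0_pullback(dy)[2])
        _, ∂d_sq, ∂x_sq = sq_pullback(dy)
        ∂d_sq = unthunk(∂d_sq)
        # Combine the two distribution tangents: logpdf = c0 - sq / 2.
        ∂d = Tangent{typeof(d)}(; μ=∂d_c0.μ - ∂d_sq.μ / 2, Σ=∂d_c0.Σ - ∂d_sq.Σ / 2)
        return NoTangent(), ∂d, -unthunk(∂x_sq) / 2
    end
    return c0 - sq / 2, logpdf_pullback
end
```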

src/multivariate/mvnormal.jl (outdated; resolved comments hidden)
Comment on lines 553 to 556
Δy = ChainRulesCore.@thunk(begin
Δd = ChainRulesCore.unthunk(Δd)
-dot(Δd.Σ, invcov(d)) / 2
end)
devmotion (Member):

Same here, no thunks.

Suggested change
Δy = ChainRulesCore.@thunk(begin
Δd = ChainRulesCore.unthunk(Δd)
-dot(Δd.Σ, invcov(d)) / 2
end)
Δy = -dot(Δd.Σ, invcov(d)) / 2

Comment on lines 563 to 567
∂d = ChainRulesCore.@thunk(begin
dy = ChainRulesCore.unthunk(dy)
∂Σ = -dy/2 * invcov(d)
ChainRulesCore.Tangent{typeof(d)}(μ = ChainRulesCore.ZeroTangent(), Σ = ∂Σ)
end)
devmotion (Member):

No thunk 🙂

src/multivariate/mvnormal.jl (outdated; resolved comments hidden)
test/mvnormal.jl (outdated)
Comment on lines 329 to 368
(y, Δy) = @inferred ChainRulesCore.frule((ChainRulesCore.NoTangent(), t), Distributions.mvnormal_c0, d)
y_r, c0_pullback = @inferred ChainRulesCore.rrule(Distributions.mvnormal_c0, d)
@test y_r ≈ y
y2 = Distributions.mvnormal_c0(MvNormal(d.μ, d.Σ + t.Σ))
@test unthunk(Δy) ≈ y2 - y atol = n * 1e-4
y3 = Distributions.mvnormal_c0(MvNormal(d.μ, d.Σ - t.Σ))
@test unthunk(Δy) ≈ y - y3 atol = n * 1e-4
(_, ∇c0) = c0_pullback(1.0)
∇c0 = ChainRulesCore.unthunk(∇c0)
@test dot(∇c0.Σ, t.Σ) ≈ y2 - y atol = n * 1e-4
@test dot(∇c0.Σ, t.Σ) ≈ y - y3 atol = n * 1e-4
# sqmahal
x = randn(n)
Δx = 0.0001 * randn(n)
(y, Δy) = @inferred ChainRulesCore.frule((ChainRulesCore.NoTangent(), t, Δx), sqmahal, d, x)
(yr, sqmahal_pullback) = @inferred ChainRulesCore.rrule(sqmahal, d, x)
(_, ∇s_d, ∇s_x) = @inferred sqmahal_pullback(1.0)
∇s_d = ChainRulesCore.unthunk(∇s_d)
∇s_x = ChainRulesCore.unthunk(∇s_x)
@test yr ≈ y
y2 = Distributions.sqmahal(MvNormal(d.μ + t.μ, d.Σ + t.Σ), x + Δx)
y3 = Distributions.sqmahal(MvNormal(d.μ - t.μ, d.Σ - t.Σ), x - Δx)
@test unthunk(Δy) ≈ y2 - y atol = n * 1e-4
@test unthunk(Δy) ≈ y - y3 atol = n * 1e-4
@test dot(∇s_d.Σ, t.Σ) + dot(∇s_d.μ, t.μ) + dot(∇s_x, Δx) ≈ y2 - y atol = n * 1e-4
@test dot(∇s_d.Σ, t.Σ) + dot(∇s_d.μ, t.μ) + dot(∇s_x, Δx) ≈ y - y3 atol = n * 1e-4
# _logpdf
(y, Δy) = @inferred ChainRulesCore.frule((ChainRulesCore.NoTangent(), t, Δx), Distributions._logpdf, d, x)
(yr, logpdf_MvNormal_pullback) = @inferred ChainRulesCore.rrule(Distributions._logpdf, d, x)
@test y ≈ yr
# inference broken
# (_, ∇s_d, ∇s_x) = @inferred logpdf_MvNormal_pullback(1.0)
(_, ∇s_d, ∇s_x) = logpdf_MvNormal_pullback(1.0)

y2 = Distributions._logpdf(MvNormal(d.μ + t.μ, d.Σ + t.Σ), x + Δx)
y3 = Distributions._logpdf(MvNormal(d.μ - t.μ, d.Σ - t.Σ), x - Δx)
@test unthunk(Δy) ≈ y - y3 atol = n * 1e-4
@test unthunk(Δy) ≈ y2 - y atol = n * 1e-4
@test dot(∇s_d.Σ, t.Σ) + dot(∇s_d.μ, t.μ) + dot(∇s_x, Δx) ≈ y2 - y atol = n * 1e-4
@test dot(∇s_d.Σ, t.Σ) + dot(∇s_d.μ, t.μ) + dot(∇s_x, Δx) ≈ y - y3 atol = n * 1e-4
devmotion (Member):

This seems very complicated. Ideally we should just use test_rrule and test_frule.

matbesancon (Member Author):

This is nontrivial in the case of the perturbation of the covariance matrix.

devmotion (Member):

Yeah, sometimes one has to add custom tests. But the standard should be test_frule and test_rrule, since they check multiple other parts of the CR interface in addition to numerical accuracy and type inference. And even for cases that are problematic for finite differencing (e.g. due to singularities or domain constraints), it is sometimes possible to use test_frule and test_rrule by specifying custom finite differencing methods, as in #1555.
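
For reference, such a test could look roughly like this (the max_range keyword for central_fdm is an assumption about the FiniteDifferences API; the idea is to keep perturbations small enough that the covariance stays positive definite):

```julia
using ChainRulesTestUtils
using FiniteDifferences
using Distributions
using LinearAlgebra

# Sketch: test_frule/test_rrule check numerical accuracy, type
# inference, and the rest of the ChainRules interface in one call.
A = randn(3, 3)
d = MvNormal(randn(3), A * A' + I)  # positive-definite covariance
x = randn(3)

# Custom finite differencing with a restricted range, cf. #1555.
fdm = central_fdm(5, 1; max_range=1e-3)
test_frule(Distributions.sqmahal, d, x; fdm=fdm)
test_rrule(Distributions.sqmahal, d, x; fdm=fdm)
```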

Comment on lines 521 to 527
function ChainRulesCore.frule((_, Δd, Δx)::Tuple{Any,Any,Any}, ::typeof(_logpdf), d::AbstractMvNormal, x::AbstractVector)
c0, Δc0 = ChainRulesCore.frule((ChainRulesCore.NoTangent(), Δd), mvnormal_c0, d)
sq, Δsq = ChainRulesCore.frule((ChainRulesCore.NoTangent(), Δd, Δx), sqmahal, d, x)
Δc0 = ChainRulesCore.unthunk(Δc0)
Δsq = ChainRulesCore.unthunk(Δsq)
return c0 - sq/2, Δc0 - Δsq/2
end
devmotion (Member):

I don't think it's useful to add this definition. This is exactly what AD systems do anyway.

Suggested change
function ChainRulesCore.frule((_, Δd, Δx)::Tuple{Any,Any,Any}, ::typeof(_logpdf), d::AbstractMvNormal, x::AbstractVector)
c0, Δc0 = ChainRulesCore.frule((ChainRulesCore.NoTangent(), Δd), mvnormal_c0, d)
sq, Δsq = ChainRulesCore.frule((ChainRulesCore.NoTangent(), Δd, Δx), sqmahal, d, x)
Δc0 = ChainRulesCore.unthunk(Δc0)
Δsq = ChainRulesCore.unthunk(Δsq)
return c0 - sq/2, Δc0 - Δsq/2
end

matbesancon (Member Author):

It doesn't cost us much to add this definition, and it lets us have derivatives built in; we can also re-add specialized methods for some MvNormal types if necessary.

devmotion (Member):

Actually, it can be quite problematic to define derivatives that overrule the AD system if they are not needed and e.g. too generic (as is possibly the case here). I've run into multiple issues of this kind with ChainRules, which then require packages that would otherwise just work (without even knowing about ChainRules) to use ChainRulesCore.@opt_out or to define their own rules. The rule here will catch every AbstractMvNormal and AbstractVector, which can be problematic, similar to TuringLang/DistributionsAD.jl#180.

So I strongly recommend not adding rules that are not needed.
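
For context, the opt-out mechanism referred to here: a downstream package with its own distribution type (MyMvNormal is hypothetical) would have to write something like

```julia
using ChainRulesCore
using Distributions
using Distributions: AbstractMvNormal

# Hypothetical downstream type that would otherwise be caught by a
# generic rule defined on AbstractMvNormal.
struct MyMvNormal <: AbstractMvNormal end

# Opt out of the generic rule so AD differentiates _logpdf directly.
ChainRulesCore.@opt_out ChainRulesCore.rrule(
    ::typeof(Distributions._logpdf), ::MyMvNormal, ::AbstractVector
)
```

just to restore plain AD behavior for its own type.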

Comment on lines 529 to 547
function ChainRulesCore.rrule(::typeof(_logpdf), d::MvNormal, x::AbstractVector)
c0, c0_pullback = ChainRulesCore.rrule(mvnormal_c0, d)
sq, sq_pullback = ChainRulesCore.rrule(sqmahal, d, x)
function logpdf_MvNormal_pullback(dy)
dy = ChainRulesCore.unthunk(dy)
(_, ∂d_c0) = c0_pullback(dy)
∂d_c0 = ChainRulesCore.unthunk(∂d_c0)
(_, ∂d_sq, ∂x_sq) = sq_pullback(dy)
∂d_sq = ChainRulesCore.unthunk(∂d_sq)
∂x_sq = ChainRulesCore.unthunk(∂x_sq)
backing = NamedTuple{(:μ, :Σ), Tuple{typeof(∂d_sq.μ), typeof(∂d_sq.Σ)}}((
(∂d_c0.μ - 0.5 * ∂d_sq.μ),
(∂d_c0.Σ - 0.5 * ∂d_sq.Σ),
))
∂d = ChainRulesCore.Tangent{typeof(d), typeof(backing)}(backing)
return ChainRulesCore.NoTangent(), ∂d, ∂x_sq / (-2)
end
return c0 - sq / 2, logpdf_MvNormal_pullback
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, it seems this is exactly what AD does if no rule is defined.

Suggested change
function ChainRulesCore.rrule(::typeof(_logpdf), d::MvNormal, x::AbstractVector)
c0, c0_pullback = ChainRulesCore.rrule(mvnormal_c0, d)
sq, sq_pullback = ChainRulesCore.rrule(sqmahal, d, x)
function logpdf_MvNormal_pullback(dy)
dy = ChainRulesCore.unthunk(dy)
(_, ∂d_c0) = c0_pullback(dy)
∂d_c0 = ChainRulesCore.unthunk(∂d_c0)
(_, ∂d_sq, ∂x_sq) = sq_pullback(dy)
∂d_sq = ChainRulesCore.unthunk(∂d_sq)
∂x_sq = ChainRulesCore.unthunk(∂x_sq)
backing = NamedTuple{(:μ, :Σ), Tuple{typeof(∂d_sq.μ), typeof(∂d_sq.Σ)}}((
(∂d_c0.μ - 0.5 * ∂d_sq.μ),
(∂d_c0.Σ - 0.5 * ∂d_sq.Σ),
))
∂d = ChainRulesCore.Tangent{typeof(d), typeof(backing)}(backing)
return ChainRulesCore.NoTangent(), ∂d, ∂x_sq / (-2)
end
return c0 - sq / 2, logpdf_MvNormal_pullback
end

@@ -372,7 +372,7 @@ struct MvNormalStats <: SufficientStats
tw::Float64 # total sample weight
end

function suffstats(D::Type{MvNormal}, x::AbstractMatrix{Float64})
function suffstats(::Type{MvNormal}, x::AbstractMatrix{Float64})
devmotion (Member):

Maybe move unrelated changes to a separate PR?

matbesancon (Member Author):

They were all relatively minor things (unused variables).

matbesancon (Member Author):

│   %34 = Base.getproperty(∂d_sq::Tangent{FullNormal}, :μ)::Any
│   %35 = (0.5 * %34)::Any
│   %36 = (%33 - %35)::Any
│   %37 = Base.getproperty(∂d_c0, :Σ)::Matrix{Float64}
│   %38 = Base.getproperty(∂d_sq::Tangent{FullNormal}, :Σ)::Any

One of the instability issues seems to be that the types of the members of Tangent{FullNormal} are not inferable.

matbesancon (Member Author):

Alright, I could fix inference, but it required some assumptions that might be a bit restrictive.

@matbesancon matbesancon marked this pull request as draft May 28, 2022 14:53
@matbesancon matbesancon requested a review from devmotion July 28, 2022 20:18
@matbesancon matbesancon marked this pull request as ready for review July 28, 2022 20:18
@matbesancon
Copy link
Member Author

@devmotion I think everything discussed has been adapted/validated

@@ -372,7 +372,7 @@ struct MvNormalStats <: SufficientStats
tw::Float64 # total sample weight
end

function suffstats(D::Type{MvNormal}, x::AbstractMatrix{Float64})
function suffstats(::Type{MvNormal}, x::AbstractMatrix{Float64})
devmotion (Member):

Can you revert the non-CR changes? It seems not only unused names were removed but also some types and dispatches were changed, creating slight inconsistencies related to the open issue about type parameters in fit. IMO it would be much cleaner to avoid these additional changes in this PR and instead fix the dispatches (and names) in a separate PR in a consistent way.

src/multivariate/mvnormal.jl (outdated; resolved comments hidden)
(_, Δd, Δx) = dargs
Δd = ChainRulesCore.unthunk(Δd)
Δx = ChainRulesCore.unthunk(Δx)
Σinv = invcov(d)
devmotion (Member):

Could we avoid computing the inverse?

matbesancon (Member Author):

Not computing the inverse at all would be a bit of a hassle here, since it's reused several times in the whole function. I removed the materialization of the inverse as a dense matrix to lighten the computations; we use the inverse of the Cholesky decomposition directly.
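
The trade-off can be illustrated with a standalone sketch (not the PR code):

```julia
using LinearAlgebra

A = randn(4, 4)
Σ = Symmetric(A * A' + I)  # positive-definite covariance
μ, x = randn(4), randn(4)
C = cholesky(Σ)

# Applying Σ⁻¹ to a vector needs only two triangular solves, so the
# sqmahal gradient with respect to x avoids a dense inverse:
v = C \ (x - μ)  # computes Σ⁻¹ (x - μ)

# But the derivative of the normalization constant with respect to Σ
# is a scaled Σ⁻¹ itself, so the dense inverse must be materialized:
∂Σ = -inv(C) / 2
```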

matbesancon (Member Author):

That change here (b225298) was somehow incorrect.

matbesancon (Member Author):

Reverted the changes; the inverse covariance matrix is needed here. There is no way around it, since the derivative is just a scaled invcov.

matbesancon (Member Author):

Are the rules for sqmahal etc. needed? Would it maybe be sufficient to just make sure that PDMats works and is optimized for ChainRules?

I would say not, since the operations done on top are non-trivial and could be costly.

matbesancon (Member Author):

ping @devmotion for another round of review

Labels: none yet · Projects: none yet · 3 participants