Issues with the dot-tilde (i.e. `.~`) syntax #761
I would also be in favour of removing the `.~` syntax.
Might poll our userbase on Slack, see if anyone is desperate to keep `.~`.
There are "semantic" differences. x ~ filldist(Normal(), 2) treats x .~ Normal() treats |
Here is the idea I mentioned briefly during the meeting. We lower x .~ Normal() to

    x = tilde_assume!!(__context__, IIDDistribution(Normal()), __varinfo__)

Then, we overload tilde_assume!! for IIDDistribution. We could make this work for other distribution sugars by overloading the same method for their wrapper types.

EDIT: I am happy to remove the `.~` syntax.
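A rough, self-contained sketch of what that wrapper could look like (IIDDistribution and assume_iid here are hypothetical stand-ins; the real tilde_assume!! overload would also take a VarName and a VarInfo):

    using Distributions, Random

    # Hypothetical wrapper pairing the scalar distribution with the LHS shape.
    struct IIDDistribution{D<:UnivariateDistribution,N}
        dist::D
        size::NTuple{N,Int}
    end

    # Stand-in for the proposed overload: draw elementwise and accumulate the
    # log density, mimicking the loop semantics of `.~`.
    function assume_iid(rng::AbstractRNG, iid::IIDDistribution)
        x = [rand(rng, iid.dist) for _ in CartesianIndices(iid.size)]
        lp = sum(logpdf.(iid.dist, x))
        return x, lp
    end

    x, lp = assume_iid(Random.default_rng(), IIDDistribution(Normal(), (2, 3)))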
Left this comment in one of the other issues. I tried to make a simple benchmark to see the effect of `.~` versus an explicit loop:

julia> module MWE
       using Turing
       import ReverseDiff

       adbackend = AutoReverseDiff()
       alg = NUTS(; adtype=adbackend)
       n_samples = 20
       y = randn(1000)

       @model function test_tilde(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           x1 .~ Normal()
           x2 = Vector{T}(undef, num_values)
           x2 .~ Gamma()
           m = sum(x1 .* x2)
           y .~ Normal(m)
       end
       m_tilde = test_tilde(y)
       @time sample(m_tilde, alg, n_samples)

       @model function test_loop(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           for i in eachindex(x1)
               x1[i] ~ Normal()
           end
           x2 = Vector{T}(undef, num_values)
           for i in eachindex(x2)
               x2[i] ~ Gamma()
           end
           m = sum(x1 .* x2)
           for i in eachindex(y)
               y[i] ~ Normal(m)
           end
       end
       m_loop = test_loop(y)
       @time sample(m_loop, alg, n_samples)
       end
┌ Info: Found initial step size
└ ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████| Time: 0:03:03
186.599898 seconds (3.79 G allocations: 166.483 GiB, 9.57% gc time, 1.02% compilation time)
┌ Info: Found initial step size
└ ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████| Time: 0:03:33
216.282827 seconds (4.17 G allocations: 181.365 GiB, 8.79% gc time, 0.91% compilation time)
Main.MWE

Is there some case where it makes a more dramatic difference? If so, can someone write an example? Unless it makes a very substantial difference in some common use case, I'm in favour of keeping the syntax for backwards compatibility, but turning every `.~` statement into the equivalent explicit loop under the hood.
FWIW, I find with Mooncake that the loopy version is faster:

julia> module MWE
       using Turing
       import Mooncake

       adbackend = AutoMooncake(config=nothing)
       alg = NUTS(; adtype=adbackend)
       n_samples = 20
       y = randn(1000)

       @model function test_tilde(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           x1 .~ Normal()
           x2 = Vector{T}(undef, num_values)
           x2 .~ Gamma()
           m = sum(x1 .* x2)
           y .~ Normal(m)
       end
       m_tilde = test_tilde(y)
       @time sample(m_tilde, alg, n_samples)

       @model function test_loop(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           for i in eachindex(x1)
               x1[i] ~ Normal()
           end
           x2 = Vector{T}(undef, num_values)
           for i in eachindex(x2)
               x2[i] ~ Gamma()
           end
           m = sum(x1 .* x2)
           for i in eachindex(y)
               y[i] ~ Normal(m)
           end
       end
       m_loop = test_loop(y)
       @time sample(m_loop, alg, n_samples)
       end
WARNING: replacing module MWE.
┌ Info: Found initial step size
└ ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:01:53
116.111583 seconds (1.60 G allocations: 95.795 GiB, 6.01% gc time, 4.90% compilation time)
┌ Info: Found initial step size
└ ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:59
61.705976 seconds (1.11 G allocations: 73.435 GiB, 8.65% gc time, 7.44% compilation time)

(Timings are from the second run -- on the first run Mooncake has to compile some of its internals.)
Could we lower

    @model function f_tilde()
        x = Array{Float64,2}(undef, 2, 4)
        d = MvNormal([-10.0, 10.0], I)
        # d = Normal()
        x .~ d
    end

to something like

    @model function f_loop()
        x = Array{Float64,2}(undef, 2, 4)
        d = MvNormal([-10.0, 10.0], I)
        # d = Normal()
        lhs_axes = axes(x)
        num_dims_d = length(size(d))
        axes_to_loop = lhs_axes[(begin + num_dims_d):end]
        colons = fill(:, num_dims_d)
        for idx in Iterators.product(axes_to_loop...)
            x[colons..., idx...] ~ d
        end
    end

?
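The axis bookkeeping in that lowering can be sanity-checked outside @model; a standalone sketch (my own example, not from the comment above):

    # Slice a 2×4 array into four length-2 columns, as would happen when d is
    # a 2-dimensional multivariate distribution (length(size(d)) == 1).
    x = reshape(collect(1.0:8.0), 2, 4)
    num_dims_d = 1
    axes_to_loop = axes(x)[(begin + num_dims_d):end]
    colons = fill(:, num_dims_d)
    slices = [x[colons..., idx...] for idx in Iterators.product(axes_to_loop...)]
    @assert slices == [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]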
That seems sensible - I'm not usually a TDD proponent, but I think this is indeed a situation where it'd be helpful to enumerate a set of test cases first, so that we know what we're dealing with.
Is there a reason why comparisons aren't done via TuringBenchmarking.jl here instead of timing sample calls? The former should be much more reliable, no?
No. Using the same two models I had above:

julia> @model function test_dot_tilde(y, ::Type{T}=Vector{Float64}) where {T}
           num_values = length(y)
           x1 = T(undef, num_values)
           x1 .~ Normal()
           x2 = T(undef, num_values)
           x2 .~ Gamma()
           m = sum(x1 .* x2)
           y .~ Normal(m)
       end
test_dot_tilde (generic function with 4 methods)
julia> @model function test_loop(y, ::Type{T}=Vector{Float64}) where {T}
           num_values = length(y)
           x1 = T(undef, num_values)
           for i in eachindex(x1)
               x1[i] ~ Normal()
           end
           x2 = T(undef, num_values)
           for i in eachindex(x2)
               x2[i] ~ Gamma()
           end
           m = sum(x1 .* x2)
           for i in eachindex(y)
               y[i] ~ Normal(m)
           end
       end
test_loop (generic function with 4 methods)
julia> y = randn(1000);
julia> m_dot_tilde = test_dot_tilde(y);
julia> m_loop = test_loop(y);
julia> benchmark_model(m_dot_tilde)
┌ Warning: Gradient computation (without linking) failed for AutoZygote(): ErrorException("Mutating arrays is not supported -- called copyto!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n https://fluxml.ai/Zygote.jl/latest/limitations\n")
└ @ TuringBenchmarking ~/.julia/packages/TuringBenchmarking/VqHuu/src/TuringBenchmarking.jl:213
┌ Warning: Gradient computation (with linking) failed for AutoZygote(): ErrorException("Mutating arrays is not supported -- called copyto!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n https://fluxml.ai/Zygote.jl/latest/limitations\n")
└ @ TuringBenchmarking ~/.julia/packages/TuringBenchmarking/VqHuu/src/TuringBenchmarking.jl:243
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "evaluation" => 2-element BenchmarkTools.BenchmarkGroup:
    tags: []
    "linked" => Trial(511.834 μs)
    "standard" => Trial(322.708 μs)
  "gradient" => 4-element BenchmarkTools.BenchmarkGroup:
    tags: []
    "AutoReverseDiff(compile=true)" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["ReverseDiff [compiled]"]
      "linked" => Trial(2.141 ms)
      "standard" => Trial(1.687 ms)
    "AutoForwardDiff(chunksize=0)" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["ForwardDiff"]
      "linked" => Trial(135.956 ms)
      "standard" => Trial(96.100 ms)
    "AutoReverseDiff()" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["ReverseDiff"]
      "linked" => Trial(5.988 ms)
      "standard" => Trial(4.771 ms)
    "AutoZygote()" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["Zygote"]
      "linked" => 0-element BenchmarkTools.BenchmarkGroup:
        tags: []
      "standard" => 0-element BenchmarkTools.BenchmarkGroup:
        tags: []
julia> benchmark_model(m_loop)
┌ Warning: Gradient computation (without linking) failed for AutoZygote(): ErrorException("Mutating arrays is not supported -- called setindex!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n https://fluxml.ai/Zygote.jl/latest/limitations\n")
└ @ TuringBenchmarking ~/.julia/packages/TuringBenchmarking/VqHuu/src/TuringBenchmarking.jl:213
┌ Warning: Gradient computation (with linking) failed for AutoZygote(): ErrorException("Mutating arrays is not supported -- called setindex!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n https://fluxml.ai/Zygote.jl/latest/limitations\n")
└ @ TuringBenchmarking ~/.julia/packages/TuringBenchmarking/VqHuu/src/TuringBenchmarking.jl:243
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "evaluation" => 2-element BenchmarkTools.BenchmarkGroup:
    tags: []
    "linked" => Trial(406.250 μs)
    "standard" => Trial(239.250 μs)
  "gradient" => 4-element BenchmarkTools.BenchmarkGroup:
    tags: []
    "AutoReverseDiff(compile=true)" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["ReverseDiff [compiled]"]
      "linked" => Trial(2.400 ms)
      "standard" => Trial(2.280 ms)
    "AutoForwardDiff(chunksize=0)" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["ForwardDiff"]
      "linked" => Trial(117.081 ms)
      "standard" => Trial(79.289 ms)
    "AutoReverseDiff()" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["ReverseDiff"]
      "linked" => Trial(6.537 ms)
      "standard" => Trial(5.800 ms)
    "AutoZygote()" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["Zygote"]
      "linked" => 0-element BenchmarkTools.BenchmarkGroup:
        tags: []
      "standard" => 0-element BenchmarkTools.BenchmarkGroup:
        tags: []
I'm trying to understand all the valid uses of `.~`. In the current implementation, literals are detected with

    isliteral(e) = false
    isliteral(::Number) = true
    isliteral(e::Expr) = !isempty(e.args) && all(isliteral, e.args)

The RHS needs to be either a single distribution or an array of distributions.
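For concreteness, a quick check of what this isliteral predicate accepts (my own examples; quoting gives the parsed Expr the macro sees):

    isliteral(e) = false
    isliteral(::Number) = true
    isliteral(e::Expr) = !isempty(e.args) && all(isliteral, e.args)

    isliteral(1.0)        # true: a plain number
    isliteral(:([1, 2]))  # true: a :vect Expr whose args are all numbers
    isliteral(:x)         # false: a Symbol is not a literal
    isliteral(:(1 + 2))   # false: the :call Expr contains the Symbol :+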
Questions:

1. The main question: have I got this right, am I missing something?

2. Do we want to support the following?

       @model function f1()
           x = Array{Float64, 3}(undef, 3, 2, 2)
           x .~ fill(MvNormal(randn(2), I), 3)
       end

3. Is there a reason why we don't support the below, or is it just a bug?

       @model function f1()
           x = Array{Float64, 3}(undef, 3, 2, 4)
           m = randn(3)
           x .~ MvNormal(m, I)
       end

   This crashes with some error.

4. Do we want to allow matrix variate distributions on the RHS? I don't see why not, since we allow multivariate, but I haven't been able to make it work with Turing.jl as it now stands. Is that a bug? E.g. why doesn't this work?

       @model function f1()
           x = Array{Float64, 3}(undef, 3, 3, 2)
           x .~ LKJ(3, 1.0)
       end

Answers: Anyone should feel free to opine on what we should do, but if @torfjelde or @yebai could comment on whether some of the above things that don't work are intentional design decisions or just unimplemented/bugs, that would help.
2. Probably not -- we can support it in principle, but it is better motivated by some concrete examples.

3. It looks like a bug to me.

4. Probably not -- we can support it in principle, but it is better motivated by some concrete examples.

On a general note, I think a high-order function that constructs an array of distributions (like filldist) is a clearer way to express these cases.
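For instance (my own sketch, assuming the array-of-distributions RHS form discussed above is supported), an elementwise-varying prior can be written either way:

    using Turing

    # Array-of-distributions RHS: element i of x pairs with the i-th
    # distribution under broadcasting.
    @model function via_dot_tilde(n)
        x = Vector{Float64}(undef, n)
        x .~ [Normal(0, i) for i in 1:n]
    end

    # The same joint density as a single multivariate variable.
    @model function via_product(n)
        x ~ product_distribution([Normal(0, i) for i in 1:n])
    end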
👍
FYI, I started on an implementation; if anyone wants to work on this, please talk to me first so we don't duplicate work. I'm happy to get rid of supporting multivariate distributions on the RHS.

I've posted on the Julia Slack to see if any users would find the changes very disruptive. I could go either way on allowing only univariate distributions vs. allowing multivariate and matrix variate too. I generally think complicated broadcasting, where the dimensions don't match and neither side is a scalar, is confusing, but it is also something that is commonly done in Julia and not hard for us to implement.
On Slack, I got a few questions about what the new syntax would be to replace calls to `.~`. I made a PR implementing the change.
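As a concrete (hypothetical) before/after for one of the removed cases, using filldist, which was mentioned above -- the exact recommended replacement depends on the PR:

    using Turing, LinearAlgebra

    # Before: broadcasting a multivariate distribution over the columns of a
    # preallocated matrix.
    @model function before(m)
        x = Matrix{Float64}(undef, length(m), 4)
        x .~ MvNormal(m, I)
    end

    # After: a single matrix-valued variable whose columns are IID draws.
    @model function after(m)
        x ~ filldist(MvNormal(m, I), 4)
    end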
I'd say no 🤷 It's just complicated and doesn't add much.
logpdf.(right, left) should be valid. This means that a case such as

    x .~ MvNormal(m, I)

results in

    logpdf.(MvNormal(m, I), x)

which is not defined (you're broadcasting a multivariate distribution over the scalar elements of x). But even if you defined it, it would be ambiguous what the statement should mean. However, we have this (IMO very annoying) support for the special case of broadcasting a multivariate distribution over the columns of a matrix.
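To see concretely why the broadcast form fails (my own REPL check, independent of DynamicPPL):

    using Distributions, LinearAlgebra

    m = randn(3)
    d = MvNormal(m, I)
    x = randn(3, 4)

    # Distributions treats d as a scalar under broadcasting, so this tries
    # logpdf(d, x[i, j]) for each element and throws a MethodError:
    # logpdf.(d, x)

    # The column-wise interpretation, by contrast, is well-defined:
    [logpdf(d, c) for c in eachcol(x)]   # one log density per column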
We shouldn't allow multivariate 🙃
Thanks for the comments @torfjelde. Note that the current plan, implemented in the above linked PR, is to break this convention ("logpdf.(right, left) should be valid") and become much more restrictive. For instance, x .~ MvNormal(m, I) would no longer be accepted.
Gotcha; seems sensible 👍
Related issues:

- `.~` and rand #405
- `.~` meaning #435
- `.~` seems to give incorrect answers #28
- dot_tilde #700
- `.~` #722 (comment)

Below is copied from #710 (comment):