
Issues with the dot-tilde (i.e. .~) syntax #761

Open
yebai opened this issue Dec 19, 2024 · 19 comments

@yebai
Member

yebai commented Dec 19, 2024

Below is copied from #710 (comment)

I lean towards removing the dot syntax. The benefits of using dot syntax against a loop or fillarray/product_distribution are relatively marginal compared to its complexity and lack of robustness.

@yebai
Member Author

yebai commented Dec 19, 2024

@willtebbutt
Member

I would also be in favour of removing the .~ syntax -- could you remind me what the benefits of it are? I vaguely remember there being a couple of things that it can be used to do, that someone mentioned being trickier with fillarray / product_distribution.

@mhauru
Member

mhauru commented Jan 6, 2025

Might poll our userbase on Slack, see if anyone is desperate to keep .~. I would be very glad if we could get rid of it, but don't want to break user code lightly.

@torfjelde
Member

I would also be in favour of removing the .~ syntax -- could you remind me what the benefits of it are? I vaguely remember there being a couple of things that it can be used to do, that someone mentioned being trickier with fillarray / product_distribution.

There are "semantic" differences.

x ~ filldist(Normal(), 2)

treats x as a multivariate random variable of length 2, while

x .~ Normal()

treats x as a vector of 2 univariate random variables. This is effectively the difference between a multivariate random variable vs. a vector of IID variables.
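
To make the difference concrete, here is a small sketch (my illustration; it assumes DynamicPPL's VarInfo constructor and keys behave as described, so treat the expected output as indicative):

using Turing, DynamicPPL

@model function iid_model()
    x = Vector{Float64}(undef, 2)
    x .~ Normal()              # two univariate random variables
end

@model function mv_model()
    x ~ filldist(Normal(), 2)  # one length-2 multivariate random variable
end

keys(VarInfo(iid_model()))     # expected: the separate variables x[1] and x[2]
keys(VarInfo(mv_model()))      # expected: a single variable x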

@yebai
Member Author

yebai commented Jan 6, 2025

Here is the idea I mentioned briefly during the meeting. We lower

x .~ Normal()

to

x = tilde_assume!!(__context__, IIDDistribution(Normal()), __varinfo__)

Then, we overload tilde_assume!! for IIDDistribution to store each RV x[i] separately in __varinfo__. This allows users to individually apply model operations like condition/fix on x[i].

We could make this work for other distribution sugars by overloading the tilde_assume!! (also tilde_observe!!) interfaces for new types like IIDDistribution.

EDIT: I am happy to remove the .~ feature if we can't find a good, robust workaround.
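
As an illustration of the idea (not an actual DynamicPPL implementation -- IIDDistribution, assume_iid! and the Dict standing in for __varinfo__ are all hypothetical):

using Distributions

struct IIDDistribution{D<:UnivariateDistribution,N}
    dist::D
    size::NTuple{N,Int}
end

# Draw each element separately and record it under its own key, mimicking how an
# overloaded tilde_assume!! could store x[1], x[2], ... as individual random variables.
function assume_iid!(varinfo::Dict{String,Float64}, name::String, d::IIDDistribution)
    x = Array{Float64}(undef, d.size...)
    for i in eachindex(x)
        x[i] = rand(d.dist)
        varinfo["$name[$i]"] = x[i]
    end
    return x
end

vi = Dict{String,Float64}()
x = assume_iid!(vi, "x", IIDDistribution(Normal(), (2,)))
# vi now holds "x[1]" and "x[2]" as separate entries, so each element could be
# conditioned or fixed individually.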

@mhauru
Member

mhauru commented Jan 30, 2025

Left this comment in one of the other .~ issues a couple of days ago, but should have put it here instead:

I tried to make a simple benchmark to see the effect of .~ vs a loop. The results came out quite underwhelming:

julia> module MWE

       using Turing
       import ReverseDiff

       adbackend = AutoReverseDiff()
       alg = NUTS(; adtype=adbackend)
       n_samples = 20
       y = randn(1000)

       @model function test_tilde(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           x1 .~ Normal()
           x2 = Vector{T}(undef, num_values)
           x2 .~ Gamma()
           m = sum(x1 .* x2)
           y .~ Normal(m)
       end

       m_tilde = test_tilde(y)
       @time sample(m_tilde, alg, n_samples)

       @model function test_loop(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           for i in eachindex(x1)
               x1[i] ~ Normal()
           end
           x2 = Vector{T}(undef, num_values)
           for i in eachindex(x2)
               x2[i] ~ Gamma()
           end
           m = sum(x1 .* x2)
           for i in eachindex(y)
               y[i] ~ Normal(m)
           end
       end

       m_loop = test_loop(y)
       @time sample(m_loop, alg, n_samples)

       end
┌ Info: Found initial step size
└   ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████| Time: 0:03:03
186.599898 seconds (3.79 G allocations: 166.483 GiB, 9.57% gc time, 1.02% compilation time)
┌ Info: Found initial step size
└   ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████| Time: 0:03:33
216.282827 seconds (4.17 G allocations: 181.365 GiB, 8.79% gc time, 0.91% compilation time)
Main.MWE

Is there some case where it makes a more dramatic difference? If so, can someone write an example?

Unless it makes a very substantial difference in some common use case, I'm in favour of keeping the syntax for backwards compatibility, but turning every .~ into a loop over eachindex(lhs) at macro time and dealing only with ~ internally. I haven't thought through how that would work for broadcasting more complex than a vector LHS and a scalar RHS.
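
For the simple case (a vector LHS and a scalar-distribution RHS), the expansion would look roughly like this (a sketch only, not the actual macro output):

using Turing

@model function sugar()
    x = Vector{Float64}(undef, 10)
    x .~ Normal()              # what the user writes
end

@model function expanded()
    x = Vector{Float64}(undef, 10)
    for i in eachindex(x)      # what the macro would emit instead
        x[i] ~ Normal()
    end
end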

@willtebbutt
Member

willtebbutt commented Jan 30, 2025

FWIW, I find with Mooncake that the loopy version is faster:

julia> module MWE

       using Turing
       import Mooncake

       adbackend = AutoMooncake(config=nothing)
       alg = NUTS(; adtype=adbackend)
       n_samples = 20
       y = randn(1000)

       @model function test_tilde(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           x1 .~ Normal()
           x2 = Vector{T}(undef, num_values)
           x2 .~ Gamma()
           m = sum(x1 .* x2)
           y .~ Normal(m)
       end

       m_tilde = test_tilde(y)
       @time sample(m_tilde, alg, n_samples)

       @model function test_loop(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           for i in eachindex(x1)
               x1[i] ~ Normal()
           end
           x2 = Vector{T}(undef, num_values)
           for i in eachindex(x2)
               x2[i] ~ Gamma()
           end
           m = sum(x1 .* x2)
           for i in eachindex(y)
               y[i] ~ Normal(m)
           end
       end

       m_loop = test_loop(y)
       @time sample(m_loop, alg, n_samples)

       end

WARNING: replacing module MWE.
┌ Info: Found initial step size
└   ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:01:53
116.111583 seconds (1.60 G allocations: 95.795 GiB, 6.01% gc time, 4.90% compilation time)
┌ Info: Found initial step size
└   ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:59
 61.705976 seconds (1.11 G allocations: 73.435 GiB, 8.65% gc time, 7.44% compilation time)

(on the second run -- first run Mooncake has to compile some of its internals)

@mhauru
Member

mhauru commented Jan 30, 2025

Could we lower

@model function f_tilde()
    x = Array{Float64,2}(undef, 2, 4)
    d = MvNormal([-10.0, 10.0], I)
    # d = Normal()
    x .~ d
end

to something like

@model function f_loop()
    x = Array{Float64,2}(undef, 2, 4)
    d = MvNormal([-10.0, 10.0], I)
    # d = Normal()
    lhs_axes = axes(x)
    num_dims_d = length(size(d))
    axes_to_loop = lhs_axes[(begin + num_dims_d):end]
    colons = fill(:, num_dims_d)
    for idx in Iterators.product(axes_to_loop...)
        x[colons..., idx...] ~ d
    end
end

?

@penelopeysm
Member

That seems sensible - I'm not usually a TDD proponent, but I think this is indeed a situation where it'd be helpful to enumerate a set of test cases first, so that we know what we're dealing with.
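
For example, the LHS/RHS shapes discussed in this thread could be enumerated up front as small models (a sketch; the model names are illustrative and this is not an existing test file):

using Turing, LinearAlgebra

@model function case_vec_uni()   # vector LHS, univariate RHS
    x = Vector{Float64}(undef, 3)
    x .~ Normal()
end

@model function case_mat_uni()   # matrix LHS, univariate RHS
    x = Matrix{Float64}(undef, 2, 4)
    x .~ Normal()
end

@model function case_arr3_mv()   # 3-d array LHS, multivariate RHS
    x = Array{Float64,3}(undef, 3, 2, 4)
    x .~ MvNormal(randn(3), I)
end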

@torfjelde
Member

Is there a reason why comparisons aren't done via TuringBenchmarking.jl here instead of timing sample calls? The former should be much more reliable, no?

@mhauru
Member

mhauru commented Feb 1, 2025

Is there a reason why comparisons aren't done via TuringBenchmarking.jl here instead of timing sample calls?

No.

Using the same two models I had above:

julia> using TuringBenchmarking

julia> @model function test_dot_tilde(y, ::Type{T}=Vector{Float64}) where {T}
                  num_values = length(y)
                  x1 = T(undef, num_values)
                  x1 .~ Normal()
                  x2 = T(undef, num_values)
                  x2 .~ Gamma()
                  m = sum(x1 .* x2)
                  y .~ Normal(m)
              end
test_dot_tilde (generic function with 4 methods)

julia> @model function test_loop(y, ::Type{T}=Vector{Float64}) where {T}
                  num_values = length(y)
                  x1 = T(undef, num_values)
                  for i in eachindex(x1)
                      x1[i] ~ Normal()
                  end
                  x2 = T(undef, num_values)
                  for i in eachindex(x2)
                      x2[i] ~ Gamma()
                  end
                  m = sum(x1 .* x2)
                  for i in eachindex(y)
                      y[i] ~ Normal(m)
                  end
              end
test_loop (generic function with 4 methods)

julia> y = randn(1000);

julia> m_dot_tilde = test_dot_tilde(y);

julia> m_loop = test_loop(y);

julia> benchmark_model(m_dot_tilde)
┌ Warning: Gradient computation (without linking) failed for AutoZygote(): ErrorException("Mutating arrays is not supported -- called copyto!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n  https://fluxml.ai/Zygote.jl/latest/limitations\n")
└ @ TuringBenchmarking ~/.julia/packages/TuringBenchmarking/VqHuu/src/TuringBenchmarking.jl:213
┌ Warning: Gradient computation (with linking) failed for AutoZygote(): ErrorException("Mutating arrays is not supported -- called copyto!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n  https://fluxml.ai/Zygote.jl/latest/limitations\n")
└ @ TuringBenchmarking ~/.julia/packages/TuringBenchmarking/VqHuu/src/TuringBenchmarking.jl:243
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "evaluation" => 2-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "linked" => Trial(511.834 μs)
	  "standard" => Trial(322.708 μs)
  "gradient" => 4-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "AutoReverseDiff(compile=true)" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ReverseDiff [compiled]"]
		  "linked" => Trial(2.141 ms)
		  "standard" => Trial(1.687 ms)
	  "AutoForwardDiff(chunksize=0)" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ForwardDiff"]
		  "linked" => Trial(135.956 ms)
		  "standard" => Trial(96.100 ms)
	  "AutoReverseDiff()" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ReverseDiff"]
		  "linked" => Trial(5.988 ms)
		  "standard" => Trial(4.771 ms)
	  "AutoZygote()" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["Zygote"]
		  "linked" => 0-element BenchmarkTools.BenchmarkGroup:
			  tags: []
		  "standard" => 0-element BenchmarkTools.BenchmarkGroup:
			  tags: []

julia> benchmark_model(m_loop)
┌ Warning: Gradient computation (without linking) failed for AutoZygote(): ErrorException("Mutating arrays is not supported -- called setindex!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n  https://fluxml.ai/Zygote.jl/latest/limitations\n")
└ @ TuringBenchmarking ~/.julia/packages/TuringBenchmarking/VqHuu/src/TuringBenchmarking.jl:213
┌ Warning: Gradient computation (with linking) failed for AutoZygote(): ErrorException("Mutating arrays is not supported -- called setindex!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n  https://fluxml.ai/Zygote.jl/latest/limitations\n")
└ @ TuringBenchmarking ~/.julia/packages/TuringBenchmarking/VqHuu/src/TuringBenchmarking.jl:243
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "evaluation" => 2-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "linked" => Trial(406.250 μs)
	  "standard" => Trial(239.250 μs)
  "gradient" => 4-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "AutoReverseDiff(compile=true)" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ReverseDiff [compiled]"]
		  "linked" => Trial(2.400 ms)
		  "standard" => Trial(2.280 ms)
	  "AutoForwardDiff(chunksize=0)" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ForwardDiff"]
		  "linked" => Trial(117.081 ms)
		  "standard" => Trial(79.289 ms)
	  "AutoReverseDiff()" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ReverseDiff"]
		  "linked" => Trial(6.537 ms)
		  "standard" => Trial(5.800 ms)
	  "AutoZygote()" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["Zygote"]
		  "linked" => 0-element BenchmarkTools.BenchmarkGroup:
			  tags: []
		  "standard" => 0-element BenchmarkTools.BenchmarkGroup:
			  tags: []

@mhauru
Member

mhauru commented Feb 1, 2025

I'm trying to understand all the valid uses of .~ as it now stands, and how we would want it to behave. By reading the source code and browsing tests, I've come up with the following:

In LHS .~ RHS the LHS has to be either

  1. Any expression that is valid on the LHS of an assignment operation. In practice this is anything that can be turned into an optic using Accessors.setmacro(BangBang.prefermutation, expr; overwrite=true).
  2. A literal value made of numbers, such as 1.0 or [1.3, 2.3]. In practice this is defined by isliteral in compiler.jl:
isliteral(e) = false
isliteral(::Number) = true
isliteral(e::Expr) = !isempty(e.args) && all(isliteral, e.args)

The RHS needs to be either

  1. A Distribution, possibly a multivariate one.
  2. An AbstractVector of Distributions.

~, without the ., also accepts submodels on the RHS, but .~ does not.
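
A few concrete examples of the forms described above (a sketch of the current behaviour as I understand it, not an exhaustive list):

using Turing

@model function lhs_rhs_examples(y)
    x = Vector{Float64}(undef, length(y))
    x .~ Normal()              # assignable LHS, univariate Distribution RHS
    y .~ Normal.(x)            # data LHS, AbstractVector of Distributions RHS
    [1.0, 2.0] .~ Normal()     # literal LHS, treated as an observation
end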

Questions:

1. The main question: have I got this right? Am I missing something?

2. Do we want to support Arrays of multidimensional distributions? The code that should naturally catch this (check_tilde_rhs) accepts it, but I can't find a case that would actually work (the errors vary and are not very informative). It also feels clunky to me; it would be a bit like broadcasting over nested arrays rather than multidimensional arrays, so I'd be happy to say that we don't support it. Example of something that could work, but doesn't:

@model function f1()
    x = Array{Float64, 3}(undef, 3, 2, 2)
    x .~ fill(MvNormal(randn(2), I), 3)
end

3. Is there a reason why we don't support the below, or is it just a bug?

@model function f1()
    x = Array{Float64, 3}(undef, 3, 2, 4)
    m = randn(3)
    x .~ MvNormal(m, I)
end

This crashes with a MethodError from iterate that doesn't make immediate sense to me. Broadcasting over only a single dimension is fine, and broadcasting over multiple dimensions is fine if you use a univariate distribution; it's the combination of a multivariate distribution and broadcasting over multiple dimensions that fails.

4. Do we want to allow matrix-variate distributions on the RHS? I don't see why not, since we allow multivariate ones, but I haven't been able to make it work with Turing.jl as it now stands. Is that a bug? E.g. why doesn't this work:

@model function f1()
    x = Array{Float64, 3}(undef, 3, 3, 2)
    x .~ LKJ(3, 1.0)
end

Answers:

Anyone should feel free to opine on what we should do, but if @torfjelde or @yebai could comment on whether some of the above things that don't work are intentional design decisions or just unimplemented/bugs, that would help.

@yebai
Member Author

yebai commented Feb 2, 2025

  2. Do we want to support Arrays of multidimensional distributions?

Probably not -- we can support it in principle, but it is better motivated by some concrete examples.

  3. Is there a reason why we don't support the below, or is it just a bug?

It looks like a bug to me. The .~ notation was a bit ad-hoc from the beginning, and its functionality was a bit patchy.

Do we want to allow matrix-variate distributions on the RHS?

Probably not -- we can support it in principle, but it is better motivated by some concrete examples.

On a general note, I think a higher-order function that constructs an array-valued distribution (like filldist or product_distribution) is better than the broadcasting syntax. So, I would happily restrict .~ to univariate distributions only and recommend that users use product_distribution for vector- and matrix-variate distributions.
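
As a sketch of that recommendation (my example, not from the docs), the broadcasted forms would instead be written with array-valued distributions:

using Turing, LinearAlgebra

@model function with_constructors(n)
    x ~ filldist(Normal(), n)                                  # instead of x .~ Normal()
    Y ~ product_distribution(fill(MvNormal(zeros(2), I), n))   # n IID bivariate columns as one variable
end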

@penelopeysm
Member

On a general note, I think a higher-order function that constructs an array-valued distribution (like filldist or product_distribution) is better than the broadcasting syntax.

👍

mhauru self-assigned this Feb 3, 2025
@mhauru
Member

mhauru commented Feb 3, 2025

FYI, I started on an implementation; if anyone wants to work on this, please talk to me first so we don't duplicate work.

I'm happy to get rid of support for Arrays on the RHS of .~. It's a breaking change, but one that is easy to adapt to. It makes the implementation a lot simpler too.

I've posted on the Julia Slack to see if any users would find the changes very disruptive. I could go either way between allowing only univariate distributions and also allowing multivariate and matrix-variate ones. I generally think complicated broadcasting, where the dimensions don't match and neither side is a scalar, is confusing, but it is also something that is commonly done in Julia and not hard for us to implement.

@mhauru
Member

mhauru commented Feb 7, 2025

On Slack, I got a few questions about what the new syntax would be to replace calls to .~ with multivariate RHSs, but no one said the change would be disruptive for them.

I made a PR implementing .~ as a loop over ~: #804
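
For reference, the kind of replacement people were asking about would look roughly like this (a sketch with illustrative names, assuming x is a d×n matrix observed column-wise):

using Turing, LinearAlgebra

@model function old_way(x)
    x .~ MvNormal(zeros(2), I)                       # old special case: columns as IID draws
end

@model function new_way(x)
    x ~ filldist(MvNormal(zeros(2), I), size(x, 2))  # one matrix-variate variable
    # or, to keep per-column variables:
    # for j in axes(x, 2)
    #     x[:, j] ~ MvNormal(zeros(2), I)
    # end
end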

@torfjelde
Member

2 Do we want to support Arrays of multidimensional distributions?

I'd say no 🤷 It's just complicated and doesn't add much

3 Is there a reason why we don't support the below, or is it just a bug?

.~ is (currently) meant to function in a way that "respects" logpdf broadcasting, i.e.

logpdf.(right, left)

should be valid.

This means that a case such as

x .~ MvNormal(m, I)

results in

logpdf.(MvNormal(m, I), x)

which is not defined (your iterate error is likely because MvNormal doesn't define broadcastable; back in the day, I believe there was a default impl for Distribution).

But even if you defined broadcastable for MvNormal, you'd have to make an implicit choice regarding which dim you're broadcasting over.

However, we have this (IMO very annoying) support for the special case of left::Matrix and right::MultivariateDistribution, which breaks this convention and instead treats .~ as loglikelihood(right, left), because rand(right, n) returns a Matrix of size (d, n) (where d is the dim and n is the number of samples). This is a) very annoying to maintain (just look at the leaves of our tilde-pipeline), and b) AFAIK nobody uses it.
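
To spell the two conventions out with plain Distributions.jl calls (an illustration matching the behaviour described above):

using Distributions, LinearAlgebra

x = randn(3)
logpdf.(Normal(), x)                      # broadcasting convention: one logpdf per element

X = randn(2, 4)
loglikelihood(MvNormal(zeros(2), I), X)   # the special case: columns of X as IID draws, summed
# logpdf.(MvNormal(zeros(2), I), X) is not defined, hence the special-cased treatment above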

I don't see why not since we allow multivariate, but I haven't been able to make it work with Turing.jl as it now stands.

We shouldn't allow multivariate 🙃

@mhauru
Member

mhauru commented Feb 10, 2025

Thanks for the comments, @torfjelde. Note that the current plan, implemented in the PR linked above, is to break this convention:

.~ is (currently) meant to function in a way that "respects" logpdf broadcasting

and become much more restrictive. For instance, x .~ Normal.(y) won't work anymore, because Normal.(y) is an array. You'd need product_distribution instead.
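
Concretely, the migration would look something like this (illustrative model names; a sketch, not taken from the PR):

using Turing

@model function f_old(y)
    x = Vector{Float64}(undef, length(y))
    x .~ Normal.(y)                        # array of distributions on the RHS: will stop working
end

@model function f_new(y)
    x ~ product_distribution(Normal.(y))   # one multivariate variable built from the array
    # or, keeping each x[i] as a separate variable:
    # x = Vector{Float64}(undef, length(y))
    # for i in eachindex(y)
    #     x[i] ~ Normal(y[i])
    # end
end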

@torfjelde
Member

Gotcha; seems sensible 👍
