
Issues with the dot-tilde (i.e. .~) syntax #761

Open
yebai opened this issue Dec 19, 2024 · 19 comments

@yebai
Member

yebai commented Dec 19, 2024

Below is copied from #710 (comment)

I lean towards removing the dot syntax. The benefits of using dot syntax against a loop or fillarray/product_distribution are relatively marginal compared to its complexity and lack of robustness.

@yebai
Member Author

yebai commented Dec 19, 2024

@willtebbutt
Member

I would also be in favour of removing the .~ syntax -- could you remind me what the benefits of it are? I vaguely remember there being a couple of things that it can be used to do, that someone mentioned being trickier with fillarray / product_distribution.

@mhauru
Member

mhauru commented Jan 6, 2025

Might poll our userbase on Slack, see if anyone is desperate to keep .~. I would be very glad if we could get rid of it, but don't want to break user code lightly.

@torfjelde
Member

I would also be in favour of removing the .~ syntax -- could you remind me what the benefits of it are? I vaguely remember there being a couple of things that it can be used to do, that someone mentioned being trickier with fillarray / product_distribution.

There are "semantic" differences.

x ~ filldist(Normal(), 2)

treats x as a multivariate random variable of length 2, while

x .~ Normal()

treats x as a vector of 2 univariate random variables. This is effectively the difference between a multivariate random variable vs. a vector of IID variables.
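
To make the difference concrete, here is a small sketch (my illustration; it assumes DynamicPPL's VarInfo constructor and keys behave as described, so treat the expected output as indicative):

using Turing, DynamicPPL

@model function iid_model()
    x = Vector{Float64}(undef, 2)
    x .~ Normal()              # two univariate random variables
end

@model function mv_model()
    x ~ filldist(Normal(), 2)  # one length-2 multivariate random variable
end

keys(VarInfo(iid_model()))     # expected: the separate variables x[1] and x[2]
keys(VarInfo(mv_model()))      # expected: a single variable x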

@yebai
Member Author

yebai commented Jan 6, 2025

Here is the idea I mentioned briefly during the meeting. We lower

x .~ Normal()

to

x = tilde_assume!!(__context__, IIDDistribution(Normal()), __varinfo__)

Then, we overload tilde_assume!! for IIDDistribution to store each RV x[i] separately in __varinfo__. This allows users to individually apply model operations like condition/fix on x[i].

We could make this work for other distribution sugars by overloading the tilde_assume!! (also tilde_observe!!) interfaces for new types like IIDDistribution.

EDIT: I am happy to remove the .~ feature if we can't find a good, robust workaround.
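
As an illustration of the idea (not an actual DynamicPPL implementation -- IIDDistribution, assume_iid! and the Dict standing in for __varinfo__ are all hypothetical):

using Distributions

struct IIDDistribution{D<:UnivariateDistribution,N}
    dist::D
    size::NTuple{N,Int}
end

# Draw each element separately and record it under its own key, mimicking how an
# overloaded tilde_assume!! could store x[1], x[2], ... as individual random variables.
function assume_iid!(varinfo::Dict{String,Float64}, name::String, d::IIDDistribution)
    x = Array{Float64}(undef, d.size...)
    for i in eachindex(x)
        x[i] = rand(d.dist)
        varinfo["$name[$i]"] = x[i]
    end
    return x
end

vi = Dict{String,Float64}()
x = assume_iid!(vi, "x", IIDDistribution(Normal(), (2,)))
# vi now holds "x[1]" and "x[2]" as separate entries, so each element could be
# conditioned or fixed individually.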

@mhauru
Member

mhauru commented Jan 30, 2025

Left this comment in one of the other .~ issues a couple of days ago, but should have put it here instead:

I tried to make a simple benchmark to see the effect of .~ vs a loop. The results came out quite underwhelming:

julia> module MWE

       using Turing
       import ReverseDiff

       adbackend = AutoReverseDiff()
       alg = NUTS(; adtype=adbackend)
       n_samples = 20
       y = randn(1000)

       @model function test_tilde(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           x1 .~ Normal()
           x2 = Vector{T}(undef, num_values)
           x2 .~ Gamma()
           m = sum(x1 .* x2)
           y .~ Normal(m)
       end

       m_tilde = test_tilde(y)
       @time sample(m_tilde, alg, n_samples)

       @model function test_loop(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           for i in eachindex(x1)
               x1[i] ~ Normal()
           end
           x2 = Vector{T}(undef, num_values)
           for i in eachindex(x2)
               x2[i] ~ Gamma()
           end
           m = sum(x1 .* x2)
           for i in eachindex(y)
               y[i] ~ Normal(m)
           end
       end

       m_loop = test_loop(y)
       @time sample(m_loop, alg, n_samples)

       end
┌ Info: Found initial step size
└   ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████| Time: 0:03:03
186.599898 seconds (3.79 G allocations: 166.483 GiB, 9.57% gc time, 1.02% compilation time)
┌ Info: Found initial step size
└   ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████| Time: 0:03:33
216.282827 seconds (4.17 G allocations: 181.365 GiB, 8.79% gc time, 0.91% compilation time)
Main.MWE

Is there some case where it makes a more dramatic difference? If so, can someone write an example?

Unless it makes a very substantial difference in some common use case, I'm in favour of keeping the syntax for backwards compatibility, but turning every .~ into a loop over eachindex(lhs) at macro time and dealing only with ~ internally. I haven't thought through how that would work for broadcasting more complex than a vector LHS and a scalar RHS.
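
For the simple case (a vector LHS and a scalar-distribution RHS), the expansion would look roughly like this (a sketch only, not the actual macro output):

using Turing

@model function sugar()
    x = Vector{Float64}(undef, 10)
    x .~ Normal()              # what the user writes
end

@model function expanded()
    x = Vector{Float64}(undef, 10)
    for i in eachindex(x)      # what the macro would emit instead
        x[i] ~ Normal()
    end
end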

@willtebbutt
Member

willtebbutt commented Jan 30, 2025

FWIW, I find with Mooncake that the loopy version is faster:

julia> module MWE

       using Turing
       import Mooncake

       adbackend = AutoMooncake(config=nothing)
       alg = NUTS(; adtype=adbackend)
       n_samples = 20
       y = randn(1000)

       @model function test_tilde(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           x1 .~ Normal()
           x2 = Vector{T}(undef, num_values)
           x2 .~ Gamma()
           m = sum(x1 .* x2)
           y .~ Normal(m)
       end

       m_tilde = test_tilde(y)
       @time sample(m_tilde, alg, n_samples)

       @model function test_loop(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           for i in eachindex(x1)
               x1[i] ~ Normal()
           end
           x2 = Vector{T}(undef, num_values)
           for i in eachindex(x2)
               x2[i] ~ Gamma()
           end
           m = sum(x1 .* x2)
           for i in eachindex(y)
               y[i] ~ Normal(m)
           end
       end

       m_loop = test_loop(y)
       @time sample(m_loop, alg, n_samples)

       end

WARNING: replacing module MWE.
┌ Info: Found initial step size
└   ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:01:53
116.111583 seconds (1.60 G allocations: 95.795 GiB, 6.01% gc time, 4.90% compilation time)
┌ Info: Found initial step size
└   ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:59
 61.705976 seconds (1.11 G allocations: 73.435 GiB, 8.65% gc time, 7.44% compilation time)

(on the second run -- first run Mooncake has to compile some of its internals)

@mhauru
Member

mhauru commented Jan 30, 2025

Could we lower

@model function f_tilde()
    x = Array{Float64,2}(undef, 2, 4)
    d = MvNormal([-10.0, 10.0], I)
    # d = Normal()
    x .~ d
end

to something like

@model function f_loop()
    x = Array{Float64,2}(undef, 2, 4)
    d = MvNormal([-10.0, 10.0], I)
    # d = Normal()
    lhs_axes = axes(x)
    num_dims_d = length(size(d))
    axes_to_loop = lhs_axes[(begin + num_dims_d):end]
    colons = fill(:, num_dims_d)
    for idx in Iterators.product(axes_to_loop...)
        x[colons..., idx...] ~ d
    end
end

?

@penelopeysm
Member

That seems sensible - I'm not usually a TDD proponent, but I think this is indeed a situation where it'd be helpful to enumerate a set of test cases first, so that we know what we're dealing with.
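
For example, the LHS/RHS shapes discussed in this thread could be enumerated up front as small models (a sketch; the model names are illustrative and this is not an existing test file):

using Turing, LinearAlgebra

@model function case_vec_uni()   # vector LHS, univariate RHS
    x = Vector{Float64}(undef, 3)
    x .~ Normal()
end

@model function case_mat_uni()   # matrix LHS, univariate RHS
    x = Matrix{Float64}(undef, 2, 4)
    x .~ Normal()
end

@model function case_arr3_mv()   # 3-d array LHS, multivariate RHS
    x = Array{Float64,3}(undef, 3, 2, 4)
    x .~ MvNormal(randn(3), I)
end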

@torfjelde
Member

Is there a reason why comparisons aren't done via TuringBenchmarking.jl here instead of timing sample calls? The former should be much more reliable, no?

@mhauru
Member

mhauru commented Feb 1, 2025

Is there a reason why comparisons aren't done via TuringBenchmarking.jl here instead of timing sample calls?

No.

Using the same two models I had above:

julia> using TuringBenchmarking

julia> @model function test_dot_tilde(y, ::Type{T}=Vector{Float64}) where {T}
                  num_values = length(y)
                  x1 = T(undef, num_values)
                  x1 .~ Normal()
                  x2 = T(undef, num_values)
                  x2 .~ Gamma()
                  m = sum(x1 .* x2)
                  y .~ Normal(m)
              end
test_dot_tilde (generic function with 4 methods)

julia> @model function test_loop(y, ::Type{T}=Vector{Float64}) where {T}
                  num_values = length(y)
                  x1 = T(undef, num_values)
                  for i in eachindex(x1)
                      x1[i] ~ Normal()
                  end
                  x2 = T(undef, num_values)
                  for i in eachindex(x2)
                      x2[i] ~ Gamma()
                  end
                  m = sum(x1 .* x2)
                  for i in eachindex(y)
                      y[i] ~ Normal(m)
                  end
              end
test_loop (generic function with 4 methods)

julia> y = randn(1000);

julia> m_dot_tilde = test_dot_tilde(y);

julia> m_loop = test_loop(y);

julia> benchmark_model(m_dot_tilde)
┌ Warning: Gradient computation (without linking) failed for AutoZygote(): ErrorException("Mutating arrays is not supported -- called copyto!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n  https://fluxml.ai/Zygote.jl/latest/limitations\n")
└ @ TuringBenchmarking ~/.julia/packages/TuringBenchmarking/VqHuu/src/TuringBenchmarking.jl:213
┌ Warning: Gradient computation (with linking) failed for AutoZygote(): ErrorException("Mutating arrays is not supported -- called copyto!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n  https://fluxml.ai/Zygote.jl/latest/limitations\n")
└ @ TuringBenchmarking ~/.julia/packages/TuringBenchmarking/VqHuu/src/TuringBenchmarking.jl:243
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "evaluation" => 2-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "linked" => Trial(511.834 μs)
	  "standard" => Trial(322.708 μs)
  "gradient" => 4-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "AutoReverseDiff(compile=true)" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ReverseDiff [compiled]"]
		  "linked" => Trial(2.141 ms)
		  "standard" => Trial(1.687 ms)
	  "AutoForwardDiff(chunksize=0)" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ForwardDiff"]
		  "linked" => Trial(135.956 ms)
		  "standard" => Trial(96.100 ms)
	  "AutoReverseDiff()" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ReverseDiff"]
		  "linked" => Trial(5.988 ms)
		  "standard" => Trial(4.771 ms)
	  "AutoZygote()" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["Zygote"]
		  "linked" => 0-element BenchmarkTools.BenchmarkGroup:
			  tags: []
		  "standard" => 0-element BenchmarkTools.BenchmarkGroup:
			  tags: []

julia> benchmark_model(m_loop)
┌ Warning: Gradient computation (without linking) failed for AutoZygote(): ErrorException("Mutating arrays is not supported -- called setindex!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n  https://fluxml.ai/Zygote.jl/latest/limitations\n")
└ @ TuringBenchmarking ~/.julia/packages/TuringBenchmarking/VqHuu/src/TuringBenchmarking.jl:213
┌ Warning: Gradient computation (with linking) failed for AutoZygote(): ErrorException("Mutating arrays is not supported -- called setindex!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n  https://fluxml.ai/Zygote.jl/latest/limitations\n")
└ @ TuringBenchmarking ~/.julia/packages/TuringBenchmarking/VqHuu/src/TuringBenchmarking.jl:243
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "evaluation" => 2-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "linked" => Trial(406.250 μs)
	  "standard" => Trial(239.250 μs)
  "gradient" => 4-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "AutoReverseDiff(compile=true)" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ReverseDiff [compiled]"]
		  "linked" => Trial(2.400 ms)
		  "standard" => Trial(2.280 ms)
	  "AutoForwardDiff(chunksize=0)" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ForwardDiff"]
		  "linked" => Trial(117.081 ms)
		  "standard" => Trial(79.289 ms)
	  "AutoReverseDiff()" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ReverseDiff"]
		  "linked" => Trial(6.537 ms)
		  "standard" => Trial(5.800 ms)
	  "AutoZygote()" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["Zygote"]
		  "linked" => 0-element BenchmarkTools.BenchmarkGroup:
			  tags: []
		  "standard" => 0-element BenchmarkTools.BenchmarkGroup:
			  tags: []

@mhauru
Member

mhauru commented Feb 1, 2025

I'm trying to understand all the valid uses of .~ as it now stands, and how we would want it to behave. By reading the source code and browsing tests, I've come up with the following:

In LHS .~ RHS the LHS has to be either

  1. Any expression that is valid on the LHS of an assignment operation. In practice this is anything that can be turned into an optic using Accessors.setmacro(BangBang.prefermutation, expr; overwrite=true).
  2. A literal value made of numbers, such as 1.0 or [1.3, 2.3]. In practice this is defined by isliteral in compiler.jl:
isliteral(e) = false
isliteral(::Number) = true
isliteral(e::Expr) = !isempty(e.args) && all(isliteral, e.args)

The RHS needs to be either

  1. A Distribution, possibly a multivariate one.
  2. An AbstractVector of Distributions.

~, without the ., also accepts submodels on the RHS, but .~ does not.
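
A few concrete examples of the forms described above (a sketch of the current behaviour as I understand it, not an exhaustive list):

using Turing

@model function lhs_rhs_examples(y)
    x = Vector{Float64}(undef, length(y))
    x .~ Normal()              # assignable LHS, univariate Distribution RHS
    y .~ Normal.(x)            # data LHS, AbstractVector of Distributions RHS
    [1.0, 2.0] .~ Normal()     # literal LHS, treated as an observation
end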

Questions:

1. The main question: have I got this right? Am I missing something?

2. Do we want to support Arrays of multidimensional distributions? The code that should naturally catch this (check_tilde_rhs) accepts it, but I can't find a case that would actually work (the errors vary and are not very informative). It also feels clunky to me; it would be a bit like broadcasting over nested arrays rather than multidimensional arrays, so I'd be happy to say that we don't support it. Example of something that could work, but doesn't:

@model function f1()
    x = Array{Float64, 3}(undef, 3, 2, 2)
    x .~ fill(MvNormal(randn(2), I), 3)
end

3. Is there a reason why we don't support the below, or is it just a bug?

@model function f1()
    x = Array{Float64, 3}(undef, 3, 2, 4)
    m = randn(3)
    x .~ MvNormal(m, I)
end

This crashes with a MethodError from iterate that doesn't make immediate sense to me. Broadcasting over only a single dimension is fine, and broadcasting over multiple dimensions is fine if you use a univariate distribution; it's the combination of a multivariate distribution and broadcasting over multiple dimensions that fails.

4. Do we want to allow matrix-variate distributions on the RHS? I don't see why not, since we allow multivariate ones, but I haven't been able to make it work with Turing.jl as it now stands. Is that a bug? E.g. why doesn't this work:

@model function f1()
    x = Array{Float64, 3}(undef, 3, 3, 2)
    x .~ LKJ(3, 1.0)
end

Answers:

Anyone should feel free to opine on what we should do, but if @torfjelde or @yebai could comment on whether some of the above things that don't work are intentional design decisions or just unimplemented/bugs, that would help.

@yebai
Member Author

yebai commented Feb 2, 2025

  2. Do we want to support Arrays of multidimensional distributions?

Probably not -- we can support it in principle, but it is better motivated by some concrete examples.

  3. Is there a reason why we don't support the below, or is it just a bug?

It looks like a bug to me. The .~ notation was a bit ad-hoc from the beginning, and its functionality was a bit patchy.

Do we want to allow matrix-variate distributions on the RHS?

Probably not -- we can support it in principle, but it is better motivated by some concrete examples.

On a general note, I think a higher-order function that constructs an array-valued distribution (like filldist or product_distribution) is better than the broadcasting syntax. So, I would happily restrict .~ to univariate distributions only and recommend that users use product_distribution for vector- and matrix-variate distributions.
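
As a sketch of that recommendation (my example, not from the docs), the broadcasted forms would instead be written with array-valued distributions:

using Turing, LinearAlgebra

@model function with_constructors(n)
    x ~ filldist(Normal(), n)                                  # instead of x .~ Normal()
    Y ~ product_distribution(fill(MvNormal(zeros(2), I), n))   # n IID bivariate columns as one variable
end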

@penelopeysm
Member

On a general note, I think a higher-order function that constructs an array-valued distribution (like filldist or product_distribution) is better than the broadcasting syntax.

👍

mhauru self-assigned this Feb 3, 2025
@mhauru
Member

mhauru commented Feb 3, 2025

FYI, I started on an implementation; if anyone wants to work on this, please talk to me first so we don't duplicate work.

I'm happy to get rid of support for Arrays on the RHS of .~. It's a breaking change, but one that is easy to adapt to. It makes the implementation a lot simpler too.

I've posted on the Julia Slack to see if any users would find the changes very disruptive. I could go either way between allowing only univariate distributions and also allowing multivariate and matrix-variate ones. I generally think complicated broadcasting, where the dimensions don't match and neither side is a scalar, is confusing, but it is also something that is commonly done in Julia and not hard for us to implement.

@mhauru
Member

mhauru commented Feb 7, 2025

On Slack, I got a few questions about what the new syntax would be to replace calls to .~ with multivariate RHSs, but no one said the change would be disruptive for them.

I made a PR implementing .~ as a loop over ~: #804
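
For reference, the kind of replacement people were asking about would look roughly like this (a sketch with illustrative names, assuming x is a d×n matrix observed column-wise):

using Turing, LinearAlgebra

@model function old_way(x)
    x .~ MvNormal(zeros(2), I)                       # old special case: columns as IID draws
end

@model function new_way(x)
    x ~ filldist(MvNormal(zeros(2), I), size(x, 2))  # one matrix-variate variable
    # or, to keep per-column variables:
    # for j in axes(x, 2)
    #     x[:, j] ~ MvNormal(zeros(2), I)
    # end
end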

@torfjelde
Member

2 Do we want to support Arrays of multidimensional distributions?

I'd say no 🤷 It's just complicated and doesn't add much

3 Is there a reason why we don't support the below, or is it just a bug?

.~ is (currently) meant to function in a way that "respects" logpdf broadcasting, i.e.

logpdf.(right, left)

should be valid.

This means that a case such as

x .~ MvNormal(m, I)

results in

logpdf.(MvNormal(m, I), x)

which is not defined (your iterate error is likely because MvNormal doesn't define broadcastable; back in the day, I believe there was a default impl for Distribution).

But even if you defined broadcastable for MvNormal, you'd have to make an implicit choice regarding which dim you're broadcasting over.

However, we have this (IMO very annoying) support for the special case of left::Matrix and right::MultivariateDistribution, which breaks this convention and instead treats .~ as loglikelihood(right, left), because rand(right, n) returns a Matrix of size (d, n) (where d is the dim and n is the number of samples). This is a) very annoying to maintain (just look at the leaves of our tilde-pipeline), and b) AFAIK nobody uses it.
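
To spell the two conventions out with plain Distributions.jl calls (an illustration matching the behaviour described above):

using Distributions, LinearAlgebra

x = randn(3)
logpdf.(Normal(), x)                      # broadcasting convention: one logpdf per element

X = randn(2, 4)
loglikelihood(MvNormal(zeros(2), I), X)   # the special case: columns of X as IID draws, summed
# logpdf.(MvNormal(zeros(2), I), X) is not defined, hence the special-cased treatment above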

I don't see why not since we allow multivariate, but I haven't been able to make it work with Turing.jl as it now stands.

We shouldn't allow multivariate 🙃

@mhauru
Member

mhauru commented Feb 10, 2025

Thanks for the comments, @torfjelde. Note that the current plan, implemented in the PR linked above, is to break this convention:

.~ is (currently) meant to function in a way that "respects" logpdf broadcasting

and become much more restrictive. For instance, x .~ Normal.(y) won't work anymore, because Normal.(y) is an array. You'd need product_distribution instead.
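
Concretely, the migration would look something like this (illustrative model names; a sketch, not taken from the PR):

using Turing

@model function f_old(y)
    x = Vector{Float64}(undef, length(y))
    x .~ Normal.(y)                        # array of distributions on the RHS: will stop working
end

@model function f_new(y)
    x ~ product_distribution(Normal.(y))   # one multivariate variable built from the array
    # or, keeping each x[i] as a separate variable:
    # x = Vector{Float64}(undef, length(y))
    # for i in eachindex(y)
    #     x[i] ~ Normal(y[i])
    # end
end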

@torfjelde
Member

Gotcha; seems sensible 👍
