
[breaking] v0.40 #1164 (Draft)

mhauru wants to merge 285 commits into main from breaking

Conversation

@mhauru (Member) commented Dec 3, 2025

Release 0.40

mhauru and others added 17 commits November 27, 2025 13:14
Co-authored-by: Penelope Yong <penelopeysm@gmail.com>
* Make threadsafe evaluation opt-in

* Reduce number of type parameters in methods

* Make `warned_warn_about_threads_threads_threads_threads` shorter

* Improve `setthreadsafe` docstring

* warn on bare `@threads` as well

* fix merge

* Fix performance issues

* Use maxthreadid() in TSVI

* Move convert_eltype code to threadsafe eval function

* Point to new Turing docs page

* Add a test for setthreadsafe

* Tidy up check_model

* Apply suggestions from code review

Fix outdated docstrings

Co-authored-by: Markus Hauru <markus@mhauru.org>

* Improve warning message

* Export `requires_threadsafe`

* Add an actual docstring for `requires_threadsafe`

---------

Co-authored-by: Markus Hauru <markus@mhauru.org>
* Standardise `:lp` -> `:logjoint`

* changelog

* fix a test
github-actions bot (Contributor) commented Dec 3, 2025

Benchmark Report

  • this PR's head: d089228497b46e723325e0d2be93fcf25c7b62ed
  • base branch: eea8d01c5fb217c1a0f4df170bf1ca16ee879c10

Computer Information

Julia Version 1.11.9
Commit 53a02c0720c (2026-02-06 00:27 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

┌───────────────────────┬───────┬─────────────┬────────┬────────────────────────────────┬───────────────────────────┬─────────────────────────────────┐
│                       │       │             │        │        t(eval) / t(ref)        │     t(grad) / t(eval)     │        t(grad) / t(ref)         │
│                       │       │             │        │ ──────────┬──────────┬──────── │ ──────┬─────────┬──────── │ ──────────┬───────────┬──────── │
│                 Model │   Dim │  AD Backend │ Linked │      base │  this PR │ speedup │  base │ this PR │ speedup │      base │   this PR │ speedup │
├───────────────────────┼───────┼─────────────┼────────┼───────────┼──────────┼─────────┼───────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│               Dynamic │    10 │    mooncake │   true │    440.44 │   363.27 │    1.21 │  7.75 │   12.84 │    0.60 │   3411.31 │   4664.99 │    0.73 │
│                   LDA │    12 │ reversediff │   true │   2647.98 │  2674.68 │    0.99 │  5.03 │    4.68 │    1.08 │  13316.19 │  12509.55 │    1.06 │
│   Loop univariate 10k │ 10000 │    mooncake │   true │ 106782.23 │ 59455.61 │    1.80 │  4.60 │    5.30 │    0.87 │ 491375.50 │ 315172.79 │    1.56 │
├───────────────────────┼───────┼─────────────┼────────┼───────────┼──────────┼─────────┼───────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│    Loop univariate 1k │  1000 │    mooncake │   true │   8860.86 │  5907.68 │    1.50 │  4.24 │    5.26 │    0.81 │  37605.77 │  31077.09 │    1.21 │
│      Multivariate 10k │ 10000 │    mooncake │   true │  43933.05 │ 31500.13 │    1.39 │  8.91 │    9.72 │    0.92 │ 391371.23 │ 306093.75 │    1.28 │
│       Multivariate 1k │  1000 │    mooncake │   true │   4560.30 │  8193.76 │    0.56 │  7.19 │    3.83 │    1.88 │  32808.26 │  31399.57 │    1.04 │
├───────────────────────┼───────┼─────────────┼────────┼───────────┼──────────┼─────────┼───────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│ Simple assume observe │     1 │ forwarddiff │  false │      2.35 │     2.46 │    0.96 │  4.66 │    3.87 │    1.20 │     10.95 │      9.51 │    1.15 │
│           Smorgasbord │   201 │ forwarddiff │  false │   1294.68 │  1023.42 │    1.27 │ 71.74 │  142.44 │    0.50 │  92876.57 │ 145780.16 │    0.64 │
│           Smorgasbord │   201 │      enzyme │   true │   1849.49 │  1411.12 │    1.31 │  3.58 │    6.42 │    0.56 │   6616.88 │   9057.45 │    0.73 │
├───────────────────────┼───────┼─────────────┼────────┼───────────┼──────────┼─────────┼───────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│           Smorgasbord │   201 │ forwarddiff │   true │   1849.15 │  1405.27 │    1.32 │ 54.92 │   67.25 │    0.82 │ 101548.94 │  94509.79 │    1.07 │
│           Smorgasbord │   201 │    mooncake │   true │   1850.75 │  1414.73 │    1.31 │  4.34 │    5.29 │    0.82 │   8035.21 │   7484.34 │    1.07 │
│           Smorgasbord │   201 │ reversediff │   true │   1935.37 │  1409.60 │    1.37 │ 83.96 │   99.47 │    0.84 │ 162502.96 │ 140215.39 │    1.16 │
├───────────────────────┼───────┼─────────────┼────────┼───────────┼──────────┼─────────┼───────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│              Submodel │     1 │    mooncake │   true │      7.36 │     3.23 │    2.28 │  5.29 │   52.58 │    0.10 │     38.91 │    169.62 │    0.23 │
└───────────────────────┴───────┴─────────────┴────────┴───────────┴──────────┴─────────┴───────┴─────────┴─────────┴───────────┴───────────┴─────────┘

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 81.16983% with 367 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.58%. Comparing base (eea8d01) to head (d089228).

Files with missing lines             Patch %   Missing
src/varnamedtuple/show.jl              7.35%        63 ⚠️
src/varinfo.jl                        68.42%        60 ⚠️
src/varnamedtuple/partial_array.jl    84.32%        45 ⚠️
src/test_utils/models.jl              63.93%        22 ⚠️
src/transformed_values.jl             74.39%        21 ⚠️
src/accumulators.jl                    9.09%        20 ⚠️
src/accumulators/vector_params.jl     66.00%        17 ⚠️
src/varnamedtuple/getset.jl           93.06%        12 ⚠️
src/varnamedtuple.jl                  83.33%        10 ⚠️
src/accumulators/bijector.jl          71.42%         8 ⚠️
... and 20 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1164      +/-   ##
==========================================
- Coverage   79.00%   78.58%   -0.43%     
==========================================
  Files          41       47       +6     
  Lines        3944     3600     -344     
==========================================
- Hits         3116     2829     -287     
+ Misses        828      771      -57     

penelopeysm and others added 30 commits February 5, 2026 12:08
…API to avoid reevaluating (#1260)

Closes #1167 
Closes #986

This PR renames `ValuesAsInModelAccumulator` to `RawValueAccumulator`.
Reasons:

1. It's shorter.
2. I think it is a nice contrast to `VectorValueAccumulator`, which
stores the (possibly linked) vectorised forms (i.e., it is *essentially*
the replacement for VarInfo).

Maybe more importantly, it also gets rid of

```julia
values_as_in_model(model, include_colon_eq, varinfo)
```

and replaces it with

```julia
get_raw_values(varinfo)
```

The rationale for this is that the former, `values_as_in_model`, would
*reevaluate* the model with the accumulator and then return the values
in that accumulator. That's a bit wasteful, and for the most part in
Turing we never did this, instead preferring to add the accumulator
manually to a VarInfo, and then extract it. See e.g.
https://github.com/TuringLang/Turing.jl/blob/main/src/mcmc/prior.jl#L16.

The latter, `get_raw_values(varinfo)`, will error if the VarInfo doesn't
have a `RawValueAccumulator`. That's a bit more annoying, but the point
is to force developers to think about which accumulators they should use
and when, instead of just calling `values_as_in_model` and accidentally
triggering a reevaluation.
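
For illustration, here is a minimal sketch of the new pattern, mirroring the four-argument `init!!` calls that appear later in this thread; the zero-argument `RawValueAccumulator()` constructor is an assumption:

```julia
using DynamicPPL, Distributions

@model function demo()
    x ~ Normal()
    return nothing
end

# Add the accumulator manually, run the model once, then extract the values.
accs = OnlyAccsVarInfo(RawValueAccumulator())
_, accs = init!!(demo(), accs, InitFromPrior(), LinkAll())
raw = get_raw_values(accs)  # raw (model-space) values; errors if the accumulator is absent
```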
This PR implements what I described in #1225.

## Cholesky

As I also said in that PR, this breaks MCMCChains on Cholesky variables.
This is because the individual elements of a Cholesky are stored in the
chain as `x.L[1,1]` and when calling

```julia
DynamicPPL.templated_setindex!!(vnt, val, @varname(x.L[1,1]), template)
```

the current VNT code will attempt to create a NamedTuple for x, then a
2D GrowableArray for x.L, then stick `val` into the first element of
that.

I am kind of repeating myself here, but the true solution is to stop
using MCMCChains. However, for the purposes of this PR, I can put in a
hacky overload for `templated_setindex!!` for when `template isa
LinearAlgebra.Cholesky` that will make this work in the expected way.
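
For context on where entries like `x.L[1,1]` come from: a Cholesky draw is a structured object whose scalar entries MCMCChains flattens individually. A small illustration (the `LKJCholesky` prior is just an example choice):

```julia
using Distributions, LinearAlgebra

# A Cholesky-valued draw, as you would get from `x ~ LKJCholesky(2, 1.0)`:
C = rand(LKJCholesky(2, 1.0))
C isa LinearAlgebra.Cholesky  # true
C.L[1, 1]                     # MCMCChains stores scalar entries like this as `x.L[1,1]`
```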

## Varying dimensionality

This will also break models that look like this:

```julia
@model function f()
    N ~ Poisson(2.0)
    x = Vector{Float64}(undef, N)
    for i in 1:N
        x[i] ~ Normal()
    end
end
```

The reason is that the template is captured by running the model once. If
`N` is not constant, then the template for `x` may be (for example) a
length-2 vector. If you then attempt to use this template on a dataset
that has `x[3]`, it will error.
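
A hedged sketch of the failure mode, using the `templated_setindex!!` signature shown above (the empty `VarNamedTuple()` constructor and the exact error are assumptions):

```julia
using DynamicPPL

template = zeros(2)  # template captured from a run of the model where N == 2
vnt = VarNamedTuple()
vnt = DynamicPPL.templated_setindex!!(vnt, 0.5, @varname(x[1]), template)  # fine
vnt = DynamicPPL.templated_setindex!!(vnt, 0.5, @varname(x[3]), template)  # errors: no index 3 in a length-2 template
```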
…rm strategy (#1264)

see #1184

this PR is basically me trying to get people to stop using
`evaluate!!(model, vi)`, which is pretty hard to reason about. it depends
on a lot of things:

- the accs are taken from `vi`
- the transform strategy is taken from `model.context` if it's an InitContext, and otherwise it's inferred from `vi`
- the init strategy is taken from `model.context` if it's an InitContext, and otherwise it's inferred from `vi`

(yeah, that's quite nasty)

instead of this, the intention is to push people towards using

`init!!([rng,] model, ::OnlyAccsVarInfo, init_strategy, transform_strategy)`

because the end goal for DPPL (at least, my end goal) is to have a
*single* evaluation method, whose name is not yet determined, but which
should take exactly those five arguments (perhaps in a different order).

i'll write docs on this soon.

----

unfortunately we still need to keep the old two-argument `evaluate!!`
method around in DynamicPPL because things like `init!!` are defined in
terms of it. it is possible to switch it round and make this depend on
`init!!`, but that's a bit harder to do since that relies on getting rid
of DefaultContext.
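
to make the contrast concrete, a hedged sketch using only names from the description above (the rng argument is optional):

```julia
using DynamicPPL, Distributions, Random

@model demo() = x ~ Normal()

# old style: accs, init strategy, and transform strategy all implicit
# in `vi` and `model.context`:
#   evaluate!!(model, vi)

# new style: all three are explicit arguments.
accs = OnlyAccsVarInfo(VectorValueAccumulator())
_, accs = init!!(Random.default_rng(), demo(), accs, InitFromPrior(), LinkAll())
```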
Closes #1257

This PR concerns VNTs where all elements of a PartialArray with a
concrete template have been filled in:

```julia
julia> using DynamicPPL

julia> vnt = @vnt begin
           @template x = zeros(2)
           x[1] := 1.0
           x[2] := 2.0
       end
VarNamedTuple
└─ x => PartialArray size=(2,) data::Vector{Float64}
        ├─ (1,) => 1.0
        └─ (2,) => 2.0

julia> keys(vnt)
2-element Vector{VarName}:
 x[1]
 x[2]
```

A function `densify!!` is introduced here to, well, densify the VNT. It
converts any `PartialArray` whose elements have all been filled in into a
normal array. Several criteria must be met for densification to succeed;
if any of them fails, the PA is left as-is:

- `pa.mask` must be all true (this is the definition of dense)
- There must be no ArrayLikeBlocks
- There must be no VarNamedTuples inside
- There must be no PartialArrays inside which can't themselves be densified
- The PartialArray must not contain a GrowableArray

The result is:

```julia
julia> new_vnt = densify!!(vnt)
VarNamedTuple
└─ x => [1.0, 2.0]

julia> keys(new_vnt)
1-element Vector{VarName}:
 x
```

Currently, this function is only used inside the `ParamsWithStats`
constructor. This means that the only place where this has an impact is
when accessing results from inference algorithms, e.g., the chains' keys
when using MCMC, or the ModeResult when using optimisation.

Specifically, for MCMC sampling, because MCMCChains will split up the
vector values *anyway* into `x[1]` and `x[2]` (essentially undoing the
densification here), there will only really be a noticeable difference
with FlexiChains. I haven't updated FlexiChains for the new DPPL version
yet, so I can't demonstrate this.

However, I would expect that once all the necessary compatibility things
have been updated, sampling from the following model

```julia
@model function f()
    x = Vector{Float64}(undef, 2)
    x[1] ~ Normal()
    x[2] ~ Normal()
end
```

with FlexiChains should give a single key, `x`, rather than two separate
keys `x[1]` and `x[2]` as is currently the case.

There are some other places where we might want to consider doing this,
e.g. in `pointwise_logdensities`. One could argue that we should just do
it whenever we can. I'm a bit hesitant to immediately go down that
route, because `densify!!` is type unstable. (It cannot be type stable,
because it must check `all(pa.mask)`, which can only be done at
runtime.) So I think it's safer to start small, and expand it where it
makes sense to.
Closes #1232

This PR 'closes the loop' between LogDensityFunctions and models, in
other words, it makes sure that anything you want to do with a model can
be done with an LDF, and vice versa.

It does so by providing safe conversions between vectorised parameters,
initialisation strategies, and accumulators.

Specifically, it allows you to:

- **Given a vector of parameters and an LDF, generate an initialisation
strategy that lets you run a model.** (i.e., the equivalent of
`unflatten(vi, x); evaluate!!(model, vi)` but without using a VarInfo)

This was technically already in the codebase (it used to be called
`InitFromParams{<:VectorWithRanges}`, and LDF used it internally -- this
is exactly what FastLDF uses 😉). This PR just cleans up the constructors
and exports it as `InitFromVector(vector, ldf)` so that other people can
also use it (a short usage sketch follows after the docs link below).

- **Given an initialisation strategy and an LDF, generate a set of
vectorised parameters** (i.e., the equivalent of `vi = VarInfo(model);
vi[:]`, but again without using a VarInfo)

Apart from removing the need for a VarInfo, the new functions also do so
in a way that ensures correctness (as far as possible). The old approach
was riddled with potential correctness issues and as a user you had to
be very much on top of things to make sure that you didn't screw
something up.*

See docs for more info:
https://turinglang.org/DynamicPPL.jl/previews/PR1267/ldf/models/
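
As promised above, a hedged usage sketch for `InitFromVector`, composed from names used elsewhere in this thread (the zero-argument `RawValueAccumulator()` constructor is an assumption):

```julia
using DynamicPPL, Distributions

@model demo() = x ~ LogNormal()
ldf = LogDensityFunction(demo())

# Turn a parameter vector into an init strategy that can run the model,
# with the link status taken from the LDF rather than guessed.
strat = InitFromVector([0.3], ldf)
accs = OnlyAccsVarInfo(RawValueAccumulator())
_, accs = init!!(demo(), accs, strat, ldf.transform_strategy)
get_raw_values(accs)  # model-space values corresponding to the vector [0.3]
```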

----------

\* Example:

```julia
julia> using DynamicPPL, Distributions, LogDensityProblems

julia> @model f() = x ~ Beta(2, 2); model = f(); ldf = LogDensityFunction(f()); # Unlinked!

julia> LogDensityProblems.logdensity(ldf, [0.5]), logpdf(Beta(2, 2), 0.5)
(0.4054651081081644, 0.4054651081081644)

julia> # Different model, different variable name, different link status...
       @model g() = y ~ Beta(2, 2); vi = link!!(VarInfo(g()), g());
g (generic function with 2 methods)

julia> # But by calling vi[:] we lose all that information...
       ps = vi[:]
1-element Vector{Float64}:
 1.7940959094517983

julia> # This is completely meaningless!
        LogDensityProblems.logdensity(ldf, ps)
-Inf
```

With the current approach, it will tell you you're doing something
silly:

```julia
julia> oavi = OnlyAccsVarInfo(VectorValueAccumulator());

julia> vvals = get_vector_values(last(init!!(g(), oavi, InitFromPrior(), LinkAll())))
VarNamedTuple
└─ y => LinkedVectorValue{Vector{Float64}, ComposedFunction{DynamicPPL.UnwrapSingletonTransform{Tuple{}}, ComposedFunction{Bijectors.Inverse{Bijectors.Logit{Float64, Float64}}, DynamicPPL.ReshapeTransform{Tuple{Int64}, Tuple{}}}}, Tuple{}}([1.7044738291818293], DynamicPPL.UnwrapSingletonTransform{Tuple{}}(()) ∘ (Bijectors.Inverse{Bijectors.Logit{Float64, Float64}}(Bijectors.Logit{Float64, Float64}(0.0, 1.0)) ∘ DynamicPPL.ReshapeTransform{Tuple{Int64}, Tuple{}}((1,), ())), ())

julia> to_vector_input(vvals, ldf)
ERROR: type NamedTuple has no field y
[...]
```

---------

Co-authored-by: Xianda Sun <5433119+sunxd3@users.noreply.github.com>
Closes #1272 

```julia
using DynamicPPL, Distributions, Chairmarks, Test
@model function f()
    x ~ Normal()
    y ~ LogNormal()
    return nothing
end
model = f()
accs = OnlyAccsVarInfo(VectorValueAccumulator())
_, accs = init!!(model, accs, InitFromPrior(), LinkAll())
vector_values = get_vector_values(accs)
ldf = LogDensityFunction(model, getlogjoint_internal, vector_values)
init_strategy = InitFromParams(VarNamedTuple(; x=5.0, y=0.6))

# Prior to this PR, this was the recommended way of getting new vectorised params
function f(ldf, init_strat)
    tfm_strat = ldf.transform_strategy
    accs = OnlyAccsVarInfo(VectorValueAccumulator())
    _, accs = init!!(ldf.model, accs, init_strat, tfm_strat)
    vvals = get_vector_values(accs)
    return to_vector_input(vvals, ldf)
end

# This PR introduces this
function g(ldf, init_strat)
    tfm_strat = ldf.transform_strategy
    accs = OnlyAccsVarInfo(VectorParamAccumulator(ldf))
    _, accs = init!!(ldf.model, accs, init_strat, tfm_strat)
    return get_vector_params(accs)
end

# The old way with `vi[:]` is this
vi = VarInfo(model, InitFromPrior(), ldf.transform_strategy)
function h(ldf, vi, init_strat)
    tfm_strat = ldf.transform_strategy
    _, vi = init!!(ldf.model, vi, init_strat, tfm_strat)
    return vi[:]
end

@test f(ldf, init_strategy) ≈ [5.0, log(0.6)]
@test g(ldf, init_strategy) ≈ [5.0, log(0.6)]
@test h(ldf, vi, init_strategy) ≈ [5.0, log(0.6)]

julia> @be f($ldf, $init_strategy)
Benchmark: 2750 samples with 20 evaluations
 min    1.363 μs (57 allocs: 2.109 KiB)
 median 1.431 μs (57 allocs: 2.109 KiB)
 mean   1.712 μs (57 allocs: 2.109 KiB, 0.07% gc time)
 max    421.446 μs (57 allocs: 2.109 KiB, 99.12% gc time)

julia> @be g($ldf, $init_strategy)
Benchmark: 1257 samples with 201 evaluations
 min    126.453 ns (16 allocs: 528 bytes)
 median 136.398 ns (16 allocs: 528 bytes)
 mean   534.414 ns (16 allocs: 528 bytes, 0.24% gc time)
 max    378.881 μs (16 allocs: 528 bytes, 99.92% gc time)

julia> @be h($ldf, $vi, $init_strategy)
Benchmark: 2699 samples with 293 evaluations
 min    94.000 ns (10 allocs: 336 bytes)
 median 98.406 ns (10 allocs: 336 bytes)
 mean   119.215 ns (10 allocs: 336 bytes, 0.15% gc time)
 max    17.365 μs (10 allocs: 336 bytes, 98.83% gc time)
```

Comparing the two accumulator-based approaches, the one in this PR is
_clearly_ better.

## Why is this *still* slower than using a VarInfo?

It might seem like I'm not being completely honest when I claim that
OnlyAccsVarInfo is faster than VarInfo.

_In general_ there are indeed some cases where VarInfo is still faster,
because it reuses its internal VNT storage, and doesn't have to create
new arrays on the fly. But in this specific instance, that's irrelevant.
In this case it's actually to do with some of the checks that are
encoded in this accumulator, in particular in the `set_indices` field:


https://github.com/TuringLang/DynamicPPL.jl/blob/1167bc60c1fb775c31d2d52dc59917d068e0a256/src/accumulators/vector_params.jl#L1-L5

Keeping track of which indices were set allows us to fix two problems:

(1) The problem with `vi[:]` is that it's quite dangerous. You can
flatten the parameters inside any VarInfo, with no consistency checks
against the LDF: you have no control over the order of variables, and no
way of knowing whether you have too few or too many parameters.

(2) Threadsafe evaluation requires us to collect these parameters in
separate accumulators and then merge them later. Without tracking which
parameters were set in each accumulator, that merge would be impossible;
with it, the merge can be done in a threadsafe manner. Something like this

```julia
@model function f()
    x = Vector{Float64}(undef, 100)
    @threads for i in 1:100
        x[i] ~ Normal()
    end
end
```

will work correctly with this accumulator, whereas it will error with
VarInfo.

If you completely remove the `set_indices` field and its associated
checks, the accumulator *is* indeed faster than VarInfo; but I don't
think that's worth it.

```julia
julia> @be g($ldf, $init_strategy)
Benchmark: 2015 samples with 338 evaluations
 min    81.976 ns (7 allocs: 240 bytes)
 median 86.417 ns (7 allocs: 240 bytes)
 mean   245.184 ns (7 allocs: 240 bytes, 0.20% gc time)
 max    224.432 μs (7 allocs: 240 bytes, 99.92% gc time)
```
Closes #1029

Closes #1213
Closes #1275
Closes #1276

DOESN'T close #1157 (unfortunately)

-----

This PR reworks DebugAccumulator such that it does two kinds of checks:

1. Per-tilde checks, run every time it sees a tilde-statement.

2. Whole-model checks, run once at the end of model evaluation.

The problem with DebugAccumulator is that it currently attempts to do all
its checks inside accumulate_assume!! (i.e., type 1 above). This is wrong
for two reasons:

- It doesn't have full knowledge of what's to come.
- If TSVI is used, it can only detect problems with tilde-statements that
run in its own thread.
As a bonus, we add the ability to check for discrete distributions, which
is useful because gradient-based methods like HMC/NUTS and optimisation
cannot handle discrete parameters.
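
An illustrative sketch of the kind of model the new check targets, assuming the checks are surfaced through `check_model`:

```julia
using DynamicPPL, Distributions

@model function coinflips()
    p ~ Beta(2, 2)
    n ~ Poisson(5.0)  # discrete: should be flagged as unsuitable for gradient-based methods
end

check_model(coinflips())
```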
…alse`/`nothing` (#1277)

Closes #1255.

I thought this would have no impact on performance, but weirdly it makes
`subset` about 3x faster on some models. I have no idea why. I'll take
it, though.