diff --git a/_quarto.yml b/_quarto.yml
index 99af27e14..720098fc0 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -97,6 +97,13 @@ website:
- developers/compiler/minituring-contexts/index.qmd
- developers/compiler/design-overview/index.qmd
+ - section: "Variable Transformations"
+ collapse-level: 1
+ contents:
+ - developers/transforms/distributions/index.qmd
+ - developers/transforms/bijectors/index.qmd
+ - developers/transforms/dynamicppl/index.qmd
+
- section: "Inference (note: outdated)"
collapse-level: 1
contents:
@@ -195,3 +202,6 @@ using-turing-abstractmcmc: developers/inference/abstractmcmc-turing
using-turing-interface: developers/inference/abstractmcmc-interface
using-turing-variational-inference: developers/inference/variational-inference
using-turing-implementing-samplers: developers/inference/implementing-samplers
+dev-transforms-distributions: developers/transforms/distributions
+dev-transforms-bijectors: developers/transforms/bijectors
+dev-transforms-dynamicppl: developers/transforms/dynamicppl
diff --git a/developers/transforms/bijectors/index.qmd b/developers/transforms/bijectors/index.qmd
new file mode 100644
index 000000000..e8c8bb840
--- /dev/null
+++ b/developers/transforms/bijectors/index.qmd
@@ -0,0 +1,340 @@
+---
+title: "Bijectors in MCMC"
+engine: julia
+---
+
+```{julia}
+#| echo: false
+#| output: false
+using Pkg;
+Pkg.instantiate();
+```
+
+[The previous page]({{< meta dev-transforms-distributions >}}) was a purely mathematical discussion of how distributions can be transformed.
+Now, we turn to the implementation of these ideas in Julia, specifically using the [Bijectors.jl package](https://github.com/TuringLang/Bijectors.jl).
+
+## Bijectors.jl
+
+```{julia}
+import Random
+Random.seed!(468);
+
+using Distributions: Normal, LogNormal, logpdf
+using Statistics: mean, var
+using Plots: histogram
+```
+
+A _bijection_ between two sets ([Wikipedia](https://en.wikipedia.org/wiki/Bijection)) is, essentially, a one-to-one mapping between the elements of these sets.
+That is to say, if we have two sets $X$ and $Y$, then a bijection maps each element of $X$ to a unique element of $Y$.
+To return to our univariate example, where we transformed $x$ to $y$ using $y = \exp(x)$, the exponentiation function is a bijection because every value of $x$ maps to one unique value of $y$.
+The input set (the domain) is $(-\infty, \infty)$, and the output set (the codomain) is $(0, \infty)$.
+(Here, $(a, b)$ denotes the open interval from $a$ to $b$ but excluding $a$ and $b$ themselves.)
+
+Since bijections are a one-to-one mapping between elements, we can also reverse the direction of this mapping to create an inverse function.
+In the case of $y = \exp(x)$, the inverse function is $x = \log(y)$.
+
+::: {.callout-note}
+Technically, the bijections in Bijectors.jl are functions $f: X \to Y$ for which:
+
+- $f$ is continuously differentiable, i.e. the derivative $\mathrm{d}f(x)/\mathrm{d}x$ exists and is continuous (over the domain of interest $X$);
+- if $f^{-1}: Y \to X$ is the inverse of $f$, then that is also continuously differentiable (over _its_ own domain, i.e. $Y$).
+
+The technical mathematical term for this is a diffeomorphism ([Wikipedia](https://en.wikipedia.org/wiki/Diffeomorphism)), but we call them 'bijectors'.
+
+When thinking about continuous differentiability, it's important to be conscious of the domains or codomains that we care about.
+For example, taking the inverse function $\log(y)$ from above, its derivative is $1/y$, which is not continuous at $y = 0$.
+However, we specified that the bijection $y = \exp(x)$ maps values of $x \in (-\infty, \infty)$ to $y \in (0, \infty)$, so the point $y = 0$ is not within the domain of the inverse function.
+:::
+
+Specifically, one of the primary purposes of Bijectors.jl is to construct _bijections which map constrained distributions to unconstrained ones_.
+For example, the log-normal distribution which we saw in [the previous page]({{< meta dev-transforms-distributions >}}) is constrained: its _support_, i.e. the range over which $p(x) > 0$, is $(0, \infty)$.
+However, we can transform that to an unconstrained distribution (the normal distribution) using the transformation $y = \log(x)$.
+
+::: {.callout-note}
+Bijectors.jl, as well as DynamicPPL (which we'll come to later), can work with a much broader class of bijective transformations of variables, not just ones that go to the entire real line.
+But for the purposes of MCMC, unconstraining is the most common transformation, so we'll stick with that terminology.
+:::
+
+
+The `bijector` function, when applied to a distribution, returns a bijection $f$ that can be used to map the constrained distribution to an unconstrained one.
+Unsurprisingly, for the log-normal distribution, the bijection is (a broadcasted version of) the $\log$ function.
+
+```{julia}
+import Bijectors as B
+
+f = B.bijector(LogNormal())
+```
+
+We can apply this transformation to samples from the original distribution, for example:
+
+```{julia}
+samples_lognormal = rand(LogNormal(), 5)
+
+samples_normal = f(samples_lognormal)
+```
+
+We can also obtain the inverse of a bijection, $f^{-1}$:
+
+```{julia}
+f_inv = B.inverse(f)
+
+f_inv(samples_normal) == samples_lognormal
+```
+
+We know that the transformation $y = \log(x)$ changes the log-normal distribution to the normal distribution.
+Bijectors.jl also gives us a way to access that transformed distribution:
+
+```{julia}
+transformed_dist = B.transformed(LogNormal(), f)
+```
+
+This type doesn't immediately look like a `Normal()`, but it behaves in exactly the same way.
+For example, we can sample from it and plot a histogram:
+
+```{julia}
+samples_plot = rand(transformed_dist, 5000)
+histogram(samples_plot, bins=50)
+```
+
+We can also obtain the logpdf of the transformed distribution and check that it is the same as that of a normal distribution:
+
+```{julia}
+println("Sample: $(samples_plot[1])")
+println("Expected: $(logpdf(Normal(), samples_plot[1]))")
+println("Actual: $(logpdf(transformed_dist, samples_plot[1]))")
+```
+
+Given the discussion in the previous sections, you might not be surprised to find that the logpdf of the transformed distribution is implemented using the Jacobian of the transformation.
+In particular, it [directly uses](https://github.com/TuringLang/Bijectors.jl/blob/f52a9c52ede1f43155239447601387eb1dafe394/src/Bijectors.jl#L242-L255) the formula
+
+$$\log(q(\mathbf{y})) = \log(p(\mathbf{x})) - \log(|\det(\mathbf{J})|).$$
+
+You can access $\log(|\det(\mathbf{J})|)$ (evaluated at the point $\mathbf{x}$) using the `logabsdetjac` function:
+
+```{julia}
+# Reiterating the setup, just to be clear
+original_dist = LogNormal()
+x = rand(original_dist)
+f = B.bijector(original_dist)
+y = f(x)
+transformed_dist = B.transformed(LogNormal(), f)
+
+println("log(q(y)) : $(logpdf(transformed_dist, y))")
+println("log(p(x)) : $(logpdf(original_dist, x))")
+println("log(|det(J)|) : $(B.logabsdetjac(f, x))")
+```
+
+from which you can see that the equation above holds.
+There are more functions available in the Bijectors.jl API; for full details do check out the [documentation](https://turinglang.org/Bijectors.jl/stable/).
+For example, `logpdf_with_trans` can directly give us $\log(q(\mathbf{y}))$ without going through the effort of constructing the bijector:
+
+```{julia}
+B.logpdf_with_trans(original_dist, x, true)
+```
+
+## The case for bijectors in MCMC
+
+Constraints pose a challenge for many numerical methods such as optimisation, and sampling is no exception to this.
+The problem is that for any value $x$ outside of the support of a constrained distribution, $p(x)$ will be zero, and the logpdf will be $-\infty$.
+Thus, any term that involves a ratio of probabilities (or, equivalently, a difference of log probabilities) will be infinite or undefined.
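+
+For example, Distributions.jl reports a log-density of `-Inf` for a point outside a distribution's support, so a log acceptance ratio computed from such a point is itself infinite (a quick illustration using the log-normal distribution from above):
+
+```{julia}
+# The log-density outside the support is -Inf, so a log acceptance ratio
+# involving such a point is -Inf too:
+logpdf(LogNormal(), -1.0), logpdf(LogNormal(), -1.0) - logpdf(LogNormal(), 1.0)
+```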
+
+### Metropolis with rejection
+
+To see the practical impact of this on sampling, let's attempt to sample from a log-normal distribution using a random walk Metropolis algorithm.
+
+One way of handling constraints is to simply reject any steps that would take us out of bounds.
+This is a barebones implementation which does precisely that:
+
+```{julia}
+# Take a step where the proposal is a normal distribution centred around
+# the current value. Return the new value, plus a flag to indicate whether
+# the new value was in bounds.
+function mh_step(logp, x, in_bounds)
+ x_proposed = rand(Normal(x, 1))
+ in_bounds(x_proposed) || return (x, false) # bounds check
+ acceptance_logp = logp(x_proposed) - logp(x)
+ return if log(rand()) < acceptance_logp
+        (x_proposed, true) # proposal accepted
+    else
+        (x, true) # proposal rejected; stay at the current value
+ end
+end
+
+# Run a random walk Metropolis sampler.
+# `logp` : a function that takes `x` and returns the log pdf of the
+# distribution we're trying to sample from (up to a constant
+# additive factor)
+# `n_samples` : the number of samples to draw
+# `in_bounds` : a function that takes `x` and returns whether `x` is within
+# the support of the distribution
+# `x0` : the initial value
+# Returns a vector of samples, plus the number of times we went out of bounds.
+function mh(logp, n_samples, in_bounds; x0=1.0)
+ samples = [x0]
+ x = x0
+ n_out_of_bounds = 0
+ for _ in 2:n_samples
+ x, inb = mh_step(logp, x, in_bounds)
+ if !inb
+ n_out_of_bounds += 1
+ end
+ push!(samples, x)
+ end
+ return (samples, n_out_of_bounds)
+end
+```
+
+::: {.callout-note}
+In the MH algorithm, we technically do not need to explicitly check the proposal, because for any $x \leq 0$, we have that $p(x) = 0$; thus, the acceptance probability will be zero.
+However, doing so here allows us to track how often this happens, and also illustrates the general principle of handling constraints by rejection.
+:::
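+
+To make the callout concrete, here is a minimal sketch of the same step without the explicit bounds check (`mh_step_nocheck` is our own name, not part of the sampler above):
+
+```{julia}
+function mh_step_nocheck(logp, x)
+    x_proposed = rand(Normal(x, 1))
+    # For an out-of-bounds proposal, logp(x_proposed) is -Inf, so the log
+    # acceptance ratio is -Inf and the proposal is rejected automatically.
+    # (We assume the chain starts in bounds, as `mh` does with x0=1.0.)
+    acceptance_logp = logp(x_proposed) - logp(x)
+    return log(rand()) < acceptance_logp ? x_proposed : x
+end
+```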
+
+Now to actually perform the sampling:
+
+```{julia}
+logp(x) = logpdf(LogNormal(), x)
+samples, n_out_of_bounds = mh(logp, 10000, x -> x > 0)
+histogram(samples, bins=0:0.1:5; xlims=(0, 5))
+```
+
+How do we know that this has sampled correctly?
+For one, we can check that the mean of the samples is what we expect it to be.
+From [Wikipedia](https://en.wikipedia.org/wiki/Log-normal_distribution), the mean of a log-normal distribution is given by $\exp[\mu + (\sigma^2/2)]$.
+For our log-normal distribution, we set $\mu = 0$ and $\sigma = 1$, so:
+
+```{julia}
+println("expected mean: $(exp(0 + (1^2/2)))")
+println(" actual mean: $(mean(samples))")
+```
+
+### Metropolis with transformation
+
+The issue with this approach is that many of the sampling steps are unproductive, in that the proposals land in the region $x \leq 0$ and are rejected:
+
+```{julia}
+println("went out of bounds $n_out_of_bounds/10000 times")
+```
+
+And this could have been even worse if we had chosen a wider proposal distribution in the Metropolis step, or if the support of the distribution were narrower!
+In general, we probably don't want to have to re-parameterise our proposal distribution each time we sample from a distribution with different constraints.
+
+This is where the transformation functions from Bijectors.jl come in: we can use them to map the distribution to an unconstrained one and sample from *that* instead.
+Since the sampler only ever sees an unconstrained distribution, it doesn't have to worry about checking for bounds.
+
+To make this happen, instead of passing $\log(p(x))$ to the sampler, we pass $\log(q(y))$.
+This can be obtained using the `Bijectors.logpdf_with_trans` function that was introduced above.
+
+```{julia}
+d = LogNormal()
+f = B.bijector(d) # Transformation function
+f_inv = B.inverse(f) # Inverse transformation function
+function logq(y)
+ x = f_inv(y)
+ return B.logpdf_with_trans(d, x, true)
+end
+samples_transformed, n_oob_transformed = mh(logq, 10000, x -> true);
+```
+
+Now, this process gives us samples that have been transformed, so we need to un-transform them to get the samples from the original distribution:
+
+```{julia}
+samples_untransformed = f_inv(samples_transformed)
+histogram(samples_untransformed, bins=0:0.1:5; xlims=(0, 5))
+```
+
+We can check the mean of the samples too, to see that it is what we expect:
+
+```{julia}
+println("expected mean: $(exp(0 + (1^2/2)))")
+println(" actual mean: $(mean(samples_untransformed))")
+```
+
+On top of that, we can also verify that we don't ever go out of bounds:
+
+```{julia}
+println("went out of bounds $n_oob_transformed/10000 times")
+```
+
+### Which one is better?
+
+In the subsections above, we've seen two different methods of sampling from a constrained distribution:
+
+1. Sample directly from the distribution and reject any samples outside of its support.
+2. Transform the distribution to an unconstrained one and sample from that instead.
+
+(Note that both of these methods are applicable to other samplers as well, such as Hamiltonian Monte Carlo.)
+
+Of course, a natural question to then ask is which one of these is better!
+
+One option might be to look at the sample means above to see which one is 'closer' to the expected mean.
+However, that's not a very robust method because the sample mean is itself random, and if we were to use a different random seed we might well reach a different conclusion.
+
+Another possibility would be to look at the number of times the proposal was rejected.
+Does a lower rejection rate (as in the transformed case) imply that the method is better?
+This might seem like an intuitive conclusion, but it's not necessarily the case: for example, the sampling in unconstrained space could be much less efficient, such that even though we're not _rejecting_ samples, the ones that we do get are overly correlated and thus not representative of the distribution.
+
+A robust comparison would involve performing both methods many times and seeing how _reliable_ the sample mean is.
+
+```{julia}
+function get_sample_mean(; transform)
+ if transform
+ # Sample from transformed distribution
+ samples = f_inv(first(mh(logq, 10000, x -> true)))
+ else
+ # Sample from original distribution and reject if out of bounds
+ samples = first(mh(logp, 10000, x -> x > 0))
+ end
+ return mean(samples)
+end
+```
+
+```{julia}
+means_with_rejection = [get_sample_mean(; transform=false) for _ in 1:1000]
+mean(means_with_rejection), var(means_with_rejection)
+```
+
+```{julia}
+means_with_transformation = [get_sample_mean(; transform=true) for _ in 1:1000]
+mean(means_with_transformation), var(means_with_transformation)
+```
+
+We can see from this small study that although both methods give us the correct mean (on average), the method with the transformation is more reliable, in that the variance is much lower!
+
+::: {.callout-note}
+Alternatively, we could also try to directly measure how correlated the samples are.
+One way to do this is to calculate the _effective sample size_ (ESS), which is described in [the Stan documentation](https://mc-stan.org/docs/reference-manual/analysis.html#effective-sample-size.section), and implemented in [MCMCChains.jl](https://github.com/TuringLang/MCMCChains.jl/).
+A larger ESS implies that the samples are less correlated, and thus more representative of the underlying distribution:
+
+```{julia}
+using MCMCChains: Chains, ess
+
+rejection = first(mh(logp, 10000, x -> x > 0))
+transformation = f_inv(first(mh(logq, 10000, x -> true)))
+chn = Chains(hcat(rejection, transformation), [:rejection, :transformation])
+ess(chn)
+```
+:::
+
+### What happens without the Jacobian?
+
+In the transformation method above, we used `Bijectors.logpdf_with_trans` to calculate the log probability density of the transformed distribution.
+This function makes sure to include the Jacobian term when performing the transformation, and this is what makes sure that when we un-transform the samples, we get the correct distribution.
+
+The next code block shows what happens if we don't include the Jacobian term.
+In this `logq_wrong`, we've un-transformed `y` to `x` and calculated the logpdf with respect to its original distribution.
+This is exactly the same mistake that we made with `naive_logpdf` on [the previous page]({{< meta dev-transforms-distributions >}}).
+
+```{julia}
+function logq_wrong(y)
+ x = f_inv(y)
+ return logpdf(d, x) # no Jacobian term!
+end
+samples_questionable, _ = mh(logq_wrong, 100000, x -> true)
+samples_questionable_untransformed = f_inv(samples_questionable)
+
+println("mean: $(mean(samples_questionable_untransformed))")
+```
+
+You can see that even though we used ten times more samples, the mean is quite wrong, which implies that our samples are not being drawn from the correct distribution.
+
+In the next page, we'll see how to use these transformations in the context of a probabilistic programming language, paying particular attention to their handling in DynamicPPL.
diff --git a/developers/transforms/distributions/index.qmd b/developers/transforms/distributions/index.qmd
new file mode 100644
index 000000000..55f1b2da4
--- /dev/null
+++ b/developers/transforms/distributions/index.qmd
@@ -0,0 +1,316 @@
+---
+title: "Distributions and the Jacobian"
+engine: julia
+---
+
+```{julia}
+#| echo: false
+#| output: false
+using Pkg;
+Pkg.instantiate();
+```
+
+This series of articles will seek to motivate the [Bijectors.jl package](https://github.com/TuringLang/Bijectors.jl/), which provides the tools for transforming distributions in the Turing.jl probabilistic programming language.
+
+It assumes:
+
+- some basic knowledge of probability distributions (the notions of sampling from them and calculating the probability density function for a given distribution); and
+- some calculus (the chain and product rules for differentiation, and changes of variables in integrals).
+
+```{julia}
+import Random
+Random.seed!(468);
+
+using Distributions: Normal, LogNormal, logpdf, Distributions
+using Plots: histogram
+```
+
+## Sampling from a distribution
+
+To sample from a distribution (as defined in [Distributions.jl](https://juliastats.org/Distributions.jl/)), we can use the `rand` function.
+Let's sample from a normal distribution and then plot a histogram of the samples.
+
+```{julia}
+samples = rand(Normal(), 5000)
+histogram(samples, bins=50)
+```
+
+(Calling `Normal()` without any arguments, as we do here, gives us a normal distribution with mean 0 and standard deviation 1.)
+If you want to know the log probability density of observing any of the samples, you can use `logpdf`:
+
+```{julia}
+println("sample: $(samples[1])")
+println("logpdf: $(logpdf(Normal(), samples[1]))")
+```
+
+The probability density function for the normal distribution with mean 0 and standard deviation 1 is
+
+$$p(x) = \frac{1}{\sqrt{2\pi}} \exp{\left(-\frac{x^2}{2}\right)},$$
+
+so we could also have calculated this manually using:
+
+```{julia}
+log(1 / sqrt(2π) * exp(-samples[1]^2 / 2))
+```
+
+(or more efficiently, `-(samples[1]^2 + log2π) / 2`, where `log2π` is from the [IrrationalConstants.jl package](https://github.com/JuliaMath/IrrationalConstants.jl)).
+
+## Sampling from a transformed distribution
+
+Say that $x$ is distributed according to `Normal()`, and we want to draw samples of $y = \exp(x)$.
+Now, $y$ is itself a random variable, and like any other random variable, will have a probability distribution, which we'll call $q(y)$.
+
+In this specific case, the distribution of $y$ is known as a [log-normal distribution](https://en.wikipedia.org/wiki/Log-normal_distribution).
+For the purposes of this tutorial, let's implement our own `MyLogNormal` distribution that we can sample from.
+(Distributions.jl already defines its own `LogNormal`, so we have to use a different name.)
+To do this, we need to overload `Base.rand` for our new distribution.
+
+```{julia}
+struct MyLogNormal <: Distributions.ContinuousUnivariateDistribution
+ μ::Float64
+ σ::Float64
+end
+MyLogNormal() = MyLogNormal(0.0, 1.0)
+
+function Base.rand(rng::Random.AbstractRNG, d::MyLogNormal)
+ exp(rand(rng, Normal(d.μ, d.σ)))
+end
+```
+
+Now we can do the same as above:
+
+```{julia}
+samples_lognormal = rand(MyLogNormal(), 5000)
+# Cut off the tail for clearer visualisation
+histogram(samples_lognormal, bins=0:0.1:5; xlims=(0, 5))
+```
+
+How do we implement `logpdf` for our new distribution, though?
+Or in other words, if we observe a sample $y$, how do we know what the probability of drawing that sample was?
+
+Naively, we might think to just un-transform the variable `y` by reversing the exponential, i.e. taking the logarithm.
+We could then use the `logpdf` of the original distribution of `x`.
+
+```{julia}
+naive_logpdf(d::MyLogNormal, y) = logpdf(Normal(d.μ, d.σ), log(y))
+```
+
+We can compare this function against the logpdf implemented in Distributions.jl:
+
+```{julia}
+println("Sample : $(samples_lognormal[1])")
+println("Expected : $(logpdf(LogNormal(), samples_lognormal[1]))")
+println("Actual : $(naive_logpdf(MyLogNormal(), samples_lognormal[1]))")
+```
+
+Clearly this approach is not quite correct!
+
+## The derivative
+
+The reason this doesn't work is that transforming a (continuous) distribution causes probability density to be stretched and otherwise moved around.
+For example, in the normal distribution, half of the probability density is between $-\infty$ and $0$, and half is between $0$ and $\infty$.
+When exponentiated (i.e. in the log-normal distribution), the first half of the density is mapped to the interval $(0, 1)$, and the second half to $(1, \infty)$.
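+
+We can check this empirically with the samples drawn above (a quick sketch): roughly half of them should lie in $(0, 1)$.
+
+```{julia}
+# Fraction of log-normal samples that fall below 1; should be close to 0.5
+count(<(1), samples_lognormal) / length(samples_lognormal)
+```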
+
+This 'explanation' on its own does not really mean much, though.
+A perhaps more useful approach is to not talk about _probability densities_, but instead to make it more concrete by relating them to actual _probabilities_.
+If we think about the normal distribution as a continuous curve, what the probability density function $p(x)$ really tells us is that: for any two points $a$ and $b$ (where $a \leq b$), the probability of drawing a sample between $a$ and $b$ is the corresponding area under the curve, i.e.
+
+$$\int_a^b p(x) \, \mathrm{d}x.$$
+
+For example, if $(a, b) = (-\infty, \infty)$, then the probability of drawing a sample between $a$ and $b$ is 1.
+
+Let's say that the probability density function of the log-normal distribution is $q(y)$.
+Then, the area under the curve between the two points $\exp(a)$ and $\exp(b)$ is:
+
+$$\int_{\exp(a)}^{\exp(b)} q(y) \, \mathrm{d}y.$$
+
+This integral should be equal to the one above, because the probability of drawing from $[a, b]$ in the original distribution should be the same as the probability of drawing from $[\exp(a), \exp(b)]$ in the transformed distribution.
+The question we have to solve here is: how do we find a function $q(y)$ such that this equality holds?
+
+We can approach this by making the substitution $y = \exp(x)$ in the first integral (see [Wikipedia](https://en.wikipedia.org/wiki/Integration_by_substitution) for a refresher on substitutions in integrals, if needed).
+We have that:
+
+$$\frac{\mathrm{d}y}{\mathrm{d}x} = \exp(x) = y \implies \mathrm{d}x = \frac{1}{y}\,\mathrm{d}y$$
+
+and so
+
+$$\int_{x=a}^{x=b} p(x) \, \mathrm{d}x
+ = \int_{y=\exp(a)}^{y=\exp(b)} p(\log(y)) \frac{1}{y} \,\mathrm{d}y
+ = \int_{\exp(a)}^{\exp(b)} q(y) \, \mathrm{d}y,
+$$
+
+from which we can read off $q(y) = p(\log(y)) / y$.
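+
+As a quick numerical spot-check (a sketch, evaluated at an arbitrarily chosen point $y = 2$), this expression agrees with the log-normal density provided by Distributions.jl:
+
+```{julia}
+y = 2.0
+# q(y) = p(log(y)) / y, versus the LogNormal density from Distributions.jl
+exp(logpdf(Normal(), log(y))) / y, exp(logpdf(LogNormal(), y))
+```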
+
+In contrast, when we implemented `naive_logpdf`
+
+```{julia}
+naive_logpdf(d::MyLogNormal, y) = logpdf(Normal(d.μ, d.σ), log(y))
+```
+
+that was the equivalent of saying that $q(y) = p(\log(y))$.
+We left out a factor of $1/y$!
+
+Indeed, now we can define the correct `logpdf` function.
+Since everything is a logarithm here, instead of multiplying by $1/y$ we subtract $\log(y)$:
+
+```{julia}
+Distributions.logpdf(d::MyLogNormal, y) = logpdf(Normal(d.μ, d.σ), log(y)) - log(y)
+```
+
+and check that it works:
+
+```{julia}
+println("Sample : $(samples_lognormal[1])")
+println("Expected : $(logpdf(LogNormal(), samples_lognormal[1]))")
+println("Actual : $(logpdf(MyLogNormal(), samples_lognormal[1]))")
+```
+
+The same process can be applied to any kind of (invertible) transformation.
+If we have some transformation from $x$ to $y$, and the probability density functions of $x$ and $y$ are $p(x)$ and $q(y)$ respectively, then we have a general formula that:
+
+$$q(y) = p(x) \left| \frac{\mathrm{d}x}{\mathrm{d}y} \right|.$$
+
+In this case, we had $y = \exp(x)$, so $\mathrm{d}x/\mathrm{d}y = 1/y$.
+(This equation is (11.5) in Bishop's textbook.)
+
+::: {.callout-note}
+The absolute value here takes care of the case where $f$ is a decreasing function, i.e., $f(x) > f(y)$ when $x < y$.
+You can try this out with the transformation $y = -\exp(x)$.
+If $a < b$, then $-\exp(a) > -\exp(b)$, and so you will have to swap the integration limits to ensure that the integral comes out positive.
+:::
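+
+For concreteness, here is what that exercise yields (a short worked version of the suggestion above).
+For $y = -\exp(x)$, the inverse is $x = \log(-y)$, so
+
+$$\frac{\mathrm{d}x}{\mathrm{d}y} = \frac{1}{y}
+\quad\implies\quad
+q(y) = p(\log(-y)) \left| \frac{1}{y} \right| = \frac{p(\log(-y))}{-y} \quad \text{for } y < 0,$$
+
+and $q(y) = 0$ elsewhere; without the absolute value, the density would come out negative.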
+
+Note that $\mathrm{d}y/\mathrm{d}x$ is equal to $(\mathrm{d}x/\mathrm{d}y)^{-1}$, so the formula above can also be written as:
+
+$$q(y) \left| \frac{\mathrm{d}y}{\mathrm{d}x} \right| = p(x).$$
+
+## The Jacobian
+
+In general, we may have transforms that act on multivariate distributions: for example, something mapping $p(x_1, x_2)$ to $q(y_1, y_2)$.
+In this case, the rule above has to be extended by replacing the derivative $\mathrm{d}x/\mathrm{d}y$ with the determinant of the inverse of the Jacobian matrix $\mathbf{J}$, where $\mathbf{J}$ is defined as:
+
+$$\mathbf{J} = \begin{pmatrix}
+\partial y_1/\partial x_1 & \partial y_1/\partial x_2 \\
+\partial y_2/\partial x_1 & \partial y_2/\partial x_2
+\end{pmatrix}.$$
+
+This allows us to write the direct generalisation as:
+
+$$q(y_1, y_2) \left| \det(\mathbf{J}) \right| = p(x_1, x_2),$$
+
+or equivalently,
+
+$$q(y_1, y_2) = p(x_1, x_2) \left| \det(\mathbf{J}^{-1}) \right|,$$
+
+where $\mathbf{J}^{-1}$ is the inverse of the Jacobian matrix.
+This is the same as equation (11.9) in Bishop.
+
+::: {.callout-note}
+Instead of inverting the original Jacobian matrix to get $\mathbf{J}^{-1}$, we could also use the Jacobian of the inverse function:
+
+$$\mathbf{J}_\text{inv} = \begin{pmatrix}
+\partial x_1/\partial y_1 & \partial x_1/\partial y_2 \\
+\partial x_2/\partial y_1 & \partial x_2/\partial y_2
+\end{pmatrix}.$$
+
+As it turns out, these are entirely equivalent: the Jacobian of the inverse function is the inverse of the original Jacobian matrix.
+:::
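+
+As a quick numerical check of this rule (a sketch; the linear map `A` and the helper name `p_x` are ours for illustration), consider a linear transformation $\mathbf{y} = A\mathbf{x}$ of two independent standard normal variables, for which the Jacobian is simply $A$:
+
+```{julia}
+using LinearAlgebra: det
+
+A = [2.0 1.0; 0.0 3.0]   # an arbitrary invertible matrix
+x = randn(2)
+y = A * x
+# p(x1, x2) for two independent standard normals
+p_x = exp(logpdf(Normal(), x[1]) + logpdf(Normal(), x[2]))
+# q(y1, y2) = p(x1, x2) / |det(A)|, versus the known density of y,
+# which is multivariate normal with covariance A * A'
+p_x / abs(det(A)), Distributions.pdf(Distributions.MvNormal(zeros(2), A * A'), y)
+```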
+
+The rest of this section will be devoted to an example to show that this works, and contains some slightly less pretty mathematics.
+If you are already suitably convinced by this stage, then you can skip the rest of this section.
+(Or if you prefer something more formal, the Wikipedia article on integration by substitution [discusses the multivariate case as well](https://en.wikipedia.org/wiki/Integration_by_substitution#Substitution_for_multiple_variables).)
+
+### An example: the Box–Muller transform
+
+A motivating example where one might like to use a Jacobian is the [Box–Muller transform](https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform), which is a technique for sampling from a normal distribution.
+
+The Box–Muller transform works by first sampling two random variables from the uniform distribution between 0 and 1:
+
+$$\begin{align}
+x_1 &\sim U(0, 1) \\
+x_2 &\sim U(0, 1).
+\end{align}$$
+
+Both of these have a probability density function of $p(x) = 1$ for $0 < x \leq 1$, and 0 otherwise.
+Because they are independent, we can write that
+
+$$p(x_1, x_2) = p(x_1) p(x_2) = \begin{cases}
+1 & \text{if } 0 < x_1 \leq 1 \text{ and } 0 < x_2 \leq 1, \\
+0 & \text{otherwise}.
+\end{cases}$$
+
+The next step is to perform the transforms
+
+$$\begin{align}
+y_1 &= \sqrt{-2 \log(x_1)} \cos(2\pi x_2); \\
+y_2 &= \sqrt{-2 \log(x_1)} \sin(2\pi x_2),
+\end{align}$$
+
+and it turns out that with these transforms, both $y_1$ and $y_2$ are independent and normally distributed with mean 0 and standard deviation 1, i.e.
+
+$$q(y_1, y_2) = \frac{1}{2\pi} \exp{\left(-\frac{y_1^2}{2}\right)} \exp{\left(-\frac{y_2^2}{2}\right)}.$$
+
+How can we show that this is the case?
+
+There are many ways to work out the required calculus.
+Some are more elegant and some rather less so!
+One of the less headache-inducing ways is to define the intermediate variables:
+
+$$r = \sqrt{-2 \log(x_1)}; \quad \theta = 2\pi x_2,$$
+
+from which we can see that $y_1 = r\cos\theta$ and $y_2 = r\sin\theta$, and hence
+
+$$\begin{align}
+x_1 &= \exp{\left(-\frac{r^2}{2}\right)} = \exp{\left(-\frac{y_1^2}{2}\right)}\exp{\left(-\frac{y_2^2}{2}\right)}; \\
+x_2 &= \frac{\theta}{2\pi} = \frac{1}{2\pi} \, \arctan\left(\frac{y_2}{y_1}\right).
+\end{align}$$
+
+This lets us obtain the requisite partial derivatives in a way that doesn't involve _too_ much algebra.
+As an example, we have
+
+$$\frac{\partial x_1}{\partial y_1} = -y_1 \exp{\left(-\frac{y_1^2}{2}\right)}\exp{\left(-\frac{y_2^2}{2}\right)} = -y_1 x_1,$$
+
+(where we used the chain rule), and
+
+$$\frac{\partial x_2}{\partial y_1} = \frac{1}{2\pi} \left(\frac{1}{1 + (y_2/y_1)^2}\right) \left(-\frac{y_2}{y_1^2}\right),$$
+
+(where we used the chain rule, and the derivative $\mathrm{d}(\arctan(a))/\mathrm{d}a = 1/(1 + a^2)$).
+
+Putting these partial derivatives together (along with the corresponding ones with respect to $y_2$) gives us the Jacobian of the inverse transformation, i.e. the matrix $\mathbf{J}_\text{inv}$ from the note above:
+
+$$\mathbf{J}_\text{inv} = \begin{pmatrix}
+-y_1 x_1 & -y_2 x_1 \\
+-cy_2/y_1^2 & c/y_1 \\
+\end{pmatrix},$$
+
+where $c = [2\pi(1 + (y_2/y_1)^2)]^{-1}$.
+The determinant of this matrix is
+
+$$\begin{align}
+\det(\mathbf{J}_\text{inv}) &= -cx_1 - cx_1(y_2/y_1)^2 \\
+&= -cx_1\left[1 + \left(\frac{y_2}{y_1}\right)^2\right] \\
+&= -\frac{1}{2\pi} x_1 \\
+&= -\frac{1}{2\pi}\exp{\left(-\frac{y_1^2}{2}\right)}\exp{\left(-\frac{y_2^2}{2}\right)}.
+\end{align}$$
+
+Coming right back to our probability density, we have that
+
+$$\begin{align}
+q(y_1, y_2) &= p(x_1, x_2) \cdot |\det(\mathbf{J}_\text{inv})| \\
+&= \frac{1}{2\pi}\exp{\left(-\frac{y_1^2}{2}\right)}\exp{\left(-\frac{y_2^2}{2}\right)},
+\end{align}$$
+
+as desired.
+
+::: {.callout-note}
+We haven't yet explicitly accounted for the fact that $p(x_1, x_2)$ is 0 if either $x_1$ or $x_2$ are outside the range $(0, 1]$.
+For example, if this constraint on $x_1$ and $x_2$ were to result in inaccessible values of $y_1$ or $y_2$, then $q(y_1, y_2)$ should be 0 for those values.
+Formally, for the transformation $f: X \to Y$ where $X$ is the unit square (i.e. $0 < x_1, x_2 \leq 1$), $q(y_1, y_2)$ should only take the above value for the [image](https://en.wikipedia.org/wiki/Image_(mathematics)) of $f$, and anywhere outside of the image it should be 0.
+
+In our case, the $\sqrt{-2 \log(x_1)}$ term in the transform varies between $0$ and $\infty$, and the $\cos(2\pi x_2)$ term ranges from $-1$ to $1$.
+Hence $y_1$, which is the product of these two terms, ranges from $-\infty$ to $\infty$, and likewise for $y_2$.
+So the image of $f$ is the entire real plane, and we don't have to worry about this.
+:::
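+
+Before moving on, here is a quick empirical check of the Box–Muller transform itself (a sketch; the variable names are ours).
+We draw $x_1$ and $x_2$ uniformly on $(0, 1]$, apply the transform, and plot a histogram of $y_1$, which should look like a standard normal:
+
+```{julia}
+x1 = 1 .- rand(5000)   # rand() lies in [0, 1), so 1 - rand() lies in (0, 1]
+x2 = 1 .- rand(5000)
+y1 = sqrt.(-2 .* log.(x1)) .* cos.(2π .* x2)
+histogram(y1, bins=50)
+```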
+
+Having seen the theory that underpins how distributions can be transformed, let's now turn to how this is implemented in the Turing ecosystem.
diff --git a/developers/transforms/dynamicppl/dynamicppl_link.png b/developers/transforms/dynamicppl/dynamicppl_link.png
new file mode 100644
index 000000000..370c6d48f
Binary files /dev/null and b/developers/transforms/dynamicppl/dynamicppl_link.png differ
diff --git a/developers/transforms/dynamicppl/dynamicppl_link2.png b/developers/transforms/dynamicppl/dynamicppl_link2.png
new file mode 100644
index 000000000..5b99f542d
Binary files /dev/null and b/developers/transforms/dynamicppl/dynamicppl_link2.png differ
diff --git a/developers/transforms/dynamicppl/index.qmd b/developers/transforms/dynamicppl/index.qmd
new file mode 100644
index 000000000..ae7722d48
--- /dev/null
+++ b/developers/transforms/dynamicppl/index.qmd
@@ -0,0 +1,418 @@
+---
+title: "Variable transformations in DynamicPPL"
+engine: julia
+---
+
+```{julia}
+#| echo: false
+#| output: false
+using Pkg;
+Pkg.instantiate();
+```
+
+In the final part of this chapter, we'll discuss the higher-level implications of constrained distributions in the Turing.jl framework.
+
+## Linked and unlinked VarInfos in DynamicPPL
+
+```{julia}
+import Random
+Random.seed!(468);
+
+# Turing re-exports the entirety of Distributions
+using Turing
+```
+
+When we are performing Bayesian inference, we're trying to sample from a joint probability distribution, which usually isn't a simple, well-known distribution like the ones in the rather simplified examples on the previous pages.
+However, each random variable in the model will have its own distribution, and often some of these will be constrained.
+For example, if `b ~ LogNormal()` is a random variable in a model, then $p(b)$ will be zero for any $b \leq 0$.
+Consequently, any joint probability $p(b, c, \ldots)$ will also be zero for any combination of parameters where $b \leq 0$, and so that joint distribution is itself constrained.
+
+To get around this, DynamicPPL allows the variables to be transformed in exactly the same way as above.
+For simplicity, consider the following model:
+
+```{julia}
+using DynamicPPL
+
+@model function demo()
+ x ~ LogNormal()
+end
+
+model = demo()
+vi = VarInfo(model)
+vn_x = @varname(x)
+# Retrieve the 'internal' representation of x – we'll explain this later
+DynamicPPL.getindex_internal(vi, vn_x)
+```
+
+The call to `VarInfo` executes the model once and stores the sampled value inside `vi`.
+By default, `VarInfo` itself stores un-transformed values.
+We can see this by comparing the value of the logpdf stored inside the `VarInfo`:
+
+```{julia}
+DynamicPPL.getlogp(vi)
+```
+
+with a manual calculation:
+
+```{julia}
+logpdf(LogNormal(), DynamicPPL.getindex_internal(vi, vn_x))
+```
+
+In DynamicPPL, the `link` function can be used to transform the variables.
+This function does three things: firstly, it transforms the variables; secondly, it updates the value of logp (by accounting for the Jacobian term); and thirdly, it sets a flag on the variables to indicate that they have been transformed.
+Note that this acts on _all_ variables in the model, including unconstrained ones.
+(Unconstrained variables just have an identity transformation.)
+
+```{julia}
+vi_linked = DynamicPPL.link(vi, model)
+println("Transformed value: $(DynamicPPL.getindex_internal(vi_linked, vn_x))")
+println("Transformed logp: $(DynamicPPL.getlogp(vi_linked))")
+println("Transformed flag: $(DynamicPPL.istrans(vi_linked, vn_x))")
+```
+
+Indeed, we can see that the new logp value matches
+
+```{julia}
+logpdf(Normal(), DynamicPPL.getindex_internal(vi_linked, vn_x))
+```
+
+The reverse transformation, `invlink`, reverts all of the above steps:
+
+```{julia}
+vi = DynamicPPL.invlink(vi_linked, model) # Same as the previous vi
+println("Un-transformed value: $(DynamicPPL.getindex_internal(vi, vn_x))")
+println("Un-transformed logp: $(DynamicPPL.getlogp(vi))")
+println("Un-transformed flag: $(DynamicPPL.istrans(vi, vn_x))")
+```
+
+### Model and internal representations
+
+In DynamicPPL, there is a difference between the value of a random variable and its 'internal' value.
+This is most easily seen by first transforming, and then comparing the output of `getindex` and `getindex_internal`.
+The former extracts the regular value, which we call the **model representation** (because it is consistent with the distribution specified in the model).
+The latter, as the name suggests, gets the **internal representation** of the variable, which is how it is actually stored in the VarInfo object.
+
+```{julia}
+println(" Model representation: $(getindex(vi_linked, vn_x))")
+println("Internal representation: $(DynamicPPL.getindex_internal(vi_linked, vn_x))")
+```
+
+::: {.callout-note}
+Note that `vi_linked[vn_x]` can also be used as shorthand for `getindex(vi_linked, vn_x)`; this usage is common in the DynamicPPL/Turing codebase.
+:::
+
+We can see (for this linked varinfo) that there are _two_ differences between these outputs:
+
+1. _The internal representation has been transformed using the bijector (in this case, the log function)._
+ This means that the `istrans()` flag which we used above doesn't modify the model representation: it only tells us whether the internal representation has been transformed or not.
+
+2. _The internal representation is a vector, whereas the model representation is a scalar._
+ This is because in DynamicPPL, _all_ internal values are vectorised (i.e. converted into some vector), regardless of distribution. On the other hand, since the model specifies a univariate distribution, the model representation is a scalar.
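+
+Both points can be verified directly (a quick check using the objects defined above): taking the logarithm of the model representation should match the single element of the linked internal representation.
+
+```{julia}
+log(vi_linked[vn_x]), DynamicPPL.getindex_internal(vi_linked, vn_x)
+```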
+
+One might also ask, what is the internal representation for an _unlinked_ varinfo?
+
+```{julia}
+println(" Model representation: $(getindex(vi, vn_x))")
+println("Internal representation: $(DynamicPPL.getindex_internal(vi, vn_x))")
+```
+
+For an unlinked VarInfo, the internal representation is vectorised, but not transformed.
+We call this an **unlinked internal representation**; conversely, when the VarInfo has been linked, each variable will have a corresponding **linked internal representation**.
+
+This sequence of events is summed up in the following diagram, where `f(..., args)` indicates that the `...` is to be replaced with the object at the beginning of the arrow:
+
+![Functions related to variable transforms in DynamicPPL](./dynamicppl_link.png)
+
+In the final part of this article, we'll take a more in-depth look at the internal DynamicPPL machinery that allows us to convert between representations and obtain the correct probability densities.
+Before that, though, we'll take a quick high-level look at how the HMC sampler in Turing.jl uses the functions introduced so far.
+
+## Case study: HMC in Turing.jl
+
+While DynamicPPL provides the _functionality_ for transforming variables, the transformation itself happens at an even higher level, i.e. in the sampler itself.
+The HMC sampler in Turing.jl is in [this file](https://github.com/TuringLang/Turing.jl/blob/5b24cebe773922e0f3d5c4cb7f53162eb758b04d/src/mcmc/hmc.jl).
+In the first step of sampling, it calls `link` on the `VarInfo`.
+This transformation is preserved throughout the sampling process, meaning that `istrans()` always returns true.
+
+We can observe this by inserting print statements into the model.
+Here, `__varinfo__` is the internal symbol for the `VarInfo` object used in model evaluation:
+
+```{julia}
+setprogress!(false)
+
+@model function demo2()
+ x ~ LogNormal()
+ if x isa AbstractFloat
+ println("-----------")
+ println("model repn: $(DynamicPPL.getindex(__varinfo__, @varname(x)))")
+ println("internal repn: $(DynamicPPL.getindex_internal(__varinfo__, @varname(x)))")
+ println("istrans: $(istrans(__varinfo__, @varname(x)))")
+ end
+end
+
+sample(demo2(), HMC(0.1, 3), 3);
+```
+
+
+(Here, the check on `if x isa AbstractFloat` prevents the printing from occurring during computation of the derivative.)
+You can see that during the three sampling steps, `istrans` is always kept as `true`.
+
+::: {.callout-note}
+The first two model evaluations where `istrans` is `false` occur prior to the actual sampling.
+One occurs when the model is checked for correctness (using [`DynamicPPL.check_model_and_trace`](https://github.com/TuringLang/DynamicPPL.jl/blob/ba490bf362653e1aaefe298364fe3379b60660d3/src/debug_utils.jl#L582-L612)).
+The second occurs because the model is evaluated once to generate a set of initial parameters inside [DynamicPPL's implementation of `AbstractMCMC.step`](https://github.com/TuringLang/DynamicPPL.jl/blob/ba490bf362653e1aaefe298364fe3379b60660d3/src/sampler.jl#L98-L117).
+Both of these steps occur with all samplers in Turing.jl, so are not specific to the HMC example shown here.
+:::
+
+What this means is that from the perspective of the HMC sampler, it _never_ sees the constrained variable: it always thinks that it is sampling from an unconstrained distribution.
+
+The biggest prerequisite for this to work correctly is that the potential energy term in the Hamiltonian (in other words, the model log density) must be programmed correctly to include the Jacobian term.
+This is exactly the same as how we had to make sure to define `logq(y)` correctly in the random walk Metropolis example on [the previous page]({{< meta dev-transforms-bijectors >}}).
+
+Within Turing.jl, this is correctly handled because a statement like `x ~ LogNormal()` in the model definition above is translated into `assume(LogNormal(), @varname(x), __varinfo__)`, defined [here](https://github.com/TuringLang/DynamicPPL.jl/blob/ba490bf362653e1aaefe298364fe3379b60660d3/src/context_implementations.jl#L225-L229).
+If you follow the trail of function calls, you can verify that the `assume` function does indeed check for the presence of the `istrans` flag and adds the Jacobian term accordingly.
+
+## A deeper dive into DynamicPPL's internal machinery
+
+As described above, DynamicPPL stores a (possibly linked) _internal representation_ which is accessible via `getindex_internal`, but can also provide users with the original, untransformed, _model representation_ via `getindex`.
+This abstraction allows the user to obtain samples from constrained distributions without having to perform the transformation themselves.
+
+![More functions related to variable transforms in DynamicPPL](./dynamicppl_link2.png)
+
+The conversion between these representations is done using several internal functions in DynamicPPL, as depicted in the above diagram.
+The following operations are labelled:
+
+1. This is linking, i.e. transforming a constrained variable to an unconstrained one.
+
+2. This is vectorisation: for example, converting a scalar value to a 1-element vector.
+
+3. This arrow brings us from the model representation to the linked internal representation.
+ This is the composition of (1) and (2): linking and then vectorising.
+
+4. This arrow brings us from the model representation to the unlinked internal representation.
+ This only requires a single step, vectorisation.
+
+Each of these steps can be accomplished using the following functions.
+
+| | To get the function | To get the inverse function |
+| --- | ------------------- | --------------------------- |
+| (1) | `link_transform(dist)` | `invlink_transform(dist)` |
+| (2) | `to_vec_transform(dist)` | `from_vec_transform(dist)` |
+| (3) | `to_linked_internal_transform(vi, vn[, dist])` | `from_linked_internal_transform(vi, vn[, dist])` |
+| (4) | `to_internal_transform(vi, vn[, dist])` | `from_internal_transform(vi, vn[, dist])` |
+
+Note that these functions do not perform the actual transformation; rather, they return the transformation function itself.
+For example, let's take a look at the `VarInfo` from the previous section, which contains a single variable `x ~ LogNormal()`.
+
+```{julia}
+model_repn = vi[vn_x]
+```
+
+```{julia}
+# (1) Get the link function
+f_link = DynamicPPL.link_transform(LogNormal())
+# (2) Get the vectorise function
+f_vec = DynamicPPL.to_vec_transform(LogNormal())
+
+# Apply it to the model representation
+linked_internal_repn = f_vec(f_link(model_repn))
+```
+
+Equivalently, we could have done:
+
+```{julia}
+# (3) Get the linked internal transform function
+f_linked_internal = DynamicPPL.to_linked_internal_transform(vi, vn_x, LogNormal())
+
+# Apply it to the model representation
+linked_internal_repn = f_linked_internal(model_repn)
+```
+
+And let's confirm that this is the same as the linked internal representation, using the `VarInfo` that we linked earlier:
+
+```{julia}
+DynamicPPL.getindex_internal(vi_linked, vn_x)
+```
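+
+Conversely, the inverse of step (3) brings us from the linked internal representation back to the model representation (a quick sketch; this should recover `model_repn`, up to floating-point roundoff):
+
+```{julia}
+f_from_linked = DynamicPPL.from_linked_internal_transform(vi_linked, vn_x, LogNormal())
+f_from_linked(DynamicPPL.getindex_internal(vi_linked, vn_x))
+```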
+
+The purpose of all this machinery is to allow other parts of DynamicPPL, such as the tilde pipeline, to handle transformed variables correctly.
+The following diagram shows how `assume` first checks whether the variable is transformed (using `istrans`), and then applies the appropriate transformation function.
+
+
+```{mermaid}
+%%| echo: false
+
+%%{ init: { 'themeVariables': { 'lineColor': '#000000' } } }%%
+%%{ init: { 'flowchart': { 'curve': 'linear', 'wrappingWidth': -1 } } }%%
+graph TD
+ A["x ~ LogNormal()"]:::boxStyle
+ B["vn = @varname(x)
dist = LogNormal()
x, vi = ..."]:::boxStyle
+ C["assume(vn, dist, vi)"]:::boxStyle
+ D(["if istrans(vi, vn)"]):::boxStyle
+ E["f = from_internal_transform(vi, vn, dist)"]:::boxStyle
+ F["f = from_linked_internal_transform(vi, vn, dist)"]:::boxStyle
+ G["x, logjac = with_logabsdet_jacobian(f, getindex_internal(vi, vn, dist))"]:::boxStyle
+ H["return x, logpdf(dist, x) - logjac, vi"]:::boxStyle
+
+ A -.->|@model| B
+ B -.->|tilde-pipeline| C
+ C --> D
+ D -->|false| E
+ D -->|true| F
+ E --> G
+ F --> G
+ G --> H
+
+ classDef boxStyle fill:#ffffff,stroke:#000000,font-family:Courier,color:#000000
+ linkStyle default stroke:#000000,stroke-width:1px,color:#000000
+```
+
+Here, `with_logabsdet_jacobian` is defined [in the ChangesOfVariables.jl package](https://juliamath.github.io/ChangesOfVariables.jl/stable/api/#ChangesOfVariables.with_logabsdet_jacobian), and returns both the effect of the transformation `f` as well as the log Jacobian term.
+
+Because we chose `f` appropriately, we find here that `x` is always the model representation; furthermore, if the variable was _not_ linked (i.e. `istrans` was false), the log Jacobian term is zero.
+However, if it was linked, then the Jacobian term is appropriately included, making sure that sampling proceeds correctly.
+
+## Why do we need to do this at runtime?
+
+Given that we know whether a `VarInfo` is linked or not, one might wonder why we need both `from_internal_transform` and `from_linked_internal_transform` at the point where the model is evaluated.
+Could we not, for example, store the required transformation inside the `VarInfo` when we link it, and simply reuse that each time?
+
+That is, why can't we just do
+
+```{mermaid}
+%%| echo: false
+%%| fig-width: 5
+
+%%{ init: { 'flowchart': { 'curve': 'linear', 'wrappingWidth': -1 } } }%%
+%%{ init: { 'themeVariables': { 'lineColor': '#000000' } } }%%
+graph TD
+ A["assume(varinfo, @varname(x), Normal())"]:::boxStyle
+ B["f = from_internal_transform(varinfo, varname, dist)"]:::boxStyle
+ C["x, logjac = with_logabsdet_jacobian(f, getindex_internal(varinfo, varname))"]:::boxStyle
+ D["return x, logpdf(dist, x) - logjac, varinfo"]:::dashedBox
+
+ A --> B
+ B --> C
+ C --> D
+
+ classDef dashedBox fill:#ffffff,stroke:#000000,stroke-dasharray: 5 5,font-family:Courier,color:#000000
+ classDef boxStyle fill:#ffffff,stroke:#000000,font-family:Courier,color:#000000
+
+ linkStyle default stroke:#000000,stroke-width:1px,color:#000000
+```
+
+where `from_internal_transform` here only looks up a stored transformation function?
+
+Unfortunately, this is not possible in general, because the transformation function might not be a constant between different model evaluations.
+Consider, for example, the following model:
+
+```{julia}
+@model function demo_dynamic_constraint()
+ m ~ Normal()
+ x ~ truncated(Normal(); lower=m)
+ return (m=m, x=x)
+end
+```
+
+Here, `m` is distributed according to a plain `Normal()`, whereas the variable `x` is constrained to be in the domain `(m, Inf)`.
+Because of this, we expect that any time we sample from the model, we should have that `m < x` (in terms of their model representations):
+
+```{julia}
+model = demo_dynamic_constraint()
+vi = VarInfo(model)
+vn_m, vn_x = @varname(m), @varname(x)
+
+vi[vn_m], vi[vn_x]
+```
+
+(Note that `vi[vn]` is a shorthand for `getindex(vi, vn)`, so this retrieves the model representations of `m` and `x`.)
+So far, so good.
+Let's now link this `VarInfo` so that we end up working in an 'unconstrained' space, where both `m` and `x` can take on any values in `(-Inf, Inf)`.
+First, we should check that the model representations are unchanged when linking:
+
+```{julia}
+vi_linked = link(vi, model)
+vi_linked[vn_m], vi_linked[vn_x]
+```
+
+But if we change the value of `m` to, say, a value a bit larger than `x`:
+
+```{julia}
+# Update the model representation for `m` in `vi_linked`.
+vi_linked[vn_m] = vi_linked[vn_x] + 1
+vi_linked[vn_m], vi_linked[vn_x]
+```
+
+::: {.callout-warning}
+This is just for demonstration purposes!
+You shouldn't be directly setting variables in a linked `varinfo` like this unless you know for a fact that the value will be compatible with the constraints of the model.
+:::
+
+Now, we see that the constraint `m < x` is no longer satisfied.
+Hence, one might expect that if we try to evaluate the model using this `VarInfo`, we should obtain an error.
+Here, `evaluate!!` returns two things: the model's return value itself (which we defined above to be a `NamedTuple`), and the resulting `VarInfo` post-evaluation.
+
+```{julia}
+retval, ret_varinfo = DynamicPPL.evaluate!!(model, vi_linked, DefaultContext())
+getlogp(ret_varinfo)
+```
+
+But we don't get any errors!
+Indeed, we could even calculate the 'log probability density' for this evaluation.
+
+To understand this, we need to look at the actual value which was used during the model evaluation.
+We can glean this from the return value (or from the returned `VarInfo`, but the former is easier):
+
+```{julia}
+retval
+```
+
+We can see here that the model evaluation used the value of `m` that we provided, but the value of `x` was 'updated'.
+
+The reason for this is that internally in a model evaluation, we construct the transformation function from the internal to the model representation based on the *current* realizations in the model!
+That is, we take the `dist` in a `x ~ dist` expression _at model evaluation time_ and use that to construct the transformation, thus allowing it to change between model evaluations without invalidating the transformation.
+
+Knowing that the distribution of `x` depends on the value of `m`, we can now understand how the model representation of `x` got updated.
+The linked `VarInfo` does not store the model representation of `x`, but only its linked internal representation.
+So, what happened during the model evaluation above was that the linked internal representation of `x` – which was constructed using the _original_ value of `m` – was transformed back into a new model representation using a _different_ value of `m`.
+
+We can reproduce the 'new' value of `x` by performing these transformations manually:
+
+```{julia}
+# Generate a fresh linked VarInfo (without the new / 'corrupted' values)
+vi_linked = link(vi, model)
+# See the linked internal representations
+DynamicPPL.getindex_internal(vi_linked, vn_m), DynamicPPL.getindex_internal(vi_linked, vn_x)
+```
+
+Now we update the value of `m` like we did before:
+
+```{julia}
+vi_linked[vn_m] = vi_linked[vn_x] + 1
+vi_linked[vn_m]
+```
+
+When evaluating the model, the distribution of `x` is now changed, and so is the corresponding inverse bijector:
+
+```{julia}
+new_dist_x = truncated(Normal(); lower=vi_linked[vn_m])
+new_f_inv = DynamicPPL.invlink_transform(new_dist_x)
+```
+
+and if we apply this to the internal representation of `x`:
+
+```{julia}
+new_f_inv(DynamicPPL.getindex_internal(vi_linked, vn_x))
+```
+
+which is the same value as we got above in `retval`.
+
+## Conclusion
+
+In this chapter of the Turing docs, we've looked at:
+
+- why variables might need to be transformed;
+- how this is accounted for mathematically with the Jacobian term;
+- the basic API and functionality of Bijectors.jl; and
+- the higher-level usage of transforms in DynamicPPL and Turing.
+
+This will hopefully have equipped you with a better understanding of how constrained variables are handled in the Turing framework.
+With this knowledge, you should especially find it easier to navigate DynamicPPL's `VarInfo` type, which forms the backbone of model evaluation.
diff --git a/theming/styles.css b/theming/styles.css
index 2e55f2cdf..c670e42aa 100755
--- a/theming/styles.css
+++ b/theming/styles.css
@@ -1,19 +1,3 @@
-/* css styles */
-/* .cell-output {
- background-color: #f1f3f5;
-} */
-
-/* .cell-output img {
- max-width: 100%;
- height: auto;
-} */
-
-/* .cell-output-display pre {
- word-break: break-wor !important;
- white-space: pre-wrap !important;
-}
- */
-
.navbar a:hover {
text-decoration: none;
}
@@ -31,10 +15,15 @@
white-space: pre-wrap !important;
}
-
.cell-output-display svg {
height: fit-content;
width: fit-content;
+
+ &.mermaid-js {
+ /* fit-content for mermaid diagrams makes them really small, so we
+ * default to 100% */
+ width: 100%;
+ }
}
.cell-output-display img {
@@ -54,4 +43,4 @@
border-radius: 5px;
max-height: 250px;
overflow: scroll;
-}
\ No newline at end of file
+}