
Add more extensive usage docs #63

Merged · 14 commits · May 4, 2022
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "Pathfinder"
uuid = "b1d3bc72-d0e7-4279-b92f-7fa5d6d2d454"
authors = ["Seth Axen <[email protected]> and contributors"]
version = "0.4.1"
version = "0.4.2"

[deps]
AbstractDifferentiation = "c29ec348-61ec-40c8-8164-b8c60e9d9f3d"
2 changes: 1 addition & 1 deletion README.md
@@ -1,4 +1,4 @@
# Pathfinder
# Pathfinder.jl: Parallel quasi-Newton variational inference

[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://sethaxen.github.io/Pathfinder.jl/stable)
[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://sethaxen.github.io/Pathfinder.jl/dev)
18 changes: 16 additions & 2 deletions docs/Project.toml
@@ -1,11 +1,25 @@
[deps]
AdvancedHMC = "0bf59076-c3b1-5ca4-86bd-e02cd72cde3d"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DynamicHMC = "bbc10e6e-7c05-544b-b16e-64fede858acb"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
LogDensityProblems = "6fdf6af0-433a-55f7-b3ed-c6c6e0b8df7c"
Pathfinder = "b1d3bc72-d0e7-4279-b92f-7fa5d6d2d454"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
TransformVariables = "84d833dd-6860-57f9-a1a7-6da5db126cff"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"

[compat]
AdvancedHMC = "0.3"
Documenter = "0.27"
DynamicHMC = "3"
ForwardDiff = "0.10"
LogDensityProblems = "0.11"
Pathfinder = "0.4"
Plots = "1"
StatsFuns = "0.9"
StatsPlots = "0.14"
TransformVariables = "0.6"
Turing = "0.21"
11 changes: 9 additions & 2 deletions docs/make.jl
@@ -15,8 +15,15 @@ makedocs(;
),
pages=[
"Home" => "index.md",
"Single-path Pathfinder" => "pathfinder.md",
"Multi-path Pathfinder" => "multipathfinder.md",
"Library" => [
"Public" => "lib/public.md",
"Internals" => "lib/internals.md",
],
"Examples" => [
"Quickstart" => "examples/quickstart.md",
"Initializing HMC" => "examples/initializing-hmc.md",
"Turing usage" => "examples/turing.md",
]
],
)

239 changes: 239 additions & 0 deletions docs/src/examples/initializing-hmc.md
@@ -0,0 +1,239 @@
# Initializing HMC with Pathfinder

## The MCMC warm-up phase

When using MCMC to draw samples from some target distribution, there is often a lengthy warm-up consisting of two phases:
1. converge to the _typical set_ (the region of the target distribution where the bulk of the probability mass is located)
2. adapt any tunable parameters of the MCMC sampler (optional)

While (1) often happens fairly quickly, (2) usually requires a lengthy exploration of the typical set to iteratively adapt parameters suitable for further exploration.
An example is the widely used windowed adaptation scheme of Hamiltonian Monte Carlo (HMC) in Stan, where a step size and positive definite metric (aka mass matrix) are adapted.[^1]
For posteriors with complex geometry, the adaptation phase can require many evaluations of the gradient of the log density function of the target distribution.

Pathfinder can be used to initialize MCMC, and in particular HMC, in 3 ways:
1. Initialize MCMC from one of Pathfinder's draws (replace phase 1 of the warm-up).
2. Initialize an HMC metric adaptation from the inverse of the covariance of the multivariate normal approximation (replace the first window of phase 2 of the warm-up).
3. Use the inverse of the covariance as the metric without further adaptation (replace phase 2 of the warm-up).

This tutorial demonstrates all three approaches with [DynamicHMC.jl](https://tamaspapp.eu/DynamicHMC.jl/stable/) and [AdvancedHMC.jl](https://github.com/TuringLang/AdvancedHMC.jl).
Both of these packages have standalone implementations of adaptive HMC (aka NUTS) and can be used independently of any probabilistic programming language (PPL).
Both the [Turing](https://turing.ml/stable/) and [Soss](https://github.com/cscherrer/Soss.jl) PPLs have some DynamicHMC integration, while Turing also integrates with AdvancedHMC.

For demonstration purposes, we'll use the following dummy data:

```@example 1
using LinearAlgebra, Pathfinder, Random, StatsFuns, StatsPlots

Random.seed!(91)

x = 0:0.01:1
y = @. sin(10x) + randn() * 0.2 + x

scatter(x, y; xlabel="x", ylabel="y", legend=false, msw=0, ms=2)
```

We'll fit this using a simple polynomial regression model:

```math
\begin{aligned}
\sigma &\sim \text{Half-Normal}(\mu=0, \sigma=1)\\
\alpha, \beta_j &\sim \mathrm{Normal}(\mu=0, \sigma=1)\\
\hat{y}_i &= \alpha + \sum_{j=1}^J x_i^j \beta_j\\
y_i &\sim \mathrm{Normal}(\mu=\hat{y}_i, \sigma=\sigma)
\end{aligned}
```

We just need to implement the log-density function of the resulting posterior.

```@example 1
struct RegressionProblem{X,Z,Y}
x::X
J::Int
z::Z
y::Y
end
function RegressionProblem(x, J, y)
z = x .* (1:J)'
return RegressionProblem(x, J, z, y)
end

function (prob::RegressionProblem)(θ)
σ = θ.σ
α = θ.α
β = θ.β
z = prob.z
y = prob.y
lp = normlogpdf(σ) + logtwo
lp += normlogpdf(α)
lp += sum(normlogpdf, β)
y_hat = muladd(z, β, α)
lp += sum(eachindex(y_hat, y)) do i
return normlogpdf(y_hat[i], σ, y[i])
end
return lp
end

J = 3
dim = J + 2
model = RegressionProblem(x, J, y)
ndraws = 1_000;
nothing # hide
```

## DynamicHMC.jl

To use DynamicHMC, we first need to transform our model to an unconstrained space using [TransformVariables.jl](https://tamaspapp.eu/TransformVariables.jl/stable/) and wrap it in a type that implements the [LogDensityProblems.jl](https://github.com/tpapp/LogDensityProblems.jl) interface:

```@example 1
using DynamicHMC, LogDensityProblems, TransformVariables

transform = as((σ=asℝ₊, α=asℝ, β=as(Array, J)))
P = TransformedLogDensity(transform, model)
∇P = ADgradient(:ForwardDiff, P)
```

Pathfinder, on the other hand, expects the log-density function and its gradient as plain functions:

```@example 1
logp(x) = LogDensityProblems.logdensity(P, x)
∇logp(x) = LogDensityProblems.logdensity_and_gradient(∇P, x)[2]
result_pf = pathfinder(logp, ∇logp; dim)
```

```@example 1
init_params = result_pf.draws[:, 1]
```

```@example 1
inv_metric = result_pf.fit_distribution.Σ
```

### Initializing from Pathfinder's draws

Here we just need to pass one of the draws as the initial point `q`:

```@example 1
result_dhmc1 = mcmc_with_warmup(
Random.GLOBAL_RNG,
∇P,
ndraws;
initialization=(; q=init_params),
reporter=NoProgressReport(),
)
```

### Initializing metric adaptation from Pathfinder's estimate

To start with Pathfinder's inverse metric estimate, we just need to initialize a `GaussianKineticEnergy` object with it as input:

```@example 1
result_dhmc2 = mcmc_with_warmup(
Random.GLOBAL_RNG,
∇P,
ndraws;
initialization=(; q=init_params, κ=GaussianKineticEnergy(inv_metric)),
warmup_stages=default_warmup_stages(; M=Symmetric),
reporter=NoProgressReport(),
)
```

We also specified that DynamicHMC should tune a dense `Symmetric` matrix.
However, we could also have requested a `Diagonal` metric.
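As a sketch of that alternative (illustrative only; the variable name `result_dhmc2_diag` is hypothetical, and `Diagonal` comes from LinearAlgebra, already loaded above), we would seed the kinetic energy with just the diagonal of Pathfinder's inverse metric estimate and request that DynamicHMC adapt a `Diagonal` mass matrix:

```julia
# Sketch of the Diagonal alternative: Diagonal(inv_metric) keeps only the
# diagonal of Pathfinder's inverse metric estimate, and M=Diagonal tells
# DynamicHMC to adapt a diagonal matrix from that starting point.
result_dhmc2_diag = mcmc_with_warmup(
    Random.GLOBAL_RNG,
    ∇P,
    ndraws;
    initialization=(; q=init_params, κ=GaussianKineticEnergy(Diagonal(inv_metric))),
    warmup_stages=default_warmup_stages(; M=Diagonal),
    reporter=NoProgressReport(),
)
```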

### Use Pathfinder's metric estimate for sampling

To turn off metric adaptation entirely and use Pathfinder's estimate, we just set the number and size of the metric adaptation windows to 0.

```@example 1
result_dhmc3 = mcmc_with_warmup(
Random.GLOBAL_RNG,
∇P,
ndraws;
initialization=(; q=init_params, κ=GaussianKineticEnergy(inv_metric)),
warmup_stages=default_warmup_stages(; middle_steps=0, doubling_stages=0),
reporter=NoProgressReport(),
)
```

## AdvancedHMC.jl

Similar to Pathfinder, AdvancedHMC works with an unconstrained log density function and its gradient.
We'll just use the `logp` we already created above.

```@example 1
using AdvancedHMC, ForwardDiff

nadapts = 500;
nothing # hide
```

### Initializing from Pathfinder's draws

```@example 1
metric = DiagEuclideanMetric(dim)
hamiltonian = Hamiltonian(metric, logp, ForwardDiff)
ϵ = find_good_stepsize(hamiltonian, init_params)
integrator = Leapfrog(ϵ)
proposal = NUTS{MultinomialTS,GeneralisedNoUTurn}(integrator)
adaptor = StepSizeAdaptor(0.8, integrator)
samples_ahmc1, stats_ahmc1 = sample(
hamiltonian,
proposal,
init_params,
ndraws + nadapts,
adaptor,
nadapts;
drop_warmup=true,
progress=false,
)
```

### Initializing metric adaptation from Pathfinder's estimate

We can't pass Pathfinder's inverse metric estimate to AdvancedHMC directly.
Instead, we need to first extract its diagonal for a `DiagEuclideanMetric` or convert it to a dense matrix for a `DenseEuclideanMetric`:

```@example 1
metric = DenseEuclideanMetric(Matrix(inv_metric))
hamiltonian = Hamiltonian(metric, logp, ForwardDiff)
ϵ = find_good_stepsize(hamiltonian, init_params)
integrator = Leapfrog(ϵ)
proposal = NUTS{MultinomialTS,GeneralisedNoUTurn}(integrator)
adaptor = StepSizeAdaptor(0.8, integrator)
samples_ahmc2, stats_ahmc2 = sample(
hamiltonian,
proposal,
init_params,
ndraws + nadapts,
adaptor,
nadapts;
drop_warmup=true,
progress=false,
)
```
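For comparison, the diagonal route could look like the following sketch (not run here; it mirrors the dense example above, with `diag` from LinearAlgebra and illustrative result names):

```julia
# Sketch of the diagonal alternative: keep only the diagonal of Pathfinder's
# inverse metric estimate and adapt from there with a DiagEuclideanMetric.
metric = DiagEuclideanMetric(diag(Matrix(inv_metric)))
hamiltonian = Hamiltonian(metric, logp, ForwardDiff)
ϵ = find_good_stepsize(hamiltonian, init_params)
integrator = Leapfrog(ϵ)
proposal = NUTS{MultinomialTS,GeneralisedNoUTurn}(integrator)
adaptor = StepSizeAdaptor(0.8, integrator)
samples_diag, stats_diag = sample(
    hamiltonian, proposal, init_params, ndraws + nadapts, adaptor, nadapts;
    drop_warmup=true, progress=false,
)
```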

### Use Pathfinder's metric estimate for sampling

To use Pathfinder's metric with no metric adaptation, we need to use Pathfinder's own `RankUpdateEuclideanMetric` type, which just wraps our inverse metric estimate for use with AdvancedHMC:

```@example 1
nadapts = 75
metric = Pathfinder.RankUpdateEuclideanMetric(inv_metric)
hamiltonian = Hamiltonian(metric, logp, ForwardDiff)
ϵ = find_good_stepsize(hamiltonian, init_params)
integrator = Leapfrog(ϵ)
proposal = NUTS{MultinomialTS,GeneralisedNoUTurn}(integrator)
adaptor = StepSizeAdaptor(0.8, integrator)
samples_ahmc3, stats_ahmc3 = sample(
hamiltonian,
proposal,
init_params,
ndraws + nadapts,
adaptor,
nadapts;
drop_warmup=true,
progress=false,
)
```

[^1]: https://mc-stan.org/docs/reference-manual/hmc-algorithm-parameters.html