Setup docs building.
yebai committed Jan 6, 2023
1 parent 89257ba commit 0bb86e2
Showing 6 changed files with 306 additions and 0 deletions.
32 changes: 32 additions & 0 deletions .github/workflows/Docs.yml
@@ -0,0 +1,32 @@
name: Documentation

on:
  push:
    branches:
      # Build the master branch.
      - master
    tags: '*'
  pull_request:

concurrency:
  # Skip intermediate builds: always.
  # Cancel intermediate builds: only if it is a pull request build.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ startsWith(github.ref, 'refs/pull/') }}

jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: julia-actions/setup-julia@latest
        with:
          version: '1'
      - name: Install dependencies
        run: julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
      - name: Build and deploy
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # For authentication with GitHub Actions token
          DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} # For authentication with SSH deploy key
          JULIA_DEBUG: Documenter # Print `@debug` statements (https://github.com/JuliaDocs/Documenter.jl/issues/955)
        run: julia --project=docs/ docs/make.jl
11 changes: 11 additions & 0 deletions docs/Project.toml
@@ -0,0 +1,11 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

[compat]
Documenter = "0.27"
Functors = "0.3"
StableRNGs = "1"
Zygote = "0.6"
16 changes: 16 additions & 0 deletions docs/make.jl
@@ -0,0 +1,16 @@
using Documenter
using Bijectors

# Doctest setup
DocMeta.setdocmeta!(Bijectors, :DocTestSetup, :(using Bijectors); recursive=true)

makedocs(
    sitename = "Bijectors",
    format = Documenter.HTML(),
    modules = [Bijectors],
    pages = ["Home" => "index.md", "Distributions.jl integration" => "distributions.md", "Examples" => "examples.md"],
    strict=false,
    checkdocs=:exports,
)

deploydocs(repo = "github.com/TuringLang/Bijectors.jl.git", push_preview=true)
52 changes: 52 additions & 0 deletions docs/src/distributions.md
@@ -0,0 +1,52 @@
## Basic usage
Other than the `logpdf_with_trans` methods, the package also provides a more composable interface through the `Bijector` types. Consider for example the one from above with `Beta(2, 2)`.

```julia
julia> using Random; Random.seed!(42);

julia> using Bijectors; using Bijectors: Logit

julia> dist = Beta(2, 2)
Beta{Float64}(α=2.0, β=2.0)

julia> x = rand(dist)
0.36888689965963756

julia> b = bijector(dist) # bijection (0, 1) → ℝ
Logit{Float64}(0.0, 1.0)

julia> y = b(x)
-0.5369949942509267
```

In this case we see that `bijector(d::Distribution)` returns the corresponding constrained-to-unconstrained bijection for `Beta`, which indeed is a `Logit` with `a = 0.0` and `b = 1.0`. The resulting `Logit <: Bijector` has a method `(b::Logit)(x)` defined, allowing us to call it just like any other function. Comparing with the above example, `b(x) ≈ link(dist, x)`. Just to convince ourselves:

```julia
julia> b(x) ≈ link(dist, x)
true
```
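
For intuition, the map that `b` applies here can also be written by hand. The sketch below assumes the unit-interval case (`a = 0.0`, `b = 1.0`); `my_logit` and `my_logistic` are hypothetical names used only for illustration and are not part of Bijectors.jl.

```julia
# Hand-written stand-ins for the bijection (0, 1) → ℝ and its inverse.
my_logit(x) = log(x / (1 - x))        # (0, 1) → ℝ
my_logistic(y) = 1 / (1 + exp(-y))    # ℝ → (0, 1)

x = 0.36888689965963756               # the sample drawn above
y = my_logit(x)                       # should agree with b(x) up to rounding
my_logistic(y) ≈ x                    # the round trip recovers x
```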

## Transforming distributions

```@setup transformed-dist-simple
using Bijectors
```

We can create a _transformed_ `Distribution`, i.e. a `Distribution` defined by sampling from a given `Distribution` and then transforming using a given transformation:

```@repl transformed-dist-simple
dist = Beta(2, 2) # support on (0, 1)
tdist = transformed(dist) # support on ℝ
tdist isa UnivariateDistribution
```

We can then compute the `logpdf` for the resulting distribution:

```@repl transformed-dist-simple
# Some example values
x = rand(dist)
y = tdist.transform(x)
logpdf(tdist, y)
```
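
The value returned by `logpdf(tdist, y)` follows the change-of-variables formula `log q(y) = log p(x) + log|dx/dy|` with `x = b⁻¹(y)`. As a minimal sketch, this can be worked out by hand for `Beta(2, 2)`, whose density on `(0, 1)` is `6x(1 - x)`. The function names below are hypothetical and the code does not use Bijectors.jl.

```julia
# Log-density of Beta(2, 2) on (0, 1): p(x) = 6x(1 - x).
beta22_logpdf(x) = log(6 * x * (1 - x))

# Inverse of the logit map, ℝ → (0, 1). Its derivative is dx/dy = x(1 - x).
my_logistic(y) = 1 / (1 + exp(-y))

# log q(y) = log p(x) + log|dx/dy|, where x = my_logistic(y).
function transformed_logpdf(y)
    x = my_logistic(y)
    return beta22_logpdf(x) + log(x * (1 - x))
end

transformed_logpdf(0.0)   # x = 0.5, so this equals log(1.5) + log(0.25)
```

With Bijectors loaded, `logpdf(transformed(Beta(2, 2)), y)` should agree with `transformed_logpdf(y)` up to floating-point error.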
163 changes: 163 additions & 0 deletions docs/src/examples.md
@@ -0,0 +1,163 @@
```@setup advi
using Bijectors
```

## Univariate ADVI example
But the real utility of `TransformedDistribution` becomes more apparent when using `transformed(dist, b)` for any bijector `b`. To get the transformed distribution corresponding to the `Beta(2, 2)`, we called `transformed(dist)` before. This is simply an alias for `transformed(dist, bijector(dist))`. Remember that `bijector(dist)` returns the constrained-to-unconstrained bijector for that particular `Distribution`. But we can of course construct a `TransformedDistribution` using different bijectors with the same `dist`. This is particularly useful in _Automatic Differentiation Variational Inference (ADVI)_.[2] An important part of ADVI is to approximate a constrained distribution, e.g. `Beta`, as follows:
1. Sample `x` from a `Normal` with parameters `μ` and `σ`, i.e. `x ~ Normal(μ, σ)`.
2. Transform `x` to `y` s.t. `y ∈ support(Beta)`, with the transform being a differentiable bijection with a differentiable inverse (a "bijector").

This then defines a probability density with the same _support_ as `Beta`! Of course, it's unlikely that it will be the same density, but it's an _approximation_. Creating such a distribution becomes trivial with `Bijector` and `TransformedDistribution`:

```@repl advi
using StableRNGs: StableRNG
rng = StableRNG(42);
dist = Beta(2, 2)
b = bijector(dist) # (0, 1) → ℝ
b⁻¹ = inverse(b) # ℝ → (0, 1)
td = transformed(Normal(), b⁻¹) # x ∼ 𝓝(0, 1) then b⁻¹(x) ∈ (0, 1)
x = rand(rng, td) # ∈ (0, 1)
```
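
The density that `td` defines on `(0, 1)` can be written down by hand, which makes the "approximation" concrete. The sketch below assumes a standard normal base distribution and the unit-interval logit. All function names are hypothetical, and the code does not call Bijectors.jl.

```julia
# Log-density of the standard normal.
std_normal_logpdf(x) = -0.5 * x^2 - 0.5 * log(2π)

# The constrained-to-unconstrained map (0, 1) → ℝ.
my_logit(y) = log(y / (1 - y))

# Change of variables: log q(y) = log φ(logit(y)) - log(y(1 - y)) for y ∈ (0, 1).
advi_logpdf(y) = std_normal_logpdf(my_logit(y)) - log(y * (1 - y))

# Midpoint rule on (0, 1): a valid density should integrate to roughly 1.
n = 100_000
mass = sum(exp(advi_logpdf((i - 0.5) / n)) for i in 1:n) / n
```

Replacing `std_normal_logpdf` with the log-density of `Normal(μ, σ)` gives the family that ADVI optimises over.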

It's worth noting that `support(Beta)` is the _closed_ interval `[0, 1]`, while the constrained-to-unconstrained bijection, `Logit` in this case, is only well-defined as a map `(0, 1) → ℝ` on the _open_ interval `(0, 1)`. This is not an implementation detail: `ℝ` is itself open, so no continuous bijection exists from a _closed_ interval to `ℝ`. But since the boundary of a closed interval has measure zero, this doesn't affect the resulting density with support on the entire real line. In practice, this means that

```@repl advi
td = transformed(Beta())
inverse(td.transform)(rand(rng, td))
```

will never result in `0` or `1` though any sample arbitrarily close to either `0` or `1` is possible. _Disclaimer: numerical accuracy is limited, so you might still see `0` and `1` if you're lucky._
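
The disclaimer can be reproduced with a hand-written logistic function in `Float64`. This is a sketch with a hypothetical helper, `my_logistic`, which need not match the implementation Bijectors.jl uses internally.

```julia
# Hand-written inverse of the logit map, ℝ → (0, 1) in exact arithmetic.
my_logistic(y) = 1 / (1 + exp(-y))

0.0 < my_logistic(10.0) < 1.0   # true: comfortably inside the open interval
my_logistic(40.0) == 1.0        # true: exp(-40) is smaller than the Float64 spacing near 1
my_logistic(-800.0) == 0.0      # true: exp(800) overflows to Inf and 1 / Inf == 0
```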

## Multivariate ADVI example
We can also do _multivariate_ ADVI using the `Stacked` bijector. `Stacked` gives us a way to combine univariate and/or multivariate bijectors into a single multivariate bijector. Say you have a vector `x` of length 2 and you want to transform the first entry using `Exp` and the second entry using `Log`. `Stacked` gives you an easy and efficient way of representing such a bijector.

```@repl advi
using Bijectors: SimplexBijector
# Original distributions
dists = (
    Beta(),
    InverseGamma(),
    Dirichlet(2, 3)
);
# Construct the corresponding ranges
ranges = [];
idx = 1;
for i = 1:length(dists)
    d = dists[i]
    push!(ranges, idx:idx + length(d) - 1)
    global idx
    idx += length(d)
end;
ranges
# Base distribution; mean-field normal
num_params = ranges[end][end]
d = MvNormal(zeros(num_params), ones(num_params));
# Construct the transform
bs = bijector.(dists); # constrained-to-unconstrained bijectors for dists
ibs = inverse.(bs); # invert, so we get unconstrained-to-constrained
sb = Stacked(ibs, ranges) # => Stacked <: Bijector
# Mean-field normal with unconstrained-to-constrained stacked bijector
td = transformed(d, sb);
y = rand(td)
0.0 ≤ y[1] ≤ 1.0
0.0 < y[2]
sum(y[3:4]) ≈ 1.0
```
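
The range bookkeeping in the loop above can also be written without a mutable global counter. A minimal sketch, assuming the block lengths are `1`, `1` and `2`, which is what `length.(dists)` should give for `Beta()`, `InverseGamma()` and `Dirichlet(2, 3)`:

```julia
# Lengths of the three blocks, hard-coded here so the sketch runs without Bijectors.
lengths = [1, 1, 2]

stops = cumsum(lengths)            # last index of each block
starts = stops .- lengths .+ 1     # first index of each block
ranges = [a:b for (a, b) in zip(starts, stops)]

num_params = last(last(ranges))    # total number of parameters
```

This produces a concretely typed `Vector{UnitRange{Int}}` rather than the `Vector{Any}` created by `ranges = []`.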

## Normalizing flows
A very interesting application is that of _normalizing flows_.[1] Usually this is done by sampling from a multivariate normal distribution, and then transforming this to a target distribution using invertible neural networks. Currently there are two such transforms available in Bijectors.jl: `PlanarLayer` and `RadialLayer`. Let's create a flow with a single `PlanarLayer`:

```@setup normalizing-flows
using Bijectors
using StableRNGs: StableRNG
rng = StableRNG(42);
```

```@repl normalizing-flows
d = MvNormal(zeros(2), ones(2));
b = PlanarLayer(2)
flow = transformed(d, b)
flow isa MultivariateDistribution
```

That's it. Now we can sample from it using `rand` and compute the `logpdf`, like any other `Distribution`.

```@repl normalizing-flows
y = rand(rng, flow)
logpdf(flow, y) # uses inverse of `b`
```
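
To see why the log-determinant of such a layer is cheap to evaluate, here is a hand-written sketch assuming the standard planar form `f(z) = z + u * tanh(w'z + b)`. The names `planar`, `planar_logabsdetjac` and `planar_jacobian` are hypothetical. Bijectors.jl's `PlanarLayer` may additionally constrain `u` to guarantee invertibility, which this sketch does not do.

```julia
using LinearAlgebra

# Planar transform: f(z) = z + u * tanh(w'z + b).
planar(z, w, u, b) = z .+ u .* tanh(dot(w, z) + b)

# ψ(z) = (1 - tanh²(w'z + b)) * w, the gradient of tanh(w'z + b) with respect to z.
planar_psi(z, w, b) = (1 - tanh(dot(w, z) + b)^2) .* w

# Matrix determinant lemma: log|det J| = log|1 + u'ψ(z)|, an O(d) computation.
planar_logabsdetjac(z, w, u, b) = log(abs(1 + dot(u, planar_psi(z, w, b))))

# Full Jacobian J = I + u ψ(z)', used here only to check the shortcut.
planar_jacobian(z, w, u, b) = I + u * planar_psi(z, w, b)'

z = [0.3, -1.2]; w = [0.5, 0.1]; u = [0.2, -0.4]; b = 0.1
planar_logabsdetjac(z, w, u, b) ≈ logabsdet(planar_jacobian(z, w, u, b))[1]
```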

Similarly to the multivariate ADVI example, we could use `Stacked` to get a _bounded_ flow:

```@repl normalizing-flows
d = MvNormal(zeros(2), ones(2));
ibs = inverse.(bijector.((InverseGamma(2, 3), Beta())));
sb = stack(ibs...) # == Stacked(ibs) == Stacked(ibs, [i:i for i = 1:length(ibs)])
b = sb ∘ PlanarLayer(2)
td = transformed(d, b);
y = rand(rng, td)
0 < y[1]
0 ≤ y[2] ≤ 1
```

Want to fit the flow?

```@repl normalizing-flows
using Zygote
# Construct the flow.
b = PlanarLayer(2)
# Convenient for extracting parameters and reconstructing the flow.
using Functors
θs, reconstruct = Functors.functor(b);
# Make the objective a `struct` to avoid capturing global variables.
struct NLLObjective{R,D,T}
    reconstruct::R
    basedist::D
    data::T
end
function (obj::NLLObjective)(θs...)
    transformed_dist = transformed(obj.basedist, obj.reconstruct(θs))
    return -sum(Base.Fix1(logpdf, transformed_dist), eachcol(obj.data))
end
# Some random data to estimate the density of.
xs = randn(2, 1000);
# Construct the objective.
f = NLLObjective(reconstruct, MvNormal(2, 1), xs);
# Initial loss.
@info "Initial loss: $(f(θs...))"
# Train using gradient descent.
ε = 1e-3;
for i = 1:100
    ∇s = Zygote.gradient(f, θs...)
    θs = map(θs, ∇s) do θ, ∇
        θ - ε .* ∇
    end
end
# Final loss
@info "Final loss: $(f(θs...))"
# Very simple check to see if we learned something useful.
samples = rand(transformed(f.basedist, f.reconstruct(θs)), 1000);
mean(eachcol(samples)) # ≈ [0, 0]
cov(samples; dims=2) # ≈ I
```

We can easily create more complex flows by simply doing `PlanarLayer(10) ∘ PlanarLayer(10) ∘ RadialLayer(10)` and so on.
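
Composed flows stay tractable because the log-determinant of a composition is a sum over its layers. By the chain rule, for two bijectors `b₁` and `b₂`, in the notation of the package's operations table:

```math
\log \lvert \det J(b_1 \circ b_2, x) \rvert
  = \log \lvert \det J(b_2, x) \rvert
  + \log \lvert \det J(b_1, b_2(x)) \rvert
```

Each additional layer therefore contributes one extra term, evaluated at the output of the layers applied before it.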
32 changes: 32 additions & 0 deletions docs/src/index.md
@@ -0,0 +1,32 @@
# Bijectors.jl

This package implements a set of functions for transforming constrained random variables (e.g. simplexes, intervals) to Euclidean space. The three main functions are `link`, `invlink` and `logpdf_with_trans`, which are implemented for a number of distributions. The distributions supported are:
1. `RealDistribution`: `Union{Cauchy, Gumbel, Laplace, Logistic, NoncentralT, Normal, NormalCanon, TDist}`,
2. `PositiveDistribution`: `Union{BetaPrime, Chi, Chisq, Erlang, Exponential, FDist, Frechet, Gamma, InverseGamma, InverseGaussian, Kolmogorov, LogNormal, NoncentralChisq, NoncentralF, Rayleigh, Weibull}`,
3. `UnitDistribution`: `Union{Beta, KSOneSided, NoncentralBeta}`,
4. `SimplexDistribution`: `Union{Dirichlet}`,
5. `PDMatDistribution`: `Union{InverseWishart, Wishart}`, and
6. `TransformDistribution`: `Union{T, Truncated{T}} where T<:ContinuousUnivariateDistribution`.
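
As a brief usage sketch of the three functions, assuming Bijectors.jl is installed and using `Beta(2, 2)` as an arbitrary example of a `UnitDistribution`:

```julia
using Bijectors

dist = Beta(2, 2)
x = rand(dist)                       # constrained sample, x ∈ (0, 1)
y = link(dist, x)                    # unconstrained value, y ∈ ℝ
invlink(dist, y) ≈ x                 # the round trip recovers x
logpdf_with_trans(dist, x, true)     # log-density that accounts for the transformation
logpdf_with_trans(dist, x, false)    # should match logpdf(dist, x)
```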

All exported names from the [Distributions.jl](https://github.com/JuliaStats/Distributions.jl) package are reexported from `Bijectors`.

Bijectors.jl also provides a nice interface for working with these maps: composition, inversion, etc.
The following table lists mathematical operations for a bijector and the corresponding code in Bijectors.jl.

| Operation | Method | Automatic |
|:------------------------------------:|:-----------------:|:-----------:|
| `b ↦ b⁻¹` | `inverse(b)` | ✓ |
| `(b₁, b₂) ↦ (b₁ ∘ b₂)` | `b₁ ∘ b₂` | ✓ |
| `(b₁, b₂) ↦ [b₁, b₂]` | `stack(b₁, b₂)` | ✓ |
| `x ↦ b(x)` | `b(x)` | × |
| `y ↦ b⁻¹(y)` | `inverse(b)(y)` | × |
| `x ↦ log|det J(b, x)|` | `logabsdetjac(b, x)` | AD |
| `x ↦ b(x), log|det J(b, x)|` | `with_logabsdet_jacobian(b, x)` | ✓ |
| `p ↦ q := b_* p` | `q = transformed(p, b)` | ✓ |
| `y ∼ q` | `y = rand(q)` | ✓ |
| `p ↦ b` such that `support(b_* p) = ℝᵈ` | `bijector(p)` | ✓ |
| `(x ∼ p, b(x), log|det J(b, x)|, log q(y))` | `forward(q)` | ✓ |

In this table, `b` denotes a `Bijector`, `J(b, x)` denotes the Jacobian of `b` evaluated at `x`, `b_*` denotes the [push-forward](https://www.wikiwand.com/en/Pushforward_measure) of `p` by `b`, and `x ∼ p` denotes `x` sampled from the distribution with density `p`.

The "Automatic" column in the table refers to whether or not you are required to implement the feature for a custom `Bijector`. "AD" refers to the fact that it can be implemented "automatically" using automatic differentiation.
