Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 6 changed files with 306 additions and 0 deletions.
@@ -0,0 +1,32 @@
name: Documentation

on:
  push:
    branches:
      # Build the master branch.
      - master
    tags: '*'
  pull_request:

concurrency:
  # Skip intermediate builds: always.
  # Cancel intermediate builds: only if it is a pull request build.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ startsWith(github.ref, 'refs/pull/') }}

jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: julia-actions/setup-julia@latest
        with:
          version: '1'
      - name: Install dependencies
        run: julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
      - name: Build and deploy
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # For authentication with GitHub Actions token
          DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} # For authentication with SSH deploy key
          JULIA_DEBUG: Documenter # Print `@debug` statements (https://github.com/JuliaDocs/Documenter.jl/issues/955)
        run: julia --project=docs/ docs/make.jl
@@ -0,0 +1,11 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

[compat]
Documenter = "0.27"
Functors = "0.3"
StableRNGs = "1"
Zygote = "0.6"
@@ -0,0 +1,16 @@
using Documenter
using Bijectors

# Doctest setup
DocMeta.setdocmeta!(Bijectors, :DocTestSetup, :(using Bijectors); recursive=true)

makedocs(
    sitename = "Bijectors",
    format = Documenter.HTML(),
    modules = [Bijectors],
    pages = ["Home" => "index.md", "Distributions.jl integration" => "distributions.md", "Examples" => "examples.md"],
    strict=false,
    checkdocs=:exports,
)

deploydocs(repo = "github.com/TuringLang/Bijectors.jl.git", push_preview=true)
@@ -0,0 +1,52 @@
## Basic usage
Other than the `logpdf_with_trans` methods, the package also provides a more composable interface through the `Bijector` types. Consider, for example, `Beta(2, 2)`:

```julia
julia> using Random; Random.seed!(42);

julia> using Bijectors; using Bijectors: Logit

julia> dist = Beta(2, 2)
Beta{Float64}(α=2.0, β=2.0)

julia> x = rand(dist)
0.36888689965963756

julia> b = bijector(dist) # bijection (0, 1) → ℝ
Logit{Float64}(0.0, 1.0)

julia> y = b(x)
-0.5369949942509267
```

In this case we see that `bijector(d::Distribution)` returns the corresponding constrained-to-unconstrained bijection for `Beta`, which indeed is a `Logit` with `a = 0.0` and `b = 1.0`. The resulting `Logit <: Bijector` has a method `(b::Logit)(x)` defined, allowing us to call it just like any other function. It computes the same map as the `link` function, i.e. `b(x) ≈ link(dist, x)`. Just to convince ourselves:

```julia
julia> b(x) ≈ link(dist, x)
true
```
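
To see what this bijector computes, here is a minimal, dependency-free sketch of a scaled logit `(a, b) → ℝ` and its inverse. The helper names `scaled_logit` and `scaled_logistic` are hypothetical (not part of Bijectors.jl), and we assume `Logit(a, b)` maps `x ↦ log((x - a) / (b - x))`, which for `a = 0`, `b = 1` is the ordinary logit.

```julia
# Hypothetical helpers (not Bijectors.jl API): a scaled logit (a, b) → ℝ and its
# inverse ℝ → (a, b), assuming Logit(a, b) maps x ↦ log((x - a) / (b - x)).
scaled_logit(x, a, b) = log((x - a) / (b - x))
scaled_logistic(y, a, b) = a + (b - a) / (1 + exp(-y))

x = 0.36888689965963756               # the Beta(2, 2) sample from the session above
y = scaled_logit(x, 0.0, 1.0)         # should match `b(x)` from the session above
x_back = scaled_logistic(y, 0.0, 1.0) # round trip back into (0, 1)
println((y, x_back))
```

Solving `y = log((x - a) / (b - x))` for `x` gives the inverse used above, so the round trip recovers `x` up to floating-point rounding.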

## Transforming distributions

```@setup transformed-dist-simple
using Bijectors
```

We can create a _transformed_ `Distribution`, i.e. a `Distribution` defined by sampling from a given `Distribution` and then transforming using a given transformation:

```@repl transformed-dist-simple
dist = Beta(2, 2) # support on (0, 1)
tdist = transformed(dist) # support on ℝ
tdist isa UnivariateDistribution
```

We can then compute the `logpdf` for the resulting distribution:

```@repl transformed-dist-simple
# Some example values
x = rand(dist)
y = tdist.transform(x)
logpdf(tdist, y)
```
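
The `logpdf` of a transformed distribution follows from the change-of-variables formula. As a hedged, stdlib-only sketch (hand-written helpers, not Bijectors.jl internals), assuming the transform is the logit, the density of `y = logit(x)` for `x ∼ Beta(2, 2)` can be written out and checked to integrate to one over ℝ:

```julia
# Log-density of Beta(2, 2) on (0, 1); its normalizing constant is 1 / B(2, 2) = 6.
beta22_logpdf(x) = log(6) + log(x) + log1p(-x)

# Hand-written inverse of the logit, ℝ → (0, 1).
logistic(y) = 1 / (1 + exp(-y))

# Change of variables: log q(y) = log p(logistic(y)) + log|d logistic(y)/dy|,
# where d logistic(y)/dy = logistic(y) * (1 - logistic(y)).
function transformed_logpdf(y)
    x = logistic(y)
    return beta22_logpdf(x) + log(x) + log1p(-x)
end

# Sanity check with a simple Riemann sum: the transformed density should integrate to ≈ 1.
h = 1e-3
total = sum(exp(transformed_logpdf(y)) for y in -30:h:30) * h
println(total)
```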
@@ -0,0 +1,163 @@
```@setup advi
using Bijectors
```

## Univariate ADVI example
But the real utility of `TransformedDistribution` becomes more apparent when using `transformed(dist, b)` for any bijector `b`. Earlier, we obtained the transformed distribution corresponding to `Beta(2, 2)` by calling `transformed(dist)`, which is simply an alias for `transformed(dist, bijector(dist))`. Remember that `bijector(dist)` returns the constrained-to-unconstrained bijector for that particular `Distribution`. But we can of course construct a `TransformedDistribution` using different bijectors with the same `dist`. This is particularly useful in _Automatic Differentiation Variational Inference (ADVI)_.[2] An important part of ADVI is to approximate a constrained distribution, e.g. `Beta`, as follows:

1. Sample `x` from a `Normal` with parameters `μ` and `σ`, i.e. `x ~ Normal(μ, σ)`.
2. Transform `x` to `y` such that `y ∈ support(Beta)`, with the transform being a differentiable bijection with a differentiable inverse (a "bijector").

This then defines a probability density with the same _support_ as `Beta`! Of course, it is unlikely to be the same density, but it is an _approximation_. Creating such a distribution becomes trivial with `Bijector` and `TransformedDistribution`:

```@repl advi
using StableRNGs: StableRNG
rng = StableRNG(42);
dist = Beta(2, 2)
b = bijector(dist) # (0, 1) → ℝ
b⁻¹ = inverse(b) # ℝ → (0, 1)
td = transformed(Normal(), b⁻¹) # x ∼ 𝓝(0, 1) then b⁻¹(x) ∈ (0, 1)
x = rand(rng, td) # ∈ (0, 1)
```
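
The two-step recipe above (often called a logit-normal construction when the target support is `(0, 1)`) can be sketched without Bijectors.jl. The helper names and the values of `μ` and `σ` below are illustrative assumptions; the sketch samples from the approximation and evaluates its density via the change-of-variables formula.

```julia
using Random

# Hypothetical hand-written helpers, not Bijectors.jl API.
logistic(x) = 1 / (1 + exp(-x))
logit(y) = log(y / (1 - y))
normal_logpdf(x, μ, σ) = -0.5 * ((x - μ) / σ)^2 - log(σ) - 0.5 * log(2π)

# Steps 1 and 2: draw x ∼ Normal(μ, σ), then map each draw into (0, 1).
sample_q(rng, μ, σ, n) = logistic.(μ .+ σ .* randn(rng, n))

# Density of the approximation on (0, 1):
# log q(y) = log N(logit(y); μ, σ) + log|d logit(y)/dy|, with d logit(y)/dy = 1 / (y (1 - y)).
logq(y, μ, σ) = normal_logpdf(logit(y), μ, σ) - log(y) - log1p(-y)

rng = MersenneTwister(42)
μ, σ = 0.3, 0.8                      # illustrative variational parameters
ys = sample_q(rng, μ, σ, 10_000)

# Midpoint-rule check that q integrates to ≈ 1 over (0, 1).
n = 100_000
mass = sum(exp(logq((i - 0.5) / n, μ, σ)) for i in 1:n) / n
println((extrema(ys), mass))
```

In ADVI, `μ` and `σ` would be the variational parameters optimized by gradient-based methods.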

It's worth noting that `support(Beta)` is the _closed_ interval `[0, 1]`, while the constrained-to-unconstrained bijection, `Logit` in this case, is only well-defined as a map `(0, 1) → ℝ` on the _open_ interval `(0, 1)`. This is not an implementation detail: `ℝ` is itself open, so no continuous bijection exists from a _closed_ interval to `ℝ`. But since the boundary points of a closed interval have measure zero, this does not affect the resulting density with support on the entire real line. In practice, this means that

```@repl advi
td = transformed(Beta())
inverse(td.transform)(rand(rng, td))
```

will never result in `0` or `1`, though any sample arbitrarily close to either `0` or `1` is possible. _Disclaimer: numerical accuracy is limited, so you might still see `0` and `1` if you're lucky._
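
The disclaimer is easy to reproduce with a plain `Float64` logistic function, used here as a hand-written stand-in for the inverse of `Logit` (an assumption for illustration, not the package's implementation):

```julia
logistic(x) = 1 / (1 + exp(-x))

# Mathematically logistic(x) ∈ (0, 1) for every finite x, but Float64 rounds:
upper = logistic(40.0)    # exp(-40) ≈ 4e-18 is far below eps(1.0), so 1 + exp(-40) == 1.0
lower = logistic(-800.0)  # exp(800) overflows to Inf, and 1 / Inf == 0.0
tiny = logistic(-40.0)    # still strictly positive, roughly 4.2e-18
println((upper, lower, tiny))
```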

## Multivariate ADVI example
We can also do _multivariate_ ADVI using the `Stacked` bijector. `Stacked` gives us a way to combine univariate and/or multivariate bijectors into a single multivariate bijector. Say you have a vector `x` of length 2 and you want to transform the first entry using `Exp` and the second entry using `Log`. `Stacked` gives you an easy and efficient way of representing such a bijector.

```@repl advi
using Bijectors: SimplexBijector
# Original distributions
dists = (
    Beta(),
    InverseGamma(),
    Dirichlet(2, 3)
);
# Construct the corresponding ranges
ranges = [];
idx = 1;
for i = 1:length(dists)
    d = dists[i]
    push!(ranges, idx:idx + length(d) - 1)
    global idx
    idx += length(d)
end;
ranges
# Base distribution; mean-field normal
num_params = ranges[end][end]
d = MvNormal(zeros(num_params), ones(num_params));
# Construct the transform
bs = bijector.(dists); # constrained-to-unconstrained bijectors for dists
ibs = inverse.(bs); # invert, so we get unconstrained-to-constrained
sb = Stacked(ibs, ranges) # => Stacked <: Bijector
# Mean-field normal with unconstrained-to-constrained stacked bijector
td = transformed(d, sb);
y = rand(td)
0.0 ≤ y[1] ≤ 1.0
0.0 < y[2]
sum(y[3:4]) ≈ 1.0
```
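
The `ranges` bookkeeping, and the way a stacked transform combines per-block log-Jacobians, can be sketched in plain Julia. This is a hand-rolled illustration, not Bijectors.jl's implementation of `Stacked`; `stacked_with_logjac`, `exp_block` and `log_block` are hypothetical names. Because each block acts on its own entries, the Jacobian is block-diagonal and the log-determinants simply add.

```julia
# Build contiguous index ranges from per-block lengths without a global counter.
# The lengths match Beta (1), InverseGamma (1) and a 2-dimensional Dirichlet (2).
lengths = [1, 1, 2]
stops = cumsum(lengths)
ranges = [(s - l + 1):s for (s, l) in zip(stops, lengths)]

# Hypothetical stacked map: apply transforms[k] to x[ranges[k]] and sum the
# per-block log|det J| terms. Each transform returns (value, logabsdetjac).
function stacked_with_logjac(transforms, ranges, x)
    y = similar(x, float(eltype(x)))
    total = zero(float(eltype(x)))
    for (f, r) in zip(transforms, ranges)
        yr, logjac = f(x[r])
        y[r] = yr
        total += logjac
    end
    return y, total
end

# The example from the text: exp on the first entry, log on the second.
exp_block(v) = (exp.(v), sum(v))           # d exp(v)/dv = exp(v), so log|J| = v
log_block(v) = (log.(v), -sum(log.(v)))    # d log(v)/dv = 1/v, so log|J| = -log(v)

x = [0.5, 2.0]
y, logjac = stacked_with_logjac((exp_block, log_block), [1:1, 2:2], x)
println((ranges, y, logjac))
```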

## Normalizing flows
A very interesting application is that of _normalizing flows_.[1] Usually this is done by sampling from a multivariate normal distribution, and then transforming this to a target distribution using invertible neural networks. Currently there are two such transforms available in Bijectors.jl: `PlanarLayer` and `RadialLayer`. Let's create a flow with a single `PlanarLayer`:

```@setup normalizing-flows
using Bijectors
using StableRNGs: StableRNG
rng = StableRNG(42);
```

```@repl normalizing-flows
d = MvNormal(zeros(2), ones(2));
b = PlanarLayer(2)
flow = transformed(d, b)
flow isa MultivariateDistribution
```

That's it. Now we can sample from it using `rand` and compute the `logpdf`, like any other `Distribution`.

```@repl normalizing-flows
y = rand(rng, flow)
logpdf(flow, y) # uses inverse of `b`
```
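
To make the density computation concrete, here is a hand-rolled sketch of a planar transformation in the form popularized by Rezende and Mohamed, `f(z) = z + u * tanh(w ⋅ z + b)`. This is an assumption for illustration: Bijectors.jl's `PlanarLayer` may parametrize `u` differently to guarantee invertibility, and the parameter values are made up. Evaluating `logpdf` at an arbitrary `y` needs the inverse, but at a freshly drawn sample the underlying `z` is known, so `log q(y) = log p(z) - log|det J_f(z)|`.

```julia
using LinearAlgebra, Random

# Hypothetical planar transformation f(z) = z + u * tanh(w ⋅ z + b).
planar(z, w, u, b) = z .+ u .* tanh(dot(w, z) + b)

# Matrix determinant lemma: det(I + u ψᵀ) = 1 + u ⋅ ψ, with ψ = tanh'(w ⋅ z + b) * w.
function planar_logabsdet(z, w, u, b)
    ψ = (1 - tanh(dot(w, z) + b)^2) .* w
    return log(abs(1 + dot(u, ψ)))
end

std_normal_logpdf(z) = -0.5 * dot(z, z) - 0.5 * length(z) * log(2π)

rng = MersenneTwister(1)
w, u, b = [0.5, -1.0], [0.8, 0.3], 0.1    # illustrative parameters (u ⋅ w ≥ -1)
z = randn(rng, 2)                           # z ∼ 𝓝(0, I)
y = planar(z, w, u, b)                      # a sample from the flow
logq_y = std_normal_logpdf(z) - planar_logabsdet(z, w, u, b)

# Cross-check the log-determinant against a central finite-difference Jacobian.
ϵ = 1e-6
J = hcat([(planar(z .+ ϵ .* e, w, u, b) .- planar(z .- ϵ .* e, w, u, b)) ./ (2ϵ)
          for e in eachcol(Matrix(1.0I, 2, 2))]...)
println((y, logq_y, log(abs(det(J)))))
```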

Similarly to the multivariate ADVI example, we could use `Stacked` to get a _bounded_ flow:

```@repl normalizing-flows
d = MvNormal(zeros(2), ones(2));
ibs = inverse.(bijector.((InverseGamma(2, 3), Beta())));
sb = stack(ibs...) # == Stacked(ibs) == Stacked(ibs, [i:i for i = 1:length(ibs)])
b = sb ∘ PlanarLayer(2)
td = transformed(d, b);
y = rand(rng, td)
0 < y[1]
0 ≤ y[2] ≤ 1
```
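
Composition with `∘` works because log-Jacobians add along a chain (the chain rule). A minimal 1-D sketch, with hypothetical hand-written maps standing in for a planar-like layer and for the inverse of a positivity bijector (assumed here to be `exp`, as for `InverseGamma`):

```julia
# Hypothetical 1-D maps, each given by a forward function and its log|derivative|.
inner_fwd(x) = x + 0.5 * tanh(x)                  # strictly increasing, hence invertible
inner_logjac(x) = log(1 + 0.5 * (1 - tanh(x)^2))
outer_fwd(y) = exp(y)                             # ℝ → (0, ∞)
outer_logjac(y) = y                               # log|d exp(y)/dy| = y

# Chain rule: log|(outer ∘ inner)'(x)| = log|inner'(x)| + log|outer'(inner(x))|.
composed_fwd(x) = outer_fwd(inner_fwd(x))
composed_logjac(x) = inner_logjac(x) + outer_logjac(inner_fwd(x))

# Compare with a central finite difference.
x = 0.7
ϵ = 1e-6
fd = log(abs((composed_fwd(x + ϵ) - composed_fwd(x - ϵ)) / (2ϵ)))
println((composed_logjac(x), fd))
```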

Want to fit the flow?

```@repl normalizing-flows
using Zygote
# Construct the flow.
b = PlanarLayer(2)
# Convenient for extracting parameters and reconstructing the flow.
using Functors
θs, reconstruct = Functors.functor(b);
# Make the objective a `struct` to avoid capturing global variables.
struct NLLObjective{R,D,T}
    reconstruct::R
    basedist::D
    data::T
end
function (obj::NLLObjective)(θs...)
    transformed_dist = transformed(obj.basedist, obj.reconstruct(θs))
    return -sum(Base.Fix1(logpdf, transformed_dist), eachcol(obj.data))
end
# Some random data to estimate the density of.
xs = randn(2, 1000);
# Construct the objective.
f = NLLObjective(reconstruct, MvNormal(2, 1), xs);
# Initial loss.
@info "Initial loss: $(f(θs...))"
# Train using gradient descent.
ε = 1e-3;
for i = 1:100
    ∇s = Zygote.gradient(f, θs...)
    θs = map(θs, ∇s) do θ, ∇
        θ - ε .* ∇
    end
end
# Final loss
@info "Final loss: $(f(θs...))"
# Very simple check to see if we learned something useful.
samples = rand(transformed(f.basedist, f.reconstruct(θs)), 1000);
mean(eachcol(samples)) # ≈ [0, 0]
cov(samples; dims=2) # ≈ I
```
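
The same maximum-likelihood idea can be seen in a dependency-free sketch that fits a hypothetical 1-D affine flow `y = m + exp(s) * z`, `z ∼ 𝓝(0, 1)`, with hand-derived gradients in place of Zygote. The data are synthetic and the step size is an illustrative choice. Since this flow is Gaussian, gradient descent should recover the sample mean and the (biased) sample standard deviation.

```julia
using Random, Statistics

# Mean negative log-likelihood of the hypothetical affine flow:
# log q(y) = log 𝓝((y - m) * exp(-s); 0, 1) - s.
function mean_nll(m, s, data)
    r = (data .- m) .* exp(-s)
    return mean(0.5 .* r .^ 2 .+ s .+ 0.5 * log(2π))
end

# Analytic gradients of the mean NLL with respect to (m, s).
function grad_mean_nll(m, s, data)
    d = data .- m
    ∂m = -mean(d) * exp(-2s)
    ∂s = 1 - mean(d .^ 2) * exp(-2s)
    return ∂m, ∂s
end

# Plain gradient descent, starting from the base distribution (m = 0, s = 0).
function fit_affine_flow(data; ε = 0.1, iters = 2000)
    m, s = 0.0, 0.0
    for _ in 1:iters
        ∂m, ∂s = grad_mean_nll(m, s, data)
        m -= ε * ∂m
        s -= ε * ∂s
    end
    return m, s
end

rng = MersenneTwister(0)
data = 2.0 .+ 0.5 .* randn(rng, 1000)     # synthetic data for illustration
initial_loss = mean_nll(0.0, 0.0, data)
m_fit, s_fit = fit_affine_flow(data)
final_loss = mean_nll(m_fit, s_fit, data)
println((initial_loss, final_loss, m_fit, exp(s_fit)))
```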

We can easily create more complex flows by composition, e.g. `PlanarLayer(10) ∘ PlanarLayer(10) ∘ RadialLayer(10)` and so on.
@@ -0,0 +1,32 @@
# Bijectors.jl

This package implements a set of functions for transforming constrained random variables (e.g. simplexes, intervals) to Euclidean space. The three main functions implemented in this package are `link`, `invlink` and `logpdf_with_trans`, which are defined for a number of distributions. The supported distributions are:

1. `RealDistribution`: `Union{Cauchy, Gumbel, Laplace, Logistic, NoncentralT, Normal, NormalCanon, TDist}`,
2. `PositiveDistribution`: `Union{BetaPrime, Chi, Chisq, Erlang, Exponential, FDist, Frechet, Gamma, InverseGamma, InverseGaussian, Kolmogorov, LogNormal, NoncentralChisq, NoncentralF, Rayleigh, Weibull}`,
3. `UnitDistribution`: `Union{Beta, KSOneSided, NoncentralBeta}`,
4. `SimplexDistribution`: `Union{Dirichlet}`,
5. `PDMatDistribution`: `Union{InverseWishart, Wishart}`, and
6. `TransformDistribution`: `Union{T, Truncated{T}} where T<:ContinuousUnivariateDistribution`.
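
For a `PositiveDistribution`, the natural link is the logarithm. The following stdlib-only sketch uses hypothetical stand-ins (`my_link`, `my_invlink`, `my_logpdf_with_trans`, not the package's functions) for `Exponential(1)` to show what `logpdf_with_trans` adds when `transform = true`, assuming the convention that the transformed log-density is expressed with respect to the unconstrained variable `y = link(x)`:

```julia
# Hypothetical stand-ins for link / invlink / logpdf_with_trans on Exponential(1),
# whose log-density on (0, ∞) is -x.
my_link(x) = log(x)        # (0, ∞) → ℝ
my_invlink(y) = exp(y)     # ℝ → (0, ∞)
exp1_logpdf(x) = -x

# With transform = true, add the log|d invlink(y)/dy| = y = log(x) correction.
my_logpdf_with_trans(x, transform::Bool) = transform ? exp1_logpdf(x) + log(x) : exp1_logpdf(x)

# The transformed density, viewed as a function of y, integrates to ≈ 1 over ℝ.
h = 1e-3
total = sum(exp(my_logpdf_with_trans(my_invlink(y), true)) for y in -40:h:5) * h
println(total)
```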

All exported names from the [Distributions.jl](https://github.com/JuliaStats/Distributions.jl) package are reexported from `Bijectors`.

Bijectors.jl also provides a nice interface for working with these maps: composition, inversion, etc.
The following table lists mathematical operations for a bijector and the corresponding code in Bijectors.jl.

| Operation | Method | Automatic |
|:------------------------------------:|:-----------------:|:-----------:|
| `b ↦ b⁻¹` | `inverse(b)` | ✓ |
| `(b₁, b₂) ↦ (b₁ ∘ b₂)` | `b₁ ∘ b₂` | ✓ |
| `(b₁, b₂) ↦ [b₁, b₂]` | `stack(b₁, b₂)` | ✓ |
| `x ↦ b(x)` | `b(x)` | × |
| `y ↦ b⁻¹(y)` | `inverse(b)(y)` | × |
| `x ↦ log\|det J(b, x)\|` | `logabsdetjac(b, x)` | AD |
| `x ↦ b(x), log\|det J(b, x)\|` | `with_logabsdet_jacobian(b, x)` | ✓ |
| `p ↦ q := b_* p` | `q = transformed(p, b)` | ✓ |
| `y ∼ q` | `y = rand(q)` | ✓ |
| `p ↦ b` such that `support(b_* p) = ℝᵈ` | `bijector(p)` | ✓ |
| `(x ∼ p, b(x), log\|det J(b, x)\|, log q(y))` | `forward(q)` | ✓ |

In this table, `b` denotes a `Bijector`, `J(b, x)` denotes the Jacobian of `b` evaluated at `x`, `b_*` denotes the [push-forward](https://www.wikiwand.com/en/Pushforward_measure) of `p` by `b`, and `x ∼ p` denotes `x` sampled from the distribution with density `p`.

The "Automatic" column in the table indicates whether you are required to implement the feature for a custom `Bijector`: ✓ means it is provided automatically, × means you must implement it yourself, and "AD" means it can be implemented "automatically" using automatic differentiation.
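
As a hedged illustration of that split, here is a plain-Julia sketch of a hypothetical elementwise scaling bijector. None of these names (`Scale`, `inverse_apply`, `logabsdetjac_scale`, `with_logjac`, `pushforward_logpdf`) are Bijectors.jl API; they only mirror which rows of the table must be written by hand and which follow from them.

```julia
# Hypothetical bijector (not Bijectors.jl API): elementwise scaling x ↦ a * x.
struct Scale
    a::Float64
end

# Rows marked ×: how to evaluate b(x) and b⁻¹(y) for this type.
(b::Scale)(x) = b.a .* x
inverse_apply(b::Scale, y) = y ./ b.a

# Row marked AD: log|det J(b, x)|, written by hand here (length(x) * log|a|).
logabsdetjac_scale(b::Scale, x) = length(x) * log(abs(b.a))

# Rows marked ✓: generic combinations of the pieces above.
with_logjac(b, x) = (b(x), logabsdetjac_scale(b, x))
function pushforward_logpdf(base_logpdf, b::Scale, y)
    x = inverse_apply(b, y)
    return base_logpdf(x) - logabsdetjac_scale(b, x)
end

std_normal_logpdf(x) = -0.5 * sum(abs2, x) - 0.5 * length(x) * log(2π)

b = Scale(2.0)
y, logjac = with_logjac(b, [1.0, -0.5])
lp = pushforward_logpdf(std_normal_logpdf, b, y)   # log-density of b_* 𝓝(0, I) at y
println((y, logjac, lp))
```

The push-forward of `𝓝(0, I)` under scaling by `2` is `𝓝(0, 4I)`, which gives a simple closed form to compare against.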