Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/src/figures/FelsensteinDownRecursion.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
31 changes: 28 additions & 3 deletions docs/src/framework.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,17 +28,42 @@ Felsenstein's algorithm recursively computes, for each node, the probability of

![](figures/FelsensteinRecursion.svg)

At the root node, we wind up with ``P(O_{all}|R)``, where ``R`` is the state at the root, and we can compute ``P(O_{all}) = \sum_{R} P(O_{all}|R) P(R)``.
At the root node, we wind up with ``P(O_{all}|R)``, where ``R`` is the state at the root, and we can compute:

```math
P(O_{all}) = \sum_{R} P(O_{all}|R) P(R) \tag{1}
```

!!! note
If the root state space would happen to be continuous, then we would instead have:
```math
P(O_{all}) = \int_{R} P(O_{all}|R) P(R) \tag{2}
```

After a Felsenstein pass, we can recursively compute, for each node, the probability of all observations above that node, given the state at that node. We call this algorithm `felsenstein_down!`, and it can be decomposed into the following combination of `forward!` and `combine!` operations:

![](figures/FelsensteinDownRecursion.svg)
!!! note
- We don't display how ``P(O_{Â}, O_L|C)`` is computed, to avoid cluttering the illustration. It, however, follows from swapping B and C.
- When we say "observation above", we mean the observations in the tree that aren't below, together with the **root state**.

## Technicalities

### Scaling constants

Coming soon.
For phylogenetic trees with many nodes or long branch lengths, the likelihood values can become extremely small, leading to numerical underflow in floating-point arithmetic. To maintain numerical stability, we perform computations in the log-domain. This approach is implemented in the following components:

- **[`GaussianPartition`](@ref) with the field `norm_const`**. Let `part isa GaussianPartition`, then if we integrate the conditional probability of `part` over the real axis, it evaluates to `exp(part.norm_const)`.
- **Concrete subtypes of [`DiscretePartition`](@ref) with the field `scaling`**. Let `part isa DiscretePartition`. The
conditional probability, `P`, of state `i` at site `j`, is computed by `P = part.state[i, j] * exp(part.scaling[j])`.


### Root state

Coming soon.
In equations (1) & (2) in the above [Algorithms](@ref) section, `parent_message` is the field of a **root**-`FelNode` which
represents the quantity `P(R)`. The typical way we specify the root state is to, during initiliazation, pass the desired `parent_message` as the template `Partition` to [`allocate!`](@ref).
In addition to the root-`parent_message` being incorporated in computing the likelihood of a tree,
it's also used for downward passes, which includes: [`felsenstein_down!`](@ref), [`branchlength_optim!`](@ref), [`nni_optim!`](@ref), [`marginal_state_dict`](@ref).

## Functions

Expand Down
16 changes: 6 additions & 10 deletions docs/src/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,20 +21,16 @@ hero:
link: https://github.com/MurrellGroup/MolecularEvolution.jl
features:
- title: Flexible Model Development
details: Create custom evolutionary models with your own data types and probability distributions while leveraging the existing infrastructure for tree operations and likelihood calculations.
details: Create custom evolutionary models by defining how a process evolves along a branch and get likelihood calculations (and much more) for free! • Mix and match different models on the same phylogeny.
link: /framework
- title: Rich Model Library
details: Includes standard models like JC69, K80, GTR for nucleotides, JTT and WAG for amino acids, MG94 and GY94 for codons, plus Brownian motion for continuous traits.
- title: Model Library
details: Nucleotide, AA, Codon, and generic discrete character models • Continuous models (eg. Brownian motion) • Site- and branch-wise mixture models and more.
link: /models
- title: Powerful Tree Operations
details: Optimize branch lengths, infer ancestral states, perform tree rearrangements (NNI), and sample topologies with MCMC.
- title: Tree Operations
details: Optimize model parameters, branch lengths, tree topology (NNI), and root position for maximum likelihood inference • Sample from the model and tree posterior with MCMC for Bayesian inference • Infer ancestral states.
link: /optimization
- title: Simulation Tools
details: Generate phylogenies using both coalescent and birth-death processes, and simulate sequence evolution on those trees.
details: Sample from a range of realistic phylogenies using a flexible coalescent process • Simulate discrete and continuous data under any model over a simulated or imported phylogeny.
link: /simulation
---
```

```@meta
CurrentModule = MolecularEvolution
```
81 changes: 80 additions & 1 deletion docs/src/models.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,94 @@
# Models

Coming soon.
We offer a catalogue of frequently used models that are already integrated with the framework and ready to be used.
We maintain that if a model you'd like to use is not included in the list, you can swiftly define one yourself and
leverage our framework nonetheless.

## Discrete state models

Here we account for a typical set-up for a discrete state `Partition`:
```@docs; canonical=false
DiscretePartition
```

And here's a list of simple concrete subtypes of `DiscretePartition`:
- [`CodonPartition`](@ref)
- [`CustomDiscretePartition`](@ref)
- [`NucleotidePartition`](@ref)
- [`GappyNucleotidePartition`](@ref)
- [`AminoAcidPartition`](@ref)
- [`GappyAminoAcidPartition`](@ref)

And then there are two typical `BranchModel`s that will cooperate with this `Partition`:
- [`GeneralCTMC`](@ref)
- [`DiagonalizedCTMC`](@ref)
!!! note
The two above can be regarded as special cases of the more general [`PModel`](@ref), which just represents a P matrix.

A typical way of constructing your Q matrix in our ecosystem is by:
```@docs; canonical=false
reversibleQ
nonreversibleQ
```

### Codon models

```@docs; canonical=false
CodonPartition
```

We offer constructors for the following Q matrix parameterizations:
- `MolecularEvolution.MG94_F3x4` - for example usage, see [Example 3: FUBAR](@ref)
- `MolecularEvolution.MG94_F61`
- `MolecularEvolution.HB98_F61`
- `MolecularEvolution.HB98_AAfit`
Use help mode, `?`, in the REPL to find out more about the above.

### Miscellaneous models
```@docs; canonical=false
InterpolatedDiscreteModel
PiQ
```

## Continuous models

The partition of interest is:
```@docs; canonical=false
GaussianPartition
```

And then there are two `BranchModel`s which are compatible with the above partition, namely:
```@docs; canonical=false
BrownianMotion
ZeroDriftOrnsteinUhlenbeck
```

## Compound models

### Branch-wise mixture
```@docs; canonical=false
BWMModel
```

### CAT
```@docs; canonical=false
CATModel
CATPartition
```

### Covarion
```@docs; canonical=false
CovarionModel
CovarionPartition
```

### Site-wise mixture
See [Example 2: GTR+Gamma](@ref) for example usage.
```@docs; canonical=false
SWMModel
SWMPartition
```

## Lazy models

### LazyPartition
Expand Down
4 changes: 4 additions & 0 deletions docs/src/optimization.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@ There are two distinct kinds of optimization: "global" model parameters, and the

The example below will set up and optimize a ["Generalized Time Reversible" nucleotide substitution model](https://en.wikipedia.org/wiki/Substitution_model), where there are 6 rate parameters that govern the symmetric part of a rate matrix, and 4 nucleotide frequencies (that sum to 1, so only 3 underlying parameters).

!!! note
For the Bayesian counterpart of this page, we refer you to [Set up a Bayesian model sampler](@ref) and [Multiple trees](@ref).

## Optimizing model parameters

We first need to construct an objective function. A very common use case involves parameterizing a rate matrix (along with all the constraints this entails) from a flat parameter vector. `reversibleQ` can be convenient here, which takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix.
Expand Down Expand Up @@ -158,5 +161,6 @@ reversibleQ
unc2probvec
branchlength_optim!
nni_optim!
root_optim!
tree_polish!
```
17 changes: 17 additions & 0 deletions src/models/compound_models/bwm.jl
Original file line number Diff line number Diff line change
@@ -1,9 +1,26 @@
export BWMModel
"""
mutable struct BWMModel{M} <: DiscreteStateModel where M <: DiscreteStateModel

# Fields
- `models::Vector{<:M}`: A vector of models.
- `weights::Vector{Float64}`: A vector of weights.

# Description
Branch-wise mixture model.
!!! note
`forward!` and `backward!` are currently only defined for `M<:PMatrixModel`.
"""
mutable struct BWMModel{M} <: DiscreteStateModel where M <: DiscreteStateModel
models::Vector{<:M}
weights::Vector{Float64}
end

"""
BWMModel{M}(models::Vector{<:M}) where M <: DiscreteStateModel

Convenience constructor where the weights are uniform.
"""
function BWMModel{M}(models::Vector{<:M}) where M <: DiscreteStateModel
BWMModel{M}(models, sum2one(ones(length(models))))
end
Expand Down
14 changes: 14 additions & 0 deletions src/models/compound_models/cat.jl
Original file line number Diff line number Diff line change
@@ -1,4 +1,9 @@
export CATModel
"""
CATModel(models::Vector{<:BranchModel})

CAT is something where you split the sites up, and assign each site to a different model (whose "data" gets stored in a contiguous block of memory).
"""
mutable struct CATModel <: BranchModel
models::Vector{<:BranchModel}
function CATModel(models::Vector{<:BranchModel})
Expand All @@ -7,6 +12,15 @@ mutable struct CATModel <: BranchModel
end

export CATPartition
"""
# Constructors
```julia
CATPartition(part_inds::Vector{Vector{Int}})
CATPartition(part_inds::Vector{Vector{Int}}, parts::Vector{PType})
```
# Description
A partition for the [`CATModel`](@ref).
"""
mutable struct CATPartition{PType} <: Partition where {PType <: DiscretePartition}
part_inds::Vector{Vector{Int}}
parts::Vector{PType}
Expand Down
14 changes: 14 additions & 0 deletions src/models/compound_models/covarion.jl
Original file line number Diff line number Diff line change
@@ -1,4 +1,13 @@
export CovarionModel
"""
# Constructors
```julia
CovarionModel(models::Vector{<:DiscreteStateModel}, inter_transition_rates::Matrix{Float64})
CovarionModel(models::Vector{<:DiscreteStateModel}, inter_transition_rate::Float64)
```
# Description
The covarion model.
"""
mutable struct CovarionModel <: DiscreteStateModel
model::DiscreteStateModel
function CovarionModel(models::Vector{<:DiscreteStateModel}, inter_transition_rates::Matrix{Float64})
Expand Down Expand Up @@ -35,6 +44,11 @@ end
export CovarionPartition
#Parts is a vector containing the various components
#Weights is a num_components by num_sites matrix
"""
CovarionPartition(states,sites,models,t)

A partition for the [`CovarionModel`](@ref).
"""
mutable struct CovarionPartition <: DiscretePartition
state::Array{Float64,2}
states::Int
Expand Down
19 changes: 19 additions & 0 deletions src/models/compound_models/swm.jl
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,15 @@
#for each site?

export SWMModel
"""
# Constructors
```julia
SWMModel(models::Vector{<:BranchModel})
SWMModel(model::M, rs::Vector{Float64}) where {M <: BranchModel}
```
# Description
A site-wise mixture model, for site-to-site "random effects" rate variation.
"""
mutable struct SWMModel <: BranchModel
models::Vector{<:BranchModel}
weights::Vector{Float64}
Expand All @@ -34,6 +43,16 @@ end
export SWMPartition
#Parts is a vector containing the various components
#Weights is a num_components by num_sites matrix
"""
# Constructors
```julia
SWMPartition(parts::Vector{PType}) where {PType <: MultiSitePartition}
SWMPartition(part::PType, n_parts::Int) where {PType <: MultiSitePartition}
SWMPartition(parts::Vector{PType}, weights::Vector{Float64}, sites::Int, states::Int, models::Int) where {PType <: MultiSitePartition}
```
# Description
A site-wise mixture partition for the [`SWMModel`](@ref).
"""
mutable struct SWMPartition{PType} <: Partition where {PType <: MultiSitePartition}
parts::Vector{PType}
weights::Vector{Float64}
Expand Down
5 changes: 5 additions & 0 deletions src/models/continuous_models/brownian_motion.jl
Original file line number Diff line number Diff line change
@@ -1,4 +1,9 @@
#Model behavior
"""
BrownianMotion(mean_drift::Float64, var_drift::Float64)

A 1D continuous Brownian motion model with mean drift and variance drift.
"""
mutable struct BrownianMotion <: ContinuousStateModel
mean_drift::Float64
var_drift::Float64
Expand Down
10 changes: 10 additions & 0 deletions src/models/continuous_models/gaussian_partition.jl
Original file line number Diff line number Diff line change
@@ -1,5 +1,15 @@
#-----------------Gaussian prop---------------
#Partition behavior
"""
# Constructors
```julia
GaussianPartition(mean::Float64, var::Float64, norm_const::Float64)
GaussianPartition(mean::Float64, var::Float64) # norm_const defaults to 0.0
GaussianPartition() # mean, var, norm_const default to 0.0, 1.0, 0.0 respectively
```
# Description
A partition representing a (not necessarily normalized) Gaussian distribution. `norm_const` is the log-domain normalization constant.
"""
mutable struct GaussianPartition <: ContinuousPartition
mean::Float64
var::Float64
Expand Down
11 changes: 11 additions & 0 deletions src/models/continuous_models/ornstein_uhlenbeck.jl
Original file line number Diff line number Diff line change
@@ -1,5 +1,14 @@
#Define a ZeroDriftOrnsteinUhlenbeck process, with minimal functionality for simulation
#(ie. branch_prop_down and eq freqs)
"""
function ZeroDriftOrnsteinUhlenbeck(
mean::Float64,
volatility::Float64,
reversion::Float64,
)

A 1D continuous Ornstein-Uhlenbeck process with mean, volatility, and reversion.
"""
mutable struct ZeroDriftOrnsteinUhlenbeck <: ContinuousStateModel
mean::Float64
volatility::Float64
Expand Down Expand Up @@ -39,3 +48,5 @@ function eq_freq_from_template(
out_partition.norm_const = 0.0
return out_partition
end

export ZeroDriftOrnsteinUhlenbeck
10 changes: 10 additions & 0 deletions src/models/discrete_models/PiQ.jl
Original file line number Diff line number Diff line change
@@ -1,6 +1,16 @@
#F81 but for general dimensions.

#Propagation in O(States)
"""
# Constructors
```julia
PiQ(r::Float64,pi::Vector{Float64}; normalize=false)
PiQ(pi::Vector{Float64}; normalize=false)
```
# Description
The F81 substitution model, but for general dimensions.
https://www.diva-portal.org/smash/get/diva2:1878793/FULLTEXT01.pdf
"""
mutable struct PiQ <: DiscreteStateModel
r::Float64
pi::Vector{Float64}
Expand Down
Loading
Loading