Skip to content

Commit

Permalink
using writeTableCF to convert eCF to df in doc
Browse files Browse the repository at this point in the history
  • Loading branch information
cecileane committed Apr 11, 2023
1 parent aaf3069 commit eb961e8
Show file tree
Hide file tree
Showing 6 changed files with 29 additions and 33 deletions.
3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,8 @@

## overview

`QuartetNetworkGoodnessFit` is a Julia package for phylogenetic networks analyses using four-taxon subsets.
`QuartetNetworkGoodnessFit`, aka "Quarnet GoF" or simply "QGoF",
is a Julia package for phylogenetic networks analyses using four-taxon subsets.
It includes tools to measure the
goodness of fit of a phylogenetic network to data on subsets of 4 tips.
It depends on the [PhyloNetworks](https://github.com/crsl4/PhyloNetworks.jl)
Expand Down
17 changes: 5 additions & 12 deletions docs/src/man/expected_qCFs.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,25 +34,18 @@ listed to the left.
We can also convert our list to a data frame:
```@repl expcf
using DataFrames
df = DataFrame(
t1 = [t[q.taxonnumber[1]] for q in eCFs],
t2 = [t[q.taxonnumber[2]] for q in eCFs],
t3 = [t[q.taxonnumber[3]] for q in eCFs],
t4 = [t[q.taxonnumber[4]] for q in eCFs],
CF12_34 = [q.data[1] for q in eCFs],
CF13_24 = [q.data[2] for q in eCFs],
CF14_23 = [q.data[3] for q in eCFs],
)
df = writeTableCF(eCFs, t) # requires PhyloNetworks v0.16.1
```

If we wanted to compare with the observed frequency among 100 gene trees,
we could use data frame manipulations to join the two data frames:

```@repl expcf
using PhyloCoalSimulations
genetrees = simulatecoalescent(net, 100, 1); # 1000 genes, 1 individual / pop
genetrees = simulatecoalescent(net, 100, 1); # 100 genes, 1 individual / pop
df_sim = writeTableCF(countquartetsintrees(genetrees; showprogressbar=false)...);
first(df_sim, 2)
select!(df_sim, Not(:ngenes)); # delete column with number of genes
df_both = outerjoin(df, df_sim; on=[:t1,:t2,:t3,:t4], renamecols = "exact" => "sim")
select!(df_sim, Not([:qind,:ngenes])); # delete columns with q index and number of genes
df_both = outerjoin(select(df, Not(:qind)), df_sim;
on=[:t1,:t2,:t3,:t4], renamecols = "exact" => "sim")
```
2 changes: 2 additions & 0 deletions src/QuartetNetworkGoodnessFit.jl
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@ using StaticArrays
using Statistics: mean, median
using StatsFuns: normccdf, chisqccdf, betacdf, betaccdf

const PN = PhyloNetworks # for easier use of internals

export
quarnetGoFtest!,
network_expectedCF,
Expand Down
6 changes: 3 additions & 3 deletions src/quarnetGoF.jl
Original file line number Diff line number Diff line change
Expand Up @@ -274,18 +274,18 @@ function expectedCF_ordered(dcf::DataCF, net::HybridNetwork, suffix=""::Abstract
nq = length(dcf.quartet)
expCF = SharedArray{Float64, 2}(nq,3)
# careful ordering: ["t","t_0"] ordered differently after we add suffix "_1"
taxa = PhyloNetworks.sort_stringasinteger!(tipLabels(net) .* suffix) # same as done in countquartetsintrees
taxa = PN.sort_stringasinteger!(tipLabels(net) .* suffix) # same as done in countquartetsintrees
nsuff = length(suffix)
taxonnumber = Dict(chop(taxa[i];tail=nsuff) => i for i in eachindex(taxa))
ntax = length(taxa)
nCk = PhyloNetworks.nchoose1234(ntax) # matrix used to ranks 4-taxon sets
nCk = PN.nchoose1234(ntax) # matrix used to ranks 4-taxon sets
nq == nCk[ntax+1,4] || error("dcf is assumed to contain ALL $(nCk[ntax+1,4]) four-taxon sets, but contains $nq only.")
ptype = Vector{Int8}(undef,4) # permutation vector to sort the 4 taxa
resperm = Vector{Int8}(undef,3) # permutation to sort the 3 resolutions accordingly
for q in dcf.quartet
tn = map(i->taxonnumber[i], q.taxon)
sortperm!(ptype, tn)
qi = PhyloNetworks.quartetrank(tn[ptype]..., nCk)
qi = PN.quartetrank(tn[ptype]..., nCk)
# next: find the corresponding permutation of the 3 CFs.
# this permutation is invariant to permuting ptype like: [a b c d] -> [c d a b] or [b a d c]
# modify ptype to have 1 first, then only 6 permutations, corresponding to 6 permutations of 3 resolutions
Expand Down
32 changes: 16 additions & 16 deletions src/quarnetconcordancefactors.jl
Original file line number Diff line number Diff line change
Expand Up @@ -69,21 +69,21 @@ B1,B2,C,D: [1.0, 9.42e-7, 9.42e-7]
function network_expectedCF(net::HybridNetwork; showprogressbar=true,
inheritancecorrelation=0)
net.node[net.root].leaf && error("The root can't be a leaf.")
PhyloNetworks.check_nonmissing_nonnegative_edgelengths(net,
PN.check_nonmissing_nonnegative_edgelengths(net,
"Edge lengths are needed in coalescent units to calcualte expected CFs.")
all(e.gamma >= 0.0 for e in net.edge) || error("some γ's are missing for hybrid edges: can't calculate expected CFs.")
inheritancecorrelation >= 0 || error("the inheritance correlation should be non-negative")
inheritancecorrelation <= 1 || error("the inheritance correlation should be <= 1")
taxa = sort!(tipLabels(net))
taxonnumber = Dict(taxa[i] => i for i in eachindex(taxa))
ntax = length(taxa)
nCk = PhyloNetworks.nchoose1234(ntax) # matrix to rank 4-taxon sets
nCk = PN.nchoose1234(ntax) # matrix to rank 4-taxon sets
qtype = MVector{3,Float64} # 3 floats: CF12_34, CF13_24, CF14_23; initialized at 0.0
numq = nCk[ntax+1,4]
quartet = Vector{PhyloNetworks.QuartetT{qtype}}(undef, numq)
quartet = Vector{PN.QuartetT{qtype}}(undef, numq)
ts = [1,2,3,4]
for qi in 1:numq
quartet[qi] = PhyloNetworks.QuartetT(qi, SVector{4}(ts), MVector(0.,0.,0.))
quartet[qi] = PN.QuartetT(qi, SVector{4}(ts), MVector(0.,0.,0.))
# next: find the 4-taxon set with the next rank,
# faster than using the direct mapping function
ind = findfirst(x -> x>1, diff(ts))
Expand Down Expand Up @@ -128,11 +128,11 @@ mapping taxon labels in to their indices in `taxa`, for easier lookup.
For `inheritancecorrelation` see [`network_expectedCF`](@ref).
Its value should be between 0 and 1 (not checked by this internal function).
"""
function network_expectedCF!(quartet::PhyloNetworks.QuartetT{MVector{3,Float64}},
function network_expectedCF!(quartet::PN.QuartetT{MVector{3,Float64}},
net::HybridNetwork, taxa, taxonnumber,
inheritancecorrelation)
net = deepcopy(net)
PhyloNetworks.removedegree2nodes!(net)
PN.removedegree2nodes!(net)
# delete all taxa except for the 4 in the quartet
for taxon in taxa
taxonnumber[taxon] in quartet.taxonnumber && continue
Expand Down Expand Up @@ -172,13 +172,13 @@ function network_expectedCF_4taxa!(net::HybridNetwork, fourtaxa, inheritancecorr
deleteaboveLSA!(net)
# make sure the root is of degree 3+
if length(net.node[net.root].edge) <= 2
PhyloNetworks.fuseedgesat!(net.root, net)
PN.fuseedgesat!(net.root, net)
end
# find and delete degree-2 blobs along external edges
bcc = biconnectedComponents(net, true) # true: ignore trivial blobs
entry = PhyloNetworks.biconnectedcomponent_entrynodes(net, bcc)
entry = PN.biconnectedcomponent_entrynodes(net, bcc)
entryindex = indexin(entry, net.nodes_changed)
exitnodes = PhyloNetworks.biconnectedcomponent_exitnodes(net, bcc, false) # don't redo the preordering
exitnodes = PN.biconnectedcomponent_exitnodes(net, bcc, false) # don't redo the preordering
bloborder = sortperm(entryindex) # pre-ordering for blobs in their own blob tree
function isexternal(ib) # is bcc[ib] of degree 2 and adjacent to an external edge?
# yes if: 1 single exit adjacent to a leaf
Expand All @@ -195,7 +195,7 @@ function network_expectedCF_4taxa!(net::HybridNetwork, fourtaxa, inheritancecorr
# delete minor hybrid edge with options unroot=true: to make sure the
# root remains of degree 3+, in case a degree-2 blob starts at the root
# simplify=true: bc external blob
PhyloNetworks.deletehybridedge!(net,he, false,true,false,true,false)
PN.deletehybridedge!(net,he, false,true,false,true,false)
end
end
ndes = 4 # number of taxa descendant from lowest hybrid node
Expand All @@ -205,7 +205,7 @@ function network_expectedCF_4taxa!(net::HybridNetwork, fourtaxa, inheritancecorr
hyb = net.nodes_changed[findlast(n -> n.hybrid, net.nodes_changed)]
funneledge = [e for e in hyb.edge if getparent(e) === hyb]
ispolytomy = length(funneledge) > 1
funneldescendants = union([PhyloNetworks.descendants(e) for e in funneledge]...)
funneldescendants = union([PN.descendants(e) for e in funneledge]...)
ndes = length(funneldescendants)
n2 = (ispolytomy ? hyb : getchild(funneledge[1]))
ndes > 2 && n2.leaf && error("2+ descendants below the lowest hybrid, yet n2 is a leaf. taxa: $(fourtaxa)")
Expand Down Expand Up @@ -247,7 +247,7 @@ function network_expectedCF_4taxa!(net::HybridNetwork, fourtaxa, inheritancecorr
for j in 1:nhe
j == i && continue # don't delete hybrid edge i!
pe_index = findfirst(e -> e.number == parenthnumber[j], simplernet.edge)
PhyloNetworks.deletehybridedge!(simplernet, simplernet.edge[pe_index],
PN.deletehybridedge!(simplernet, simplernet.edge[pe_index],
false,true,false,false,false) # ., unroot=true, ., simplify=false,.
end
qCF .+= gamma .* network_expectedCF_4taxa!(simplernet, fourtaxa, inheritancecorrelation)
Expand All @@ -267,7 +267,7 @@ function network_expectedCF_4taxa!(net::HybridNetwork, fourtaxa, inheritancecorr
(sistertofirst == 3 ? MVector{3,Float64}(0.0,1.0-deepcoalprob,0.0) :
MVector{3,Float64}(0.0,0.0,1.0-deepcoalprob) ))
# no coalescence on cut-edge: delete it and extract parental networks
ispolytomy || PhyloNetworks.shrinkedge!(net, funneledge[1])
ispolytomy || PN.shrinkedge!(net, funneledge[1])
# shrinkedge! requires PhyloNetworks v0.15.2
childedge = [e for e in hyb.edge if getparent(e) === hyb]
length(childedge) == 2 ||
Expand All @@ -284,7 +284,7 @@ function network_expectedCF_4taxa!(net::HybridNetwork, fourtaxa, inheritancecorr
for k in 1:nhe
(k == i || k ==j) && continue # don't delete hybrid edges i or j
pe_index = findfirst(e -> e.number == parenthnumber[k], simplernet.edge)
PhyloNetworks.deletehybridedge!(simplernet, simplernet.edge[pe_index],
PN.deletehybridedge!(simplernet, simplernet.edge[pe_index],
false,true,false,false,false) # ., unroot=true,., simplify=false,.
end
if i != j
Expand All @@ -295,12 +295,12 @@ function network_expectedCF_4taxa!(net::HybridNetwork, fourtaxa, inheritancecorr
hn = getchild(pej) # hyb node, but in simplernet
ce2_index = findfirst(e -> e.number == childnumber[2], simplernet.edge)
ce2 = simplernet.edge[ce2_index]
PhyloNetworks.removeEdge!(hn,ce2)
PN.removeEdge!(hn,ce2)
hn_index = findfirst(x -> x === hn, ce2.node)
ce2.node[hn_index] = pn # ce2.isChild1 remains synchronized
push!(pn.edge, ce2)
# then delete hybedge j
PhyloNetworks.deletehybridedge!(simplernet, pej,
PN.deletehybridedge!(simplernet, pej,
false,true,false,false,false) # ., unroot=true,., simplify=false,.)
end
qCF_subnet = network_expectedCF_4taxa!(simplernet, fourtaxa, inheritancecorrelation)
Expand Down
2 changes: 1 addition & 1 deletion src/utils.jl
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,7 @@ function reroot!(net, refnet)
try
directEdges!(net)
catch e
isa(e, PhyloNetworks.RootMismatch) || rethrow(e)
isa(e, PN.RootMismatch) || rethrow(e)
continue
end
# now, i is admissible: internal and compatible with direction
Expand Down

2 comments on commit eb961e8

@cecileane
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JuliaRegistrator register

Release notes:

  • new: calculation of expected concordance factors on networks of any level, and under the coalescent model with possible correlated inheritance.
  • requires PhyloNetworks v0.16 for accessors (e.g. getchild, getparent).

@JuliaRegistrator
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registration pull request created: JuliaRegistries/General/81434

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.5.0 -m "<description of version>" eb961e80ed60751bf041629e2748cb4ff74f849f
git push origin v0.5.0

Please sign in to comment.