Microbiome Helper 2 Getting a phylogenetic taxonomic tree (metagenome)
Authors: Robyn Wright
Modifications by: NA
Please note: We are still testing/developing this so use with caution :)
With marker gene analyses, we're often able to insert our reads into a phylogenetic tree (or build a phylogenetic tree de novo), but the same is rarely possible with read-based metagenome analyses.
One way around this is to use reference genomes (and phylogenetic trees) for the taxa that we have identified in our samples. We previously did this for Figure 1 in this paper, by matching the NCBI taxonomy IDs from our Kraken2 classifications to the GTDB bacterial and archaeal phylogenetic trees. This page gives instructions for doing the same.
Note that we recommend using these trees for visualisation purposes only as their accuracy will depend on the accuracy of your taxonomic assignments!
It is assumed that you have already run our Kraken2/Bracken workflow and have a Bracken output file that contains NCBI taxonomy IDs. We may make this script more flexible about the input type in the future, if that is of interest.
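Before going further, it can be worth confirming that your merged Bracken file has the columns this approach depends on. The sketch below is not part of the Microbiome Helper script. It assumes the usual merged Bracken layout (tab-separated, with `taxonomy_id` and `taxonomy_lvl` columns, where `S` marks species-level rows); the function name `check_bracken_table` and the demo rows are hypothetical.

```python
import csv
import io


def check_bracken_table(handle):
    """Return (n_rows, n_species_rows) for a merged Bracken table.

    Assumes a tab-separated file with 'taxonomy_id' and 'taxonomy_lvl'
    columns (an assumption about the file layout); raises ValueError
    if either column is missing.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    required = {"taxonomy_id", "taxonomy_lvl"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    n_rows = 0
    n_species = 0
    for row in reader:
        n_rows += 1
        if row["taxonomy_lvl"] == "S":
            n_species += 1
    return n_rows, n_species


# Tiny made-up table standing in for a real merged Bracken file.
demo = io.StringIO(
    "name\ttaxonomy_id\ttaxonomy_lvl\tsampleA_num\n"
    "Escherichia coli\t562\tS\t120\n"
    "Bacillus subtilis\t1423\tS\t30\n"
)
print(check_bracken_table(demo))
```

For a real file, you would pass an open file handle instead of the in-memory demo, for example `open("bracken_out_merged/merged_output.species.bracken", newline="")`. If the two numbers returned differ, some rows are not species-level.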
Download the GTDB (release 226) tree and metadata files:
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/release226/226.0/ar53_r226.tree
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/release226/226.0/bac120_r226.tree
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/release226/226.0/ar53_metadata_r226.tsv.gz
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/release226/226.0/bac120_metadata_r226.tsv.gz
Unzip:
gunzip *metadata_r226.tsv.gz
Activate environment with needed programs:
mamba activate get_tree
First get the script:
wget http://kronos.pharmacology.dal.ca/public_files/MH2/scripts/filter_gtdb_tree_on_bracken_output.py
Now make an output folder for these files, and run the script:
mkdir gtdb_filtered
python filter_gtdb_tree_on_bracken_output.py --bracken bracken_out_merged/merged_output.species.bracken \
--outprefix gtdb_filtered/CAMI_marine_GTDB_tree \
--archaea ar53_metadata_r226.tsv \
--bacteria bac120_metadata_r226.tsv \
--arc_tree ar53_r226.tree \
--bac_tree bac120_r226.tree
We're giving this script:
- --bracken: the merged (species-level - this is important!) Bracken output file
- --outprefix: the prefix to give our output files (this can optionally include a folder within it)
- --archaea: the path to the GTDB archaea metadata file (that we downloaded and unzipped above)
- --bacteria: the path to the GTDB bacteria metadata file (that we downloaded and unzipped above)
- --arc_tree: the path to the GTDB archaea tree file (that we downloaded above)
- --bac_tree: the path to the GTDB bacteria tree file (that we downloaded above)
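The script's internals are not shown on this page, so the following is only a rough sketch of the kind of matching described at the top of this page: looking up each NCBI taxonomy ID from Bracken in the GTDB metadata to find a genome accession. It assumes the metadata has `accession`, `gtdb_taxonomy` and `ncbi_taxid` columns. The function `map_taxids_to_gtdb` and the demo rows (including the accessions) are made up for illustration, and the real script may work differently.

```python
import csv
import io


def map_taxids_to_gtdb(metadata_handle, taxids, taxid_column="ncbi_taxid"):
    """Map NCBI taxonomy IDs to the first matching GTDB genome.

    Returns {taxid: (accession, gtdb_taxonomy)}. The column names
    'accession', 'gtdb_taxonomy' and taxid_column are assumptions
    about the metadata layout, not taken from the real script.
    """
    wanted = {str(t) for t in taxids}
    matches = {}
    for row in csv.DictReader(metadata_handle, delimiter="\t"):
        taxid = row[taxid_column]
        # Keep only the first genome seen for each wanted taxid.
        if taxid in wanted and taxid not in matches:
            matches[taxid] = (row["accession"], row["gtdb_taxonomy"])
    return matches


# Made-up metadata rows; real GTDB metadata files have many more columns.
demo_metadata = io.StringIO(
    "accession\tncbi_taxid\tgtdb_taxonomy\n"
    "GB_GCA_000000001.1\t562\td__Bacteria;s__Escherichia coli\n"
    "GB_GCA_000000002.1\t562\td__Bacteria;s__Escherichia coli\n"
    "GB_GCA_000000003.1\t1423\td__Bacteria;s__Bacillus subtilis\n"
)
print(map_taxids_to_gtdb(demo_metadata, [562, 9999]))
```

Taxonomy IDs with no match (9999 in the demo) are simply absent from the result, which is why some fraction of reads may not be represented in the filtered trees.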
The output that you'll get from this will be four files:
- outprefix_ar53_r226_bracken.tree: the GTDB archaeal tree file filtered to only include the taxa that match the NCBI taxonomy IDs in your Bracken output
- outprefix_bac120_r226_bracken.tree: the GTDB bacterial tree file filtered to only include the taxa that match the NCBI taxonomy IDs in your Bracken output
- outprefix_bracken_merged_output.species.bracken: the same file as you gave above with --bracken, but with three extra columns: gtdb_accession, gtdb_taxonomy and gtdb_domain. These tell you the GTDB genome accession (so you can match up with the names in the filtered tree files), the full GTDB taxonomy, and the domain that they match (so you know which tree file to use), respectively.
- outprefix_fraction_reads_remaining_in_GTDB_tree.tsv: a file that gives a summary of the proportion of your reads that are represented within these two trees
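If you want to recompute the proportion of reads retained for a single sample from the annotated Bracken table, something like the sketch below may help. It relies on the `gtdb_accession` column described above, but the name of the read-count column (`sampleA_num` here) and the way unmatched rows are encoded (an empty string or `NA`) are assumptions, so check them against your own file. The function `fraction_matched` is hypothetical and is not how the script itself produces its summary.

```python
import csv
import io


def fraction_matched(handle, count_column):
    """Fraction of reads in count_column on rows with a GTDB match.

    Treats an empty string or 'NA' in 'gtdb_accession' as "no match";
    that encoding is an assumption about the output file.
    """
    total = 0.0
    matched = 0.0
    for row in csv.DictReader(handle, delimiter="\t"):
        reads = float(row[count_column])
        total += reads
        if row["gtdb_accession"] not in ("", "NA"):
            matched += reads
    # Avoid dividing by zero for an empty table or all-zero sample.
    return matched / total if total else 0.0


# Made-up annotated table: two matched species and one unmatched.
demo = io.StringIO(
    "name\ttaxonomy_id\tsampleA_num\tgtdb_accession\tgtdb_domain\n"
    "Escherichia coli\t562\t120\tGB_GCA_000000001.1\tBacteria\n"
    "Unmatched species\t9999\t30\tNA\tNA\n"
    "Bacillus subtilis\t1423\t50\tGB_GCA_000000003.1\tBacteria\n"
)
print(fraction_matched(demo, "sampleA_num"))
```

In the demo, 170 of 200 reads sit on rows with a GTDB accession. The `gtdb_domain` column could be used in the same way to split rows between the archaeal and bacterial trees.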
- Please feel free to post a question on the Microbiome Helper Google group if you have any issues.
- General comments or inquiries about Microbiome Helper can be sent to [email protected].