Workflow
Below is an overview of the PICRUSt2 workflow, which includes example commands for processing 16S sequencing data and getting E.C. number and KEGG ortholog (KO) abundances. The E.C. numbers can then be used to calculate MetaCyc pathway abundances and coverages. Note that there are other gene family databases supported which may be more informative (but which cannot be collapsed to pathways by default). See the side-bar for more details on individual commands.
Note that you can run any of the scripts below with the -h option to get a description of its usage.
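As a small convenience, the loop below prints the help text of every script named on this page and flags any that are not on your PATH. It is only a sketch: it assumes the PICRUSt2 scripts are installed in the active environment, and `show_help` is a hypothetical helper name, not part of PICRUSt2.

```shell
# show_help is a hypothetical helper (not part of PICRUSt2).
# It runs each given script with -h, or reports that it is missing.
show_help() {
    for script in "$@"; do
        if command -v "$script" >/dev/null 2>&1; then
            echo "=== $script ==="
            "$script" -h
        else
            echo "=== $script: not found on PATH ==="
        fi
    done
}

# Script names are taken from the commands shown on this page.
show_help picrust2_pipeline.py place_seqs.py hsp.py \
          metagenome_pipeline.py pathway_pipeline.py add_descriptions.py
```

A "not found on PATH" line usually means the PICRUSt2 environment has not been activated in the current shell.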
The entire pipeline can be run with this command (details):
picrust2_pipeline.py -s study_seqs.fna -i study_seqs.biom -o picrust2_out_pipeline -p 1
If you would like to run each step individually, you can do so with the commands below. This is useful when you run into problems with picrust2_pipeline.py and want to isolate the issue, or when you only want to re-run part of the PICRUSt2 pipeline.
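When isolating a failure, it can help to wrap each step so that its name and exit status are printed. The sketch below is one way to do that; `run_step` is a hypothetical helper, not part of PICRUSt2, and the commented usage lines simply reuse commands from this page.

```shell
# run_step is a hypothetical helper (not part of PICRUSt2).
# Usage: run_step STEP_NAME COMMAND [ARGS...]
# It runs the command, reports success or failure on stderr, and
# returns the command's exit status.
run_step() {
    name="$1"
    shift
    echo "[start] $name" >&2
    if "$@"; then
        status=0
        echo "[ok] $name" >&2
    else
        status=$?
        echo "[failed] $name (exit status $status)" >&2
    fi
    return "$status"
}

# Example usage (requires PICRUSt2 and the input files; uncomment to run).
# Chaining with && stops at the first step that fails.
# run_step place_seqs place_seqs.py -s study_seqs.fna -o placed_seqs.tre -p 1 \
#     --intermediate placement_working &&
# run_step hsp_16S hsp.py -i 16S -t placed_seqs.tre \
#     -o marker_nsti_predicted.tsv.gz -p 1 -n
```

Because the messages go to stderr, the standard output of each step is left untouched.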
Place amplicon sequence variants (or OTUs) into reference phylogeny (details)
place_seqs.py -s study_seqs.fna -o placed_seqs.tre -p 1 \
--intermediate placement_working
Run hidden-state prediction to get 16S copy numbers, E.C. number, and KO abundances per predicted genome (details).
Note that NSTI values will be added to the 16S prediction table (since the -n option is given in the first command below).
hsp.py -i 16S -t placed_seqs.tre -o marker_nsti_predicted.tsv.gz -p 1 -n
hsp.py -i EC -t placed_seqs.tre -o EC_predicted.tsv.gz -p 1
hsp.py -i KO -t placed_seqs.tre -o KO_predicted.tsv.gz -p 1
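Once the 16S table has been written, a quick look at the NSTI column shows how many ASVs are placed far from any reference genome. The sketch below counts rows whose NSTI exceeds a threshold. It assumes the table is a gzipped TSV with a header row and that the NSTI column's header contains the string "NSTI"; check your own file's header before relying on it. `count_high_nsti` is a hypothetical helper, and the toy table and threshold are made up purely for illustration.

```shell
# count_high_nsti is a hypothetical helper (not part of PICRUSt2).
# Usage: count_high_nsti FILE.tsv.gz THRESHOLD
# Assumes a header row in which the NSTI column name contains "NSTI".
count_high_nsti() {
    gzip -dc < "$1" | awk -F'\t' -v max="$2" '
        NR == 1 {
            # Locate the NSTI column by name rather than by position.
            for (i = 1; i <= NF; i++) if ($i ~ /NSTI/) col = i
            next
        }
        col && ($col + 0) > (max + 0) { n++ }
        END { print n + 0 }'
}

# Toy table with made-up values, standing in for marker_nsti_predicted.tsv.gz.
toy_dir=$(mktemp -d)
printf 'sequence\t16S_rRNA_Count\tmetadata_NSTI\nASV1\t1\t0.05\nASV2\t2\t2.50\nASV3\t1\t0.90\n' \
    | gzip -c > "$toy_dir/toy_marker_nsti.tsv.gz"

count_high_nsti "$toy_dir/toy_marker_nsti.tsv.gz" 2   # prints 1
```

On real data the call would be `count_high_nsti marker_nsti_predicted.tsv.gz 2`, with the threshold chosen to suit your study.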
Predict E.C. and KO abundances in sequencing samples (adjusts gene family abundances by 16S sequence abundance) (details)
metagenome_pipeline.py -i study_seqs.biom \
-m marker_nsti_predicted.tsv.gz \
-f EC_predicted.tsv.gz \
-o EC_metagenome_out
metagenome_pipeline.py -i study_seqs.biom \
-m marker_nsti_predicted.tsv.gz \
-f KO_predicted.tsv.gz \
-o KO_metagenome_out
Infer MetaCyc pathway abundances and coverages based on predicted E.C. number abundances (details)
pathway_pipeline.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv.gz \
-o pathways_out \
--intermediate pathways_working \
-p 1
Add descriptions as new column in gene family and pathway abundance tables (details)
add_descriptions.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv.gz -m EC \
-o EC_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
add_descriptions.py -i KO_metagenome_out/pred_metagenome_unstrat.tsv.gz -m KO \
-o KO_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
add_descriptions.py -i pathways_out/path_abun_unstrat.tsv.gz -m METACYC \
-o pathways_out/path_abun_unstrat_descrip.tsv.gz
An optional additional step is to shuffle the ASV labels in the genome prediction tables (i.e. the outputs of hsp.py). Analyses based on these shuffled tables can then be compared with analyses based on the actual data to check whether the unshuffled data contain more signal. See here for more details.
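To make the idea of label shuffling concrete, the sketch below permutes the first-column labels of a small tab-separated table while leaving the header and the data rows in place, so each label ends up attached to another row's predictions. This is a conceptual illustration only and not the PICRUSt2 implementation; `shuffle_labels`, the seed argument, and the toy table are all hypothetical.

```shell
# shuffle_labels is a hypothetical helper (not the PICRUSt2 implementation).
# Usage: shuffle_labels [SEED] < table.tsv
# Reads a TSV with a header row and permutes the first-column labels
# (Fisher-Yates shuffle), keeping all other columns in their original order.
shuffle_labels() {
    awk -F'\t' -v OFS='\t' -v seed="${1:-1}" '
        NR == 1 { print; next }
        {
            label[NR - 1] = $1
            line = $2
            for (k = 3; k <= NF; k++) line = line OFS $k
            rest[NR - 1] = line
        }
        END {
            n = NR - 1
            srand(seed)
            for (i = n; i > 1; i--) {
                j = int(rand() * i) + 1
                tmp = label[i]; label[i] = label[j]; label[j] = tmp
            }
            for (i = 1; i <= n; i++) print label[i], rest[i]
        }'
}

# Toy prediction table with made-up values.
printf 'sequence\tfunc_A\tfunc_B\nASV1\t1\t0\nASV2\t0\t3\nASV3\t2\t2\n' \
    | shuffle_labels 42
```

For a gzipped table, the same function could be fed with `gzip -dc EC_predicted.tsv.gz | shuffle_labels 42`. Repeating the shuffle with several seeds gives a set of null tables to compare against the real one.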
Please first check our FAQ if you have any questions about PICRUSt2.
For other general questions and comments about PICRUSt2, please search the PICRUSt Google group. If your question has not been answered previously, please start a new thread.
To report a bug or make a feature request, please open a new issue at the top of this page.