Sequence placement

PICRUSt2 wraps HMMER to align study sequences to a reference multiple-sequence alignment, and then places these sequences into the reference phylogeny with EPA-ng or SEPP. In the typical workflow, the "study sequences" are the representative sequences of your OTUs and/or ASVs. GAPPA is then used to convert the resulting .jplace file into a Newick tree.
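PICRUSt2 performs the .jplace-to-Newick conversion itself, so you would only need to run GAPPA by hand to re-examine a placement you kept. The snippet below is a sketch. It assumes place_seqs.py was run with `--intermediate placement_working` and that GAPPA is on your PATH. The name and location of the .jplace file inside that folder are not documented here, so the snippet searches for it, and `grafted_trees` is a hypothetical output folder. The exact GAPPA options PICRUSt2 uses internally may differ.

```shell
#!/usr/bin/env bash
# Sketch: convert kept .jplace placement file(s) into Newick trees with GAPPA.
# Assumes --intermediate placement_working was used; grafted_trees is hypothetical.
set -euo pipefail
mkdir -p grafted_trees
find placement_working -name '*.jplace' | while read -r jplace; do
  # "examine graft" attaches each placed query to the reference tree as a new leaf
  gappa examine graft --jplace-path "$jplace" --out-dir grafted_trees
done
```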

Please note that before PICRUSt2-v2.6.0 this command used the PICRUSt2-oldIMG database by default. As of PICRUSt2-v2.6.0, the default database is the PICRUSt2-MPGA database; see the Updated PICRUSt2-MPGA database wiki page for further details on it. To use the PICRUSt2-oldIMG database with PICRUSt2-v2.6.0 or later, see the --ref_dir option below.

Note that your input study sequences need to be on the positive strand!
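If you know that *all* of your sequences are in the reverse orientation, they need to be reverse-complemented before placement. Below is a minimal awk-based sketch under that assumption. The function name `revcomp_fasta` and the file names are hypothetical. The function does not detect orientation, so do not apply it to a file that mixes orientations. It also writes each sequence in upper case on a single line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reverse-complement every record in a FASTA file.
# Assumes ALL records are on the negative strand; it does not detect orientation.
# Output sequences are upper-case and written on a single line per record.
revcomp_fasta() {
  awk '
    BEGIN {
      # IUPAC nucleotide codes and their complements (S, W and N map to themselves)
      n = split("A C G T R Y K M S W B V D H N", base, " ")
      split("T G C A Y R M K S W V B H D N", comp_base, " ")
      for (i = 1; i <= n; i++) comp[base[i]] = comp_base[i]
    }
    # Print the reverse complement of the sequence collected so far
    function flush(   i, c, rc) {
      if (seq == "") return
      rc = ""
      for (i = length(seq); i >= 1; i--) {
        c = toupper(substr(seq, i, 1))
        c = (c in comp) ? comp[c] : c   # gaps and unknown symbols pass through
        rc = rc c
      }
      print rc
      seq = ""
    }
    /^>/ { flush(); print; next }  # header: emit previous record, then the header
    { seq = seq $0 }               # sequence line: accumulate (handles wrapped FASTA)
    END { flush() }
  ' "$1"
}
```

Usage: `revcomp_fasta study_seqs_minus.fna > study_seqs.fna`.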

place_seqs.py -s study_seqs.fna -o placed_seqs.tre -p 1 --intermediate placement_working

The script takes these arguments/options:

-s FASTA: your study sequences (i.e. FASTA of amplicon sequence variants or operational taxonomic units)
--ref_dir DIRECTORY: Reference files to use for sequence placement. As of PICRUSt2-v2.6.0, sequences are placed in the bacterial tree by default. This option also accepts the keywords 'bac'/'bacteria', 'arc'/'archaea' or 'oldIMG': use --ref_dir bac to place sequences in the bacterial tree, --ref_dir arc to place them in the archaeal tree, and --ref_dir oldIMG to use the PICRUSt2-oldIMG database. To use non-default reference files instead, pass a directory containing the four expected files (see below).
-o TREEFILE: Output tree with placed study sequences.
-t epa-ng|sepp: Placement tool used to place sequences into the reference tree, either "epa-ng" or "sepp" (default: epa-ng).
-p INT: Number of processes to run in parallel.
--intermediate: Option to specify a folder where intermediate files will be written (otherwise they will not be kept).
--chunk_size INT: Number of query sequences to read in at once for EPA-ng (default: 5000).
--verbose: Option to specify that wrapped commands will be printed to screen (useful for troubleshooting!).
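The options above can be combined in a single call. PICRUSt2-MPGA has separate bacterial and archaeal reference trees, so one way to place the same study sequences into both is to loop over the two --ref_dir keywords. This is a sketch: the output and intermediate names are arbitrary, and `-p 4` and `--chunk_size 5000` (the documented default) are placeholder values to adjust to your machine.

```shell
#!/usr/bin/env bash
# Place the same study sequences into the bacterial and the archaeal reference trees.
# File/folder names are arbitrary examples; adjust -p to the cores you have.
set -euo pipefail
for domain in bac arc; do
  place_seqs.py -s study_seqs.fna \
    -o "placed_seqs_${domain}.tre" \
    --ref_dir "$domain" \
    -t epa-ng \
    -p 4 \
    --chunk_size 5000 \
    --intermediate "placement_working_${domain}" \
    --verbose
done
```

Swap `-t epa-ng` for `-t sepp` to use SEPP instead; `--chunk_size` only affects EPA-ng.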

Using Custom Reference Files

To use custom reference files you need to specify a directory with --ref_dir that contains:

A multiple-sequence alignment (extension .fna or .fasta; optionally gzipped)
A tree in Newick format (extension .tre)
A hidden Markov model (HMM) of the multiple-sequence alignment (extension .hmm)
A model file output by RAxML specifying the best parameters for the tree (extension .model)

Note that the prefix of these files needs to be the same as the specified folder name. For instance, the default reference files (prokaryotic 16S rRNA gene alignment) are in picrust2/default_files/prokaryotic/pro_ref and they all have the prefix "pro_ref":

pro_ref.fna.gz
pro_ref.hmm
pro_ref.model
pro_ref.tre
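Before pointing --ref_dir at a custom folder, it can help to check that the files follow the naming rule above. The following is a minimal sketch of such a check. The function name `check_ref_dir` is hypothetical, and the function only tests that the files exist, not that they are valid or consistent with each other.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a custom --ref_dir folder contains the four
# expected files, all prefixed with the folder's own name. Existence check only.
check_ref_dir() {
  local dir prefix ext status=0 found_aln=0
  dir=${1%/}                    # drop a trailing slash so basename works
  prefix=$(basename "$dir")
  # Alignment: .fna or .fasta, optionally gzipped
  for ext in fna fna.gz fasta fasta.gz; do
    if [ -f "$dir/$prefix.$ext" ]; then found_aln=1; fi
  done
  if [ "$found_aln" -eq 0 ]; then
    echo "missing alignment: $dir/$prefix.fna or .fasta (optionally .gz)" >&2
    status=1
  fi
  # Tree, HMM and RAxML model file
  for ext in tre hmm model; do
    if [ ! -f "$dir/$prefix.$ext" ]; then
      echo "missing file: $dir/$prefix.$ext" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_ref_dir my_ref && place_seqs.py -s study_seqs.fna -o placed_seqs.tre --ref_dir my_ref`.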

If you do not have a model file you can create one by following these instructions. You can create an HMM of your alignment with hmmbuild.
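Both files can be generated from the shell. The following is a sketch for a custom reference folder named `my_ref` (hypothetical). It assumes HMMER 3 and RAxML-NG are installed. Using `raxml-ng --evaluate` to optimise model parameters on the fixed reference tree is one common route, not necessarily the one in the linked instructions, and `GTR+G` is only a placeholder model choice.

```shell
#!/usr/bin/env bash
# Sketch: build the .hmm and .model files for a hypothetical custom reference "my_ref".
# Assumes HMMER 3 (hmmbuild) and RAxML-NG are on your PATH.
set -euo pipefail
ref=my_ref                 # hypothetical folder name, also used as the file prefix
work=$(mktemp -d)          # scratch space so tool outputs do not clutter $ref
# Uncompressed copy of the alignment (use cp instead if it is not gzipped)
gzip -dc "$ref/$ref.fna.gz" > "$work/aln.fna"
# Nucleotide profile HMM built from the aligned FASTA
hmmbuild --dna --informat afa "$ref/$ref.hmm" "$work/aln.fna"
# Optimise model parameters on the fixed reference tree (GTR+G is a placeholder)
raxml-ng --evaluate --msa "$work/aln.fna" --tree "$ref/$ref.tre" \
  --model GTR+G --prefix "$work/eval"
# RAxML-NG writes the optimised model to <prefix>.raxml.bestModel
cp "$work/eval.raxml.bestModel" "$ref/$ref.model"
```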

Further details on creating these files can be found in the wiki page describing how the updated database was built.

Please first check our FAQ if you have any questions about PICRUSt2.

For other general questions and comments about PICRUSt2 please search the PICRUSt google group. If the question has not been previously answered then please make a new thread.

To report a bug or to make a feature request please make a new issue at the top of this page.
