This pipeline assembles long-read data with Flye and polishes the assembly with Racon/Medaka. QC statistics are provided by QUAST.
This pipeline assumes the data has been basecalled with Guppy. It does not support adapter trimming, as the latest versions of Guppy already trim adapters. To ensure a high-quality assembly, short or poor-quality reads are removed within the pipeline by either nanoq or chopper, depending on preference.
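To illustrate the kind of read filtering described above, here is a minimal Python sketch. It is not how nanoq or chopper are implemented, and the thresholds (`min_length`, `min_quality`) and function names are hypothetical values chosen for the example, not the pipeline's defaults. It assumes Phred+33 encoded quality strings and computes the mean quality by averaging per-base error probabilities.

```python
import math


def mean_quality(quality_string, offset=33):
    """Mean Phred score, computed by averaging per-base error probabilities."""
    if not quality_string:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records, min_length=1000, min_quality=10.0):
    """Yield (name, sequence, quality) records that pass both thresholds.

    min_length and min_quality are illustrative defaults (assumptions),
    not values taken from the pipeline.
    """
    for name, seq, qual in records:
        if len(seq) >= min_length and mean_quality(qual) >= min_quality:
            yield name, seq, qual
```

In the pipeline itself this step is delegated to nanoq or chopper; the sketch only shows the two criteria (length and quality) being applied to each read.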
A manifest is supplied using the --manifest option.
The manifest is a file consisting of three columns in CSV format.
ID,long_fastq,genome_size
- ID: name of the sample output directory, which includes the assembly and quast outputs
- long_fastq: path to the basecalled, adapter-trimmed and demultiplexed reads associated with that ID
- genome_size: approximate size of the genome; used by flye to reduce memory use while producing the draft assemblies (optional; leave blank if unknown)
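As a rough illustration of the manifest format, the following Python sketch parses a manifest and checks it before submission. The function name `read_manifest` and the duplicate-ID check are assumptions made for this example (the ID names the output directory, so repeated IDs would presumably collide); the pipeline's own validation may differ.

```python
import csv
import io

EXPECTED_COLUMNS = ["ID", "long_fastq", "genome_size"]


def read_manifest(handle):
    """Parse a manifest and return a list of row dicts.

    genome_size is kept as a string, or None when the field is blank.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    seen = set()
    for row in reader:
        sample_id = (row["ID"] or "").strip()
        fastq = (row["long_fastq"] or "").strip()
        if not sample_id or not fastq:
            raise ValueError(f"line {reader.line_num}: ID and long_fastq are required")
        # Assumption: IDs must be unique because each names an output directory.
        if sample_id in seen:
            raise ValueError(f"line {reader.line_num}: duplicate ID {sample_id!r}")
        seen.add(sample_id)
        size = (row["genome_size"] or "").strip() or None
        rows.append({"ID": sample_id, "long_fastq": fastq, "genome_size": size})
    return rows


# Hypothetical sample names, paths and genome size, for illustration only.
example = io.StringIO(
    "ID,long_fastq,genome_size\n"
    "sampleA,/data/sampleA.fastq.gz,5m\n"
    "sampleB,/data/sampleB.fastq.gz,\n"
)
manifest_rows = read_manifest(example)
```

Here `sampleB` leaves genome_size blank, which the sketch stores as `None`.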
Within the runName directory you will have a copy of the submitted manifest and a quast summary TSV for all samples within the submitted manifest.
Each sample output directory has the following subdirectories:
- Draft_assembly - Contains your draft assembly directly from flye and the flye.log file for extra information on the assembly.
- lr_polished_assembly - Contains the final consensus fasta assembly from medaka and a quast summary based on it.
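A quick way to confirm that a sample finished with the layout described above is sketched below in Python. The helper name `missing_outputs` is hypothetical, and only the names stated above (the two subdirectories and flye.log) are checked, since the exact names of the assembly and summary files are not given here.

```python
from pathlib import Path

EXPECTED_SUBDIRS = ("Draft_assembly", "lr_polished_assembly")


def missing_outputs(sample_dir):
    """Return expected paths (relative to sample_dir) that do not exist."""
    sample_dir = Path(sample_dir)
    expected = [sample_dir / name for name in EXPECTED_SUBDIRS]
    # flye.log is described as living alongside the draft assembly.
    expected.append(sample_dir / "Draft_assembly" / "flye.log")
    return [p.relative_to(sample_dir).as_posix() for p in expected if not p.exists()]
```

An empty list means both subdirectories and the Flye log are present for that sample.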
- Currently the pipeline supports optional parameters within the Flye and Racon processes. This support will need to be removed at the time of production.
- Currently the pipeline publishes all assemblies and quast summaries under the runName directory. This needs to be changed before central use is enabled.