sanger-pathogens/assembly_flye_long_read

Assembly Flye Long Read

This pipeline assembles long-read data with Flye and polishes the assembly with Racon/Medaka. QC statistics are provided by QUAST. The pipeline assumes the data has been basecalled with Guppy and does not trim adapters, since the latest versions of Guppy already do this. To ensure a high-quality assembly, short or poor-quality reads are removed within the pipeline by either nanoq or chopper, depending on preference.
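To make the read filtering step concrete, here is a minimal Python sketch of length and quality filtering on FASTQ records. It is an illustration only, not how nanoq or chopper are implemented or invoked by the pipeline. The function names and thresholds are hypothetical, and the quality definition (averaging per-base error probabilities, then converting back to a Phred score) is one common convention that is assumed here rather than taken from either tool.

```python
import math


def mean_read_quality(quality_string, phred_offset=33):
    """Mean Phred quality, averaged over error probabilities (assumed convention)."""
    if not quality_string:
        return 0.0
    probs = [10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def parse_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ file."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def filter_reads(records, min_length, min_quality):
    """Yield only the records that pass both (hypothetical) thresholds."""
    for name, seq, qual in records:
        if len(seq) >= min_length and mean_read_quality(qual) >= min_quality:
            yield name, seq, qual
```

Averaging error probabilities rather than raw Phred scores means a few very poor bases pull the read's mean quality down sharply, which is usually the desired behaviour for a filter of this kind.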

Inputs:

A manifest is supplied using the --manifest option.

The manifest is a CSV file with three columns:

ID,long_fastq,genome_size
  • ID: sample name, used as the name of the sample output directory that holds the assembly and quast outputs
  • long_fastq: path to the basecalled, adapter-trimmed and demultiplexed reads associated with that ID
  • genome_size: approximate size of the genome; used by flye to reduce memory use while producing the draft assemblies (optional; leave blank if unknown)
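As an illustration of the format, a manifest could be written and sanity-checked with a short Python sketch like the one below. The helper names `write_manifest` and `read_manifest` are hypothetical and not part of the pipeline. The uniqueness check on ID is an assumption, made because each ID names an output directory.

```python
import csv

# Column names taken from the manifest description above.
MANIFEST_COLUMNS = ["ID", "long_fastq", "genome_size"]


def write_manifest(rows, path):
    """Write sample rows (dicts keyed by MANIFEST_COLUMNS) to a CSV manifest."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=MANIFEST_COLUMNS)
        writer.writeheader()
        for row in rows:
            # genome_size is optional, so a missing value becomes an empty field.
            writer.writerow({col: row.get(col, "") for col in MANIFEST_COLUMNS})


def read_manifest(path):
    """Read a manifest and check the header, required fields and unique IDs."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != MANIFEST_COLUMNS:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        rows = list(reader)
    seen = set()
    for line_no, row in enumerate(rows, start=2):
        if not row["ID"] or not row["long_fastq"]:
            raise ValueError(f"line {line_no}: ID and long_fastq are required")
        if row["ID"] in seen:
            # Assumption: IDs must be unique because each names an output directory.
            raise ValueError(f"line {line_no}: duplicate ID {row['ID']}")
        seen.add(row["ID"])
    return rows
```

The resulting file is what would be passed to the --manifest option. Any sample IDs, paths and genome sizes used with these helpers are placeholders to be replaced with real values.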

Outputs:

The runName directory contains a copy of the submitted manifest and a quast summary TSV covering all samples in that manifest.

Each sample output directory has the following subdirectories:

  • Draft_assembly - Contains the draft assembly produced directly by flye, along with the flye.log file for extra information on the assembly.
  • lr_polished_assembly - Contains the final consensus FASTA assembly from medaka and a quast summary based on it.
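Given the layout above, a small Python sketch can gather the assemblies for every sample in a run directory. The function name `find_assemblies` is hypothetical, and the FASTA file extensions are an assumption, since the exact output file names are not listed here.

```python
from pathlib import Path

# Subdirectory names taken from the output description above.
SUBDIRS = ("Draft_assembly", "lr_polished_assembly")
# Assumed FASTA extensions; the actual file names are not specified here.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def find_assemblies(run_dir):
    """Map sample ID -> subdirectory name -> sorted list of FASTA paths."""
    found = {}
    for sample_dir in sorted(Path(run_dir).iterdir()):
        if not sample_dir.is_dir():
            continue  # skips the manifest copy and the quast summary TSV
        per_sample = {}
        for sub in SUBDIRS:
            sub_dir = sample_dir / sub
            if sub_dir.is_dir():
                per_sample[sub] = sorted(
                    p for p in sub_dir.iterdir() if p.suffix in FASTA_SUFFIXES
                )
        if per_sample:
            found[sample_dir.name] = per_sample
    return found
```

A sample that has a Draft_assembly entry but no lr_polished_assembly entry in the returned mapping is one way to spot samples where polishing did not complete.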

Unique to the current release:

  • The pipeline currently supports optional parameters within the Flye and Racon processes. This support will need to be removed before the pipeline goes into production.
  • The pipeline currently publishes all assemblies and quast summaries under the runName directory. This needs to change before central use is enabled.
