Break apart `pipeline.py` into constituent scripts, with checks in snakefile #16

alliblk · 2017-12-29T21:44:39Z

Currently numerous pipeline processes, including primer trimming, mapping reads to the reference, and all nanopolish processes, are called in `fasta_to_consensus_1d.py`, which is itself called in `pipeline.py`. Because so many of the bioinformatic steps occur within this single script, there are currently no snakemake rules that check for intermediate files (such as the BAM and VCF files). This can make re-running the pipeline slow: if anything fails in `pipeline.py`, you have to re-map and re-index even if the intermediate files exist and are fine, and these steps have reasonably long run times.

I think it might be a good idea to break apart these steps so that we can build rules into the snakefile that check for BAMs, trimmed BAMs, and other intermediate files, which would cut run times when the pipeline is re-run. I think this may also improve the readability and transparency of the pipeline.
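A minimal sketch, in plain Python, of the make-style check this proposal relies on: a step is skipped when all of its declared outputs exist and are at least as new as its inputs, and re-run otherwise. This is similar in spirit to how Snakemake decides whether a rule's outputs are up to date, but it is not Snakemake's implementation (Snakemake can also trigger re-runs on other changes). The names `needs_rerun` and `run_step` are hypothetical, chosen only for illustration.

```python
from pathlib import Path
from typing import Callable, Sequence


def needs_rerun(inputs: Sequence[Path], outputs: Sequence[Path]) -> bool:
    """Return True if any output is missing or older than the newest input."""
    missing_inputs = [str(p) for p in inputs if not p.exists()]
    if missing_inputs:
        raise FileNotFoundError(f"missing inputs: {', '.join(missing_inputs)}")
    # Any missing output means the step has not completed yet.
    if any(not out.exists() for out in outputs):
        return True
    if not inputs:
        return False
    newest_input = max(p.stat().st_mtime for p in inputs)
    oldest_output = min(p.stat().st_mtime for p in outputs)
    # Stale if some output predates some input (e.g. reads were replaced).
    return oldest_output < newest_input


def run_step(
    name: str,
    inputs: Sequence[Path],
    outputs: Sequence[Path],
    action: Callable[[], None],
) -> bool:
    """Run action() only if its outputs are missing or stale.

    Returns True if the step ran, False if it was skipped.
    """
    if not needs_rerun(inputs, outputs):
        print(f"[skip] {name}: outputs up to date")
        return False
    print(f"[run]  {name}")
    action()
    # Fail loudly if the step did not produce what it declared.
    not_made = [str(o) for o in outputs if not o.exists()]
    if not_made:
        raise RuntimeError(f"{name} did not produce: {', '.join(not_made)}")
    return True
```

If mapping, primer trimming and each nanopolish step were declared as separate steps like this (or as separate snakefile rules with their own `input:` and `output:` files), a failure in a late step would leave the earlier BAM and VCF files in place, and a re-run would resume from the step that failed instead of re-mapping from scratch.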


alliblk assigned barneypotter24 Dec 29, 2017
