Break apart pipeline.py into constituent scripts, with checks in snakefile #16

Open
alliblk opened this issue Dec 29, 2017 · 0 comments
Contributor

alliblk commented Dec 29, 2017

Currently, numerous pipeline processes, including primer trimming, mapping reads to the reference, and all nanopolish processes, are called in fasta_to_consensus_1d.py, which is in turn called in pipeline.py. Because so many of the bioinformatic steps occur within this single script, there aren't currently any snakemake rules that check for intermediate files (such as the bam files and vcf files). This can make re-running the pipeline a long process: if there's any failure in pipeline.py, you have to re-map and re-index even if the intermediate files exist and are fine, and these steps have reasonably long run times.

I think it might be a good idea to break these steps apart into separate scripts so that we can build rules into the snakefile that check for bams, trimmed bams, and other intermediate files, which would reduce run times on pipeline re-runs. I think this may also improve the readability/transparency of the pipeline.
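
To make the idea concrete, here is a minimal Python sketch of the kind of check a per-step snakemake rule would give us: a step is skipped when all of its outputs already exist and are no older than its inputs. The function names (`is_up_to_date`, `run_step`) and the file names in the usage note are hypothetical and only for illustration; this is not code from the repo, and snakemake's own rerun logic is more thorough than this.

```python
import os
from typing import Callable, Sequence


def is_up_to_date(inputs: Sequence[str], outputs: Sequence[str]) -> bool:
    """Return True if every output exists and none is older than the newest input."""
    if not all(os.path.exists(path) for path in outputs):
        return False
    # A missing input raises FileNotFoundError here, which is what we want:
    # the step cannot run (or be judged current) without its inputs.
    newest_input = max((os.path.getmtime(path) for path in inputs), default=None)
    if newest_input is None:
        # No declared inputs: existing outputs are enough.
        return True
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return oldest_output >= newest_input


def run_step(inputs: Sequence[str], outputs: Sequence[str],
             action: Callable[[], None]) -> bool:
    """Run `action` only if the outputs are missing or stale.

    Returns True if the step ran and False if it was skipped.
    """
    if is_up_to_date(inputs, outputs):
        return False
    action()
    return True
```

With one call per step (for example, a hypothetical mapping step with `inputs=["sample.fasta"]` and `outputs=["sample.sorted.bam"]`, followed by a trimming step that takes that bam as its input), a failure late in the run would only repeat the steps whose outputs are missing or stale, instead of re-mapping and re-indexing everything.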

2 participants