snpQT (pronounced snip-cutie) makes your single-nucleotide polymorphisms cute. It also provides support for processing human genomic variants, including:
- human genome build conversion (b37 -> b38 and/or b38 -> b37)
- sample quality control
- population stratification
- variant quality control
- pre-imputation quality control
- local imputation
- post-imputation quality control
- genome-wide association studies
within an automated Nextflow pipeline. To improve reliability and reproducibility, we run a collection of versioned bioinformatics software in Singularity and Docker containers or Anaconda and Environment Modules environments.
snpQT might be useful for you if:
- you want a clean genomic dataset using a reproducible, fast and comprehensive pipeline
- you are interested in identifying significant SNP associations with a trait
- you want to identify and remove outliers based on their ancestry
- you wish to perform imputation locally
- you wish to prepare your genomic dataset for imputation on an external server (following comprehensive QC and pre-imputation QC preparation)
- you have already called your variants using human genome build 37 or 38
- your variants are in VCF or PLINK bfile format
- your variants have "rs" ids
- your samples have either a binary or a quantitative phenotype
If this sounds like you, check out our online documentation at: https://snpqt.readthedocs.io/en/latest/
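One of the requirements above, "rs" variant ids, can be checked before running anything. The following is a minimal sketch and not part of snpQT: it assumes your data is already in PLINK bfile format, that the `.bim` file is whitespace-delimited with the variant ID in the second column, and that an "rs" id means `rs` followed by digits. The function name `fraction_rs_ids` is hypothetical.

```python
import re

# Assumption: an "rs" id is the literal prefix "rs" followed only by digits.
RS_ID = re.compile(r"^rs\d+$")


def fraction_rs_ids(bim_path):
    """Return the fraction of variants in a PLINK .bim file with rs-style IDs."""
    total = 0
    matched = 0
    with open(bim_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            total += 1
            # Column 2 of a .bim file holds the variant identifier.
            if RS_ID.match(fields[1]):
                matched += 1
    return matched / total if total else 0.0
```

Calling `fraction_rs_ids("mydata.bim")` (with a hypothetical file name) returns a value between 0 and 1. A low value suggests the variant IDs may need to be annotated with rs ids first.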
snpQT definitely won't be useful for you if:
- you want to do quality control on raw sequence reads
- you want to call variants from raw sequence reads
- you are working on family GWAS data
- you're not working with human genomic data
If you find snpQT useful, please cite:
Vasilopoulou C, Wingfield B, Morris AP and Duddy W. snpQT: flexible, reproducible, and comprehensive quality control and imputation of genomic data [version 1; peer review: 2 approved with reservations]. F1000Research 2021, 10:567 https://doi.org/10.12688/f1000research.53821.1
snpQT is distributed under an MIT license. Our pipeline wouldn't be possible without the following amazing third-party software:
Software | Version | Reference | License |
---|---|---|---|
EIGENSOFT | 7.2.1 | Price, Alkes L., et al. "Principal components analysis corrects for stratification in genome-wide association studies." Nature genetics 38.8 (2006): 904-909. | Custom open source |
impute5 | 1.1.4 | Rubinacci, Simone, Olivier Delaneau, and Jonathan Marchini. "Genotype imputation using the positional Burrows Wheeler transform." PLoS Genetics 16.11 (2020): e1009049. | Academic use only |
nextflow | 21.04.3 | Di Tommaso, Paolo, et al. "Nextflow enables reproducible computational workflows." Nature biotechnology 35.4 (2017): 316-319. | GPL3 |
picard | 2.24.0 | | MIT |
PLINK | 1.90b6.18 | Purcell, Shaun, et al. "PLINK: a tool set for whole-genome association and population-based linkage analyses." The American journal of human genetics 81.3 (2007): 559-575. | GPL3 |
PLINK2 | 2.00a2.3 | Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 4. | GPL3 |
samtools | 1.11 | Danecek, Petr et al. "Twelve years of SAMtools and BCFtools." GigaScience, 10(2), 1-4, 2021 | MIT |
bcftools | 1.9 | Danecek, Petr et al. "Twelve years of SAMtools and BCFtools." GigaScience, 10(2), 1-4, 2021 | MIT |
shapeit4 | 4.1.3 | Delaneau, Olivier, et al. "Accurate, scalable and integrative haplotype estimation." Nature communications 10.1 (2019): 1-10. | MIT |
snpflip | 0.0.6 | https://github.com/biocore-ntnu/snpflip | MIT |
We also use countless other bits of software like R, the R tidyverse, etc.
Full documentation is available at: https://snpqt.readthedocs.io/en/latest/