Releases: bioinform/somaticseq
maintenance release
- Restructured the utilities scripts.
- Added the utilities/filter_SomaticSeq_VCF.py script that "demotes" PASS calls to LowQual calls based on a set of tunable hard filters.
- The BamSurgeon scripts now invoke a modified BamSurgeon script that splits a (proper) BAM file without the need to sort by read name.
- No change to SomaticSeq algorithm
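The "demotion" idea behind the hard filters can be sketched roughly as follows. This is a minimal illustration, not the code in utilities/filter_SomaticSeq_VCF.py: the function name, the filter names, and the cutoff values are all hypothetical.

```python
# Hypothetical hard filters; the names and cutoffs are illustrative only
# and are not taken from filter_SomaticSeq_VCF.py.
HARD_FILTERS = {
    "min_variant_depth": 3,
    "min_mapping_quality": 20.0,
}


def demote_filter(filter_field, variant_depth, mapping_quality, filters=HARD_FILTERS):
    """Return the FILTER value after applying the hard filters.

    Only PASS calls can be demoted; any other label is returned unchanged.
    """
    if filter_field != "PASS":
        return filter_field
    if variant_depth < filters["min_variant_depth"]:
        return "LowQual"
    if mapping_quality < filters["min_mapping_quality"]:
        return "LowQual"
    return "PASS"
```

Because the thresholds live in a plain dictionary, a caller can pass a modified copy to tune the filters without touching the function.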
maintenance release
- Added run script generators for dockerized BAMSurgeon pipelines at utilities/dockered_pipelines/bamSurgeon
- Added an error message to r_scripts/ada_model_builder_ntChange.R for the case where TrueVariants_or_False doesn't contain both 0's and 1's.
- No change to SomaticSeq algorithm.
Improved run scripts
- Improved the pipeline script generators, but they should still be considered experimental.
- No change to SomaticSeq algorithm
v2.3.0
Two changes
- If GATK.jar is not provided for GATK CombineVariants, all the VCF files are concatenated (cat) and sorted with vcfsort.pl instead.
- Added a dockerfile.
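The concatenate-then-sort fallback can be pictured with a small Python sketch. It is a stand-in for illustration only: the function name and the `contig_order` parameter are hypothetical, and the exact ordering that vcfsort.pl produces is not described here, so the sort key below (contig rank, then position) is an assumption.

```python
def concat_and_sort_vcf(vcf_texts, contig_order):
    """Merge VCF texts: keep the header of the first, sort all records.

    Assumption: records are ordered by the given contig order and then by
    position. Contigs missing from contig_order are placed at the end.
    """
    header = []
    records = []
    for index, text in enumerate(vcf_texts):
        for line in text.splitlines():
            if not line:
                continue
            if line.startswith("#"):
                # Keep header lines from the first file only.
                if index == 0:
                    header.append(line)
            else:
                records.append(line)

    rank = {contig: n for n, contig in enumerate(contig_order)}

    def sort_key(line):
        chrom, pos = line.split("\t")[:2]
        return (rank.get(chrom, len(rank)), chrom, int(pos))

    records.sort(key=sort_key)
    return header + records
```

Unlike CombineVariants, a plain concatenation does not merge records that share a site, so duplicate positions from different callers remain as separate lines in this sketch.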
scalpel and strelka2
- Consider if_Scalpel = 1 only if there is a SOMATIC tag in the INFO.
- Resolved a bug in the wrapper script where Strelka2 and Scalpel VCF files clash during GATK CombineVariants.
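The SOMATIC-tag rule for if_Scalpel amounts to a check on the keys of the INFO column. The helper names below are hypothetical and the sketch only covers a record that Scalpel actually reported.

```python
def info_has_flag(info_field, flag):
    """Return True if the semicolon-separated INFO field contains the flag.

    Keys are compared exactly, so a key such as NOTSOMATIC does not match SOMATIC.
    """
    keys = [item.split("=", 1)[0] for item in info_field.split(";")]
    return flag in keys


def if_scalpel(info_field):
    """Return 1 only when the Scalpel record carries a SOMATIC tag, else 0."""
    return 1 if info_has_flag(info_field, "SOMATIC") else 0
```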
Added Strelka2 support
- Incorporated Strelka2 since it's now GPLv3.
- Added another R script (ada_model_builder_ntChange.R) that uses the nucleotide substitution pattern as a feature. Limited experience suggests that it improves accuracy, but it has not been heavily tested yet.
- If a COSMIC site is labeled SNP in the COSMIC VCF file, if_cosmic and CNT will be labeled as 0. The COSMIC ID will still appear in the ID column. This will not change any results because both of those features are turned off in the training R script.
- Fixed a bug: if JointSNVMix2 is not included, the values should be "NaN" instead of 0's. This is to keep consistency with how we handle all other callers.
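The distinction behind the JointSNVMix2 fix is between "the caller ran and did not call this site" (0) and "the caller was not part of the run" (NaN). A minimal sketch, with a hypothetical function name and arguments:

```python
import math


def caller_feature(caller_included, called_sites, site):
    """Return the per-site feature value for one caller.

    NaN means the caller was not included in the run at all;
    0 means it ran but did not call the site; 1 means it called the site.
    """
    if not caller_included:
        return math.nan
    return 1 if site in called_sites else 0
```

Using NaN keeps a missing caller distinguishable from a negative call when the feature table is later used for training or prediction.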
Minor improvement and bug fixes
- Worked around an occasional, unexplained issue in the ada package where the SOR is sometimes categorized as type, by forcing it to be numeric.
- Changed the default PASS score threshold (i.e., probability value) from 0.7 to 0.5, and made both thresholds configurable in the SomaticSeq.Wrapper.sh script (i.e., --pass-threshold 0.5 and --lowqual-threshold 0.1).
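The two thresholds split the probability score into three bands. The sketch below uses the default values quoted above; the function name, the inclusive comparisons, and the "REJECT" label for scores below the LowQual threshold are assumptions made for illustration.

```python
def label_call(score, pass_threshold=0.5, lowqual_threshold=0.1):
    """Map a probability score to a FILTER label.

    Assumptions: both comparisons are inclusive, and calls scoring below
    lowqual_threshold are labeled "REJECT".
    """
    if score >= pass_threshold:
        return "PASS"
    if score >= lowqual_threshold:
        return "LowQual"
    return "REJECT"
```

With the previous default of 0.7, a call scoring 0.6 would have been LowQual; with 0.5 it becomes PASS.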
Minor improvement and bug fixes
- InDel_3bp now stands for indel counts within 3 bps of the variant site, instead of exactly 3 bps from the variant site as it was previously (likewise for InDel_2bp).
- Collapsed the MQ0 (mapping quality of 0) read counts, previously reported separately for reference-supporting and variant-supporting reads, into a single MQ0 read count per BAM file (i.e., tBAM_MQ0 and nBAM_MQ0). In our experience, the total number of MQ0 reads is at least as predictive of false positive calls as distinguishing whether those reads support the reference or the variant.
- Obtain SOR (Somatic Odds Ratio) from BAM files instead of VarDict's VCF file.
- Fixed a typo in the SomaticSeq.Wrapper.sh script that caused the inclusion region to be handled incorrectly.
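The change in the InDel_3bp and InDel_2bp definitions is the difference between "at most N bps away" and "exactly N bps away". The sketch below works on a plain list of indel positions; the function names are hypothetical, and whether an indel at the variant site itself (distance 0) is counted is an assumption.

```python
def indel_count_within(indel_positions, site, window):
    """New definition: count indels at most `window` bps from the site.

    Assumption: an indel at distance 0 is included in the count.
    """
    return sum(1 for pos in indel_positions if abs(pos - site) <= window)


def indel_count_exactly(indel_positions, site, distance):
    """Previous definition: count indels exactly `distance` bps from the site."""
    return sum(1 for pos in indel_positions if abs(pos - site) == distance)
```

For indels at positions 98, 99, 103, and 110 around a variant at position 100, the new definition gives a count of 3 for a 3-bp window, whereas the previous one counted only the indel at 103.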
Incorporated MuTect2
- Incorporated MuTect2 into SomaticSeq, along with some metrics from MuTect2's output VCF files.
- In the SomaticSeq.Wrapper.sh script, you may use either the original MuTect (--mutect) / Indelocator (--indelocator) VCF files or the new MuTect2 (--mutect2) VCF files. If you include both, MuTect2 takes precedence.
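The precedence rule can be summarized in a few lines. This is only an illustration of the stated behavior, not the logic in SomaticSeq.Wrapper.sh; the function name and the dictionary return value are hypothetical.

```python
def select_mutect_inputs(mutect=None, indelocator=None, mutect2=None):
    """Pick which MuTect-family VCF files to use.

    If a MuTect2 file is given, it takes precedence and the original
    MuTect / Indelocator files are ignored.
    """
    if mutect2 is not None:
        return {"mutect2": mutect2}
    selected = {}
    if mutect is not None:
        selected["mutect"] = mutect
    if indelocator is not None:
        selected["indelocator"] = indelocator
    return selected
```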