Releases: bioinform/somaticseq

maintenance release

12 Sep 18:13
  • Restructured the utilities scripts.
  • Added the utilities/filter_SomaticSeq_VCF.py script that "demotes" PASS calls to LowQual calls based on a set of tunable hard filters.
  • The BamSurgeon scripts now invoke a modified BamSurgeon script that splits a (proper) BAM file without the need to sort by read name.
  • No change to the SomaticSeq algorithm.
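
As a rough illustration of the "demotion" idea behind utilities/filter_SomaticSeq_VCF.py, the sketch below rewrites the FILTER column of a PASS call to LowQual when any metric falls below a minimum. It is not the script's actual code: the function name, the metric names, and the thresholds are all hypothetical.

```python
def demote_if_failing(vcf_line, metrics, min_thresholds):
    """Return the VCF record with FILTER changed from PASS to LowQual
    if any metric is below its minimum (hypothetical hard filters)."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 7 (index 6) of a VCF record is FILTER.
    if fields[6] != "PASS":
        return "\t".join(fields)  # only PASS calls are ever demoted
    for name, minimum in min_thresholds.items():
        # A metric that is missing is treated as failing (assumption).
        if metrics.get(name, float("-inf")) < minimum:
            fields[6] = "LowQual"
            break
    return "\t".join(fields)


record = "chr1\t100\t.\tA\tT\t.\tPASS\tSOMATIC"
print(demote_if_failing(record, {"depth": 5}, {"depth": 10}))
```

Calls that are not PASS to begin with are returned unchanged, so the filter can only make a call more conservative.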

maintenance release

26 Aug 21:32
  • Added run script generators for dockerized BAMSurgeon pipelines at utilities/dockered_pipelines/bamSurgeon
  • Added an error message to r_scripts/ada_model_builder_ntChange.R for when TrueVariants_or_False does not contain both 0's and 1's.
  • No change to SomaticSeq algorithm.

Improved run scripts

11 Aug 00:50
  • Improved the pipeline script generators, but they should still be considered experimental.
  • No change to the SomaticSeq algorithm.

v2.3.0

03 Aug 20:00
  • Changed some directory structures
  • Better integrated Strelka2

Two changes

24 Jul 08:09
  1. If GATK.jar is not provided to run GATK CombineVariants, the wrapper will instead cat all the VCF files and sort them with vcfsort.pl.
  2. Added a dockerfile.
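
The concatenate-then-sort fallback can be pictured with the minimal sketch below. It is only an illustration of the idea, not what the wrapper or vcfsort.pl actually does: the function name is hypothetical, the header is taken from the first file only, and chromosomes are sorted as plain strings, which may differ from the order vcfsort.pl produces.

```python
def concat_and_sort_vcf(vcf_texts):
    """Concatenate VCF texts and sort the records by (CHROM, POS).

    Header lines are kept from the first input only (assumption).
    Returns a list of lines without trailing newlines.
    """
    header, records = [], []
    for index, text in enumerate(vcf_texts):
        for line in text.splitlines():
            if not line:
                continue
            if line.startswith("#"):
                if index == 0:
                    header.append(line)
            else:
                records.append(line)

    def sort_key(line):
        chrom, pos = line.split("\t")[:2]
        # Plain string order for CHROM, numeric order for POS.
        return (chrom, int(pos))

    return header + sorted(records, key=sort_key)


first = "##fileformat=VCFv4.1\n2\t50\t.\tA\tT\n1\t300\t.\tG\tC\n"
second = "##fileformat=VCFv4.1\n1\t100\t.\tC\tG\n"
print("\n".join(concat_and_sort_vcf([first, second])))
```

Unlike a true merge, this keeps duplicate sites from different callers as separate records.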

scalpel and strelka2

01 Apr 14:30
  • Set if_Scalpel = 1 only if there is a SOMATIC tag in the INFO field.
  • Resolved a bug in the wrapper script where Strelka2 and Scalpel VCF files clash during GATK CombineVariants.
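
The if_Scalpel rule above amounts to checking for a flag key in the semicolon-separated INFO field. A minimal sketch, assuming the INFO string is already extracted from the Scalpel VCF record (the function name is hypothetical, and this is not the SomaticSeq source):

```python
def if_scalpel(info_field):
    """Return 1 if the INFO field carries a SOMATIC flag, else 0."""
    # Split on ';' so that only an exact SOMATIC key matches,
    # not a longer key that merely contains the word.
    keys = [item.split("=", 1)[0] for item in info_field.split(";")]
    return 1 if "SOMATIC" in keys else 0


print(if_scalpel("SOMATIC;DP=30"))
print(if_scalpel("DP=30"))
```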

Added Strelka2 support

08 Mar 04:00
  • Incorporated Strelka2 since it's now GPLv3.
  • Added another R script (ada_model_builder_ntChange.R) that uses the nucleotide substitution pattern as a feature. In our limited experience it improves accuracy, but it has not been heavily tested yet.
  • If a COSMIC site is labeled SNP in the COSMIC VCF file, if_cosmic and CNT will be labeled as 0. The COSMIC ID will still appear in the ID column. This will not change any results because both of those features are turned off in the training R script.
  • Fixed a bug: if JointSNVMix2 is not included, the values should be "NaN" instead of 0's. This is to keep consistency with how we handle all other callers.
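
The distinction in the JointSNVMix2 fix is between "the caller was run and did not call this site" (0) and "the caller was not part of the run" (NaN). A minimal sketch of that convention, with a hypothetical function name and arguments:

```python
import math


def caller_feature(caller_included, site_called):
    """Encode one caller's vote for a site.

    NaN means the caller was not included in the run at all;
    0 or 1 means it was included and did not / did call the site.
    """
    if not caller_included:
        return float("nan")
    return 1 if site_called else 0


print(caller_feature(True, False))   # caller ran, no call
print(math.isnan(caller_feature(False, False)))
```

Keeping NaN separate from 0 lets a downstream model treat a missing caller as missing data rather than as a negative vote.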

Minor improvement and bug fixes

27 Jul 05:35
  • Worked around an occasional unexplained issue in the ada package where the SOR is sometimes not treated as numeric, by forcing it to be numeric.
  • Changed the default PASS score (i.e., probability value) threshold from 0.7 to 0.5, and made the thresholds configurable in the SomaticSeq.Wrapper.sh script (i.e., --pass-threshold 0.5 and --lowqual-threshold 0.1).
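
How the two thresholds partition calls can be sketched as follows. The defaults mirror the values given above; the function name, the REJECT label for calls below the LowQual threshold, and the inclusive comparisons are assumptions rather than confirmed behavior.

```python
def classify_call(probability, pass_threshold=0.5, lowqual_threshold=0.1):
    """Map a probability score to a FILTER label using two cutoffs."""
    if probability >= pass_threshold:
        return "PASS"
    if probability >= lowqual_threshold:
        return "LowQual"
    return "REJECT"  # assumed label for everything below the LowQual cutoff


for score in (0.8, 0.6, 0.3, 0.05):
    print(score, classify_call(score))
```

Under the previous default of 0.7, a call scoring 0.6 would have fallen in the LowQual band; with 0.5 it is a PASS.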

Minor improvement and bug fixes

11 Jun 04:19
  • InDel_3bp now stands for indel counts within 3 bps of the variant site, instead of exactly 3 bps from the variant site as it was previously (likewise for InDel_2bp).
  • Collapsed the separate counts of MQ0 (mapping quality of 0) reads supporting the reference and the variant into a single metric of MQ0 reads (i.e., tBAM_MQ0 and nBAM_MQ0). From experience, the total number of MQ0 reads is at least as predictive of false positive calls as distinguishing whether those MQ0 reads support the reference or the variant.
  • Obtain SOR (Somatic Odds Ratio) from BAM files instead of VarDict's VCF file.
  • Fixed a typo in the SomaticSeq.Wrapper.sh script that caused the inclusion region to be handled incorrectly.
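
The change in meaning of InDel_3bp (and InDel_2bp) is the difference between counting indels at exactly N bp and at up to N bp from the variant site. A minimal sketch, assuming the signed distances of nearby indels from the variant site are already known; the function names are hypothetical, and whether an indel at distance 0 is counted is an assumption:

```python
def indels_exactly_at(distances, n_bp):
    """Old meaning: indels exactly n_bp away from the variant site."""
    return sum(1 for d in distances if abs(d) == n_bp)


def indels_within(distances, n_bp):
    """New meaning: indels at most n_bp away from the variant site."""
    return sum(1 for d in distances if abs(d) <= n_bp)


# Signed offsets (in bp) of indels seen near one variant site.
nearby = [-3, -1, 2, 3, 5]
print(indels_exactly_at(nearby, 3), indels_within(nearby, 3))
```

With the new definition, InDel_3bp is always at least as large as InDel_2bp for the same site.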

Incorporated MuTect2

06 Jun 21:02
  • Incorporated MuTect2 into SomaticSeq, along with some metrics from MuTect2's output VCF files.
  • In the SomaticSeq.Wrapper.sh script, you may use either the original MuTect (--mutect) / Indelocator (--indelocator) or the new MuTect2 (--mutect2) VCF files. However, if you include both, MuTect2 will take precedence.