Slow RNA variant calling #118

@katjur01

Hello,

I am trying to run SomaticSeq on RNA data (single-end reads), but it is extremely slow. It never finished: I had to kill it after 7 days. I have many large samples (~10 GB each), and if the smallest one (3.0 GB) takes more than 7 days, I can't use it this way, because the full set could run for months. I'm using Kubernetes, so I have enough computing capacity.

First, I ran the SplitNCigarReads tool (https://gatk.broadinstitute.org/hc/en-us/articles/360036858811-SplitNCigarReads) on my mapped RNA reads, and then I ran the variant callers (LoFreq, MuTect2, Strelka, VarDict, VarScan). I ran SomaticSeq last, with this command:
somaticseq_parallel.py --threads 20 --output-directory somatic_varcalls/sample1 --genome-reference GRCh38-p10.fa --inclusion-region wgs.bed --minimum-num-callers 0.4 single --bam-file sample1.RNAsplit.bam --mutect2-vcf somatic_varcalls/sample1/MuTect2.vcf --vardict-vcf somatic_varcalls/sample1/VarDict.vcf --lofreq-vcf somatic_varcalls/sample1/Lofreq.vcf --strelka-vcf somatic_varcalls/sample1/variants.vcf.gz --varscan-vcf somatic_varcalls/sample1/VarScan2.vcf

I also tried running SomaticSeq with a single variant caller. With VarDict alone it took 51 hours to finish, and with Strelka alone it took 32 hours. I also tried a smaller BED file (only a few exome positions), but the runtime did not improve.

My theory is that SomaticSeq has a problem with heavily covered regions: when it splits the BED file for parallelization, some chunks finished quickly, but others took many hours or days. Subsetting those heavily covered areas might help, but I don't know how to approach this.
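
One possible way to check this theory is to look at the mean depth per BED region and see whether the slow chunks are the deepest ones. The following is only a minimal sketch, not part of SomaticSeq. It assumes the input lines come from `samtools bedcov` run on a 3-column BED file (chrom, start, end, summed per-base depth). The function name `flag_deep_regions`, the `RegionDepth` type, and the depth cap of 1000 are hypothetical choices for illustration.

```python
from typing import Iterable, List, NamedTuple


class RegionDepth(NamedTuple):
    chrom: str
    start: int
    end: int
    mean_depth: float
    keep_fraction: float  # fraction of reads to keep to bring mean depth down to the cap


def flag_deep_regions(bedcov_lines: Iterable[str],
                      depth_cap: float = 1000.0) -> List[RegionDepth]:
    """Return regions whose mean depth exceeds depth_cap, deepest first.

    Assumption: each line is tab-separated as chrom, start, end, and the
    summed per-base depth in the last column.
    """
    deep = []
    for line in bedcov_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        total_depth = float(fields[-1])
        length = end - start
        if length <= 0:
            continue  # ignore malformed or empty intervals
        mean_depth = total_depth / length
        if mean_depth > depth_cap:
            deep.append(RegionDepth(chrom, start, end, mean_depth,
                                    depth_cap / mean_depth))
    return sorted(deep, key=lambda r: r.mean_depth, reverse=True)


# Made-up example values, for illustration only.
example = [
    "chr1\t1000\t2000\t50000",   # mean depth 50, below the cap
    "chr2\t0\t100\t2000000",     # mean depth 20000, above the cap
]
for region in flag_deep_regions(example):
    print(region.chrom, region.start, region.end,
          round(region.mean_depth), round(region.keep_fraction, 3))
```

If the flagged regions match the slow chunks, `keep_fraction` gives a rough idea of how strongly the reads there would have to be downsampled (for example with the subsampling option of `samtools view`) to reach the cap. Whether downsampling changes the resulting variant calls would still need to be checked.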

Do you have any ideas or advice on what I can do about this? I have used SomaticSeq on DNA data many times before, and there it normally ran for a few minutes to a few hours.
