Skip to content

Problem with the VEP ANNOTATE snakemake wrapper #167

@moldach

Description

@moldach

Snakemake version

Snakemake

5.23.0

Wrapper

Issue

I'm trying to adapt my regular VEP code to use the snakemake wrapper instead but am running into an issue.

I want to make sure that a) the wrapper works for me and b) it produces the same results as the following:

vep \
        -i {input.sample} \
        --species "caenorhabditis_elegans" \
        --format "vcf" \
        --everything \
        --offline \
        --force_overwrite \
        --fasta {input.ref} \
        --gff {input.annot} \
        --tab \
        --variant_class \
        --regulatory \
        --show_ref_allele \
        --numbers \
        --symbol \
        --protein \
        -o {params.sample}

In order to use VEP with wrappers there are 3 different rules.

I have got the following two working:

# VEP Download Plugins
rule download_vep_plugins:
    output:
        directory("resources/vep/plugins")
    params:
        release=100
    wrapper:
        "0.64.0/bio/vep/plugins"

# VEP Cache
rule get_vep_cache:
    output:
        directory("resources/vep/cache")
    params:
        species="caenorhabditis_elegans",
        build="WBcel235",
        release="100"
    log:
        "logs/vep/cache.log"
    wrapper:
        "0.64.0/bio/vep/cache"

The third rule is to actually run VEP for which I've written the following rule:

rule variant_annotation:
    input:
        calls= lambda wildcards: getVCFs(wildcards.sample),
        cache="resources/vep/cache",
        plugins="resources/vep/plugins",
    output:
        calls=os.path.join(dirs_dict["ANNOT_DIR"],config["ANNOT_TOOL"],"{sample}.annotated.vcf"),
        stats=os.path.join(dirs_dict["ANNOT_DIR"],config["ANNOT_TOOL"],"{sample}.html")
    params:
        plugins=["LoFtool"],
        extra="--everything"
    message: """--- Annotating Variants."""
    resources:
        mem = 30000,
        time = 120
    threads: 4
    wrapper:
        "0.64.0/bio/vep/annotate"

When submitting the job this is the error I receive:

Building DAG of jobs...
Using shell: /cvmfs/soft.computecanada.ca/nix/var/nix/profiles/16.09/bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       variant_annotation
        1

[Tue Aug 25 11:23:04 2020]
Job 0: --- Annotating Variants.

Activating conda environment: /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f
Failed to open VARIANT_CALLING/varscan/470_sorted_dedupped_snp_varscan.vcf: unknown file type
Possible precedence issue with control flow operator at /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
Traceback (most recent call last):
  File "/scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/scripts/tmpm4v6gdij.wrapper.py", line 44, in <module>
    "(bcftools view {snakemake.input.calls} | "
  File "/home/moldach/bin/snakemake/lib/python3.8/site-packages/snakemake/shell.py", line 156, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;  (bcftools view VARIANT_CALLING/varscan/470_sorted_dedupped_snp_varscan.vcf | vep --everything --fork 4 --format vcf --vcf --cache --cache_version 100 --species caenorhabditis_elegans --assembly WBcel235 --dir_cache resources/vep/cache --dir_plugins resources/vep/plugins --offline --plugin LoFtool --output_file STDOUT --stats_file ANNOTATION/VEP/470.html | bcftools view -Ov > ANNOTATION/VEP/470.annotated.vcf)' returned non-zero exit status 1.
[Tue Aug 25 11:25:02 2020]
Error in rule variant_annotation:
    jobid: 0
    output: ANNOTATION/VEP/470.annotated.vcf, ANNOTATION/VEP/470.html
    conda-env: /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f

RuleException:
CalledProcessError in line 393 of /scratch/moldach/MADDOG/VCF-FILES/biostars439754/Snakefile:
Command 'source /home/moldach/miniconda3/bin/activate '/scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f'; set -euo pipefail;  python /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/scripts/tmpm4v6gdij.wrapper.py' returned non-zero exit status 1.
  File "/scratch/moldach/MADDOG/VCF-FILES/biostars439754/Snakefile", line 393, in __rule_variant_annotation
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.8.0/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Removing output files of failed job variant_annotation since they might be corrupted:
ANNOTATION/VEP/470.annotated.vcf, ANNOTATION/VEP/470.html
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

This issue was originally brought up on StackOverflow, then on the ensembl-vep repo but seems it should be posted here for a definitive answer.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions