substr outside of string in VariationEffect.pm line 1329 #1764

Open
bartgrantham opened this issue Oct 9, 2024 · 6 comments

@bartgrantham

Describe the issue

I am getting the following error, and I've narrowed it down to a single-line VCF:

substr outside of string at /opt/vep/src/ensembl-vep/Bio/EnsEMBL/Variation/Utils/VariationEffect.pm line 1329, <$fh> line 20970425.

System

I'm using the official VEP docker image id 607ee83f9536 (Ubuntu 22.04.4), containing the following versions:

  ensembl              : 112.7104005
  ensembl-funcgen      : 112.be19ffa
  ensembl-io           : 112.2851b6f
  ensembl-variation    : 112.4113356
  ensembl-vep          : 112.0

Full VEP command line

I was able to reproduce this from a completely clean install with the following on Debian 12:

docker pull ensemblorg/ensembl-vep:latest
docker run --rm -it ensemblorg/ensembl-vep bash

## then inside the container, with the tmp.vcf attached below
perl /opt/vep/src/ensembl-vep/INSTALL.pl -a cf -s gallus_gallus_merged

vep -i tmp.vcf -o tmp.vep.vcf --offline --species gallus_gallus_merged --everything --vcf --distance 0 --pick

Full error message

substr outside of string at /opt/vep/src/ensembl-vep/Bio/EnsEMBL/Variation/Utils/VariationEffect.pm line 1329, <$fh> line 20970425.
Died in forked process 70938

Data files (if applicable)

This single-line VCF triggers the bug. It was narrowed down from a much (much) larger VCF. The original had the usual headers one might expect, but they are not needed to trigger the error.

tmp.vcf.gz

dglemos self-assigned this Oct 9, 2024
@dglemos
Contributor

dglemos commented Oct 9, 2024

Hi @bartgrantham,
Thanks for explaining the issue so clearly, it really helps in understanding the problem.
I've been able to reproduce the issue, and we're working on a fix.
I'll let you know when we have updates.

@dglemos
Contributor

dglemos commented Oct 28, 2024

I just wanted to let you know that this issue is specific to one of the RefSeq transcripts overlapping your variant.
For now, a workaround is to run vep with only Ensembl transcripts.
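
A minimal sketch of that workaround, reusing the options from the original command. It assumes the Ensembl-only chicken cache is available under the species name `gallus_gallus` (an assumption; the thread only names the merged cache). The script prints the command and runs it only when `vep` and `tmp.vcf` are present.

```shell
#!/bin/sh
# Assumption: "gallus_gallus" is the Ensembl-only cache (no RefSeq transcripts),
# installed beforehand with something like:
#   perl /opt/vep/src/ensembl-vep/INSTALL.pl -a cf -s gallus_gallus
species="gallus_gallus"

# Same options as the failing run; only the species/cache name changes.
vep_cmd="vep -i tmp.vcf -o tmp.vep.vcf --offline --species $species --everything --vcf --distance 0 --pick"

# Show the command that would be run.
echo "$vep_cmd"

# Run it only when VEP and the input file are actually available.
if command -v vep >/dev/null 2>&1 && [ -f tmp.vcf ]; then
    $vep_cmd
fi
```

The trade-off is that the output will lack RefSeq annotations entirely, not just those for the problematic transcript.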

@bartgrantham
Author

Very interesting. FWIW, once I excised that single position from our data I was able to annotate the remaining 50M+ positions.

Out of curiosity, is it known what exactly it is about the RefSeq transcript that triggers this bug for this one position? It's surprising that it was a single position out of tens of millions.
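
For anyone hitting the same error, excising one position can be sketched as below. The chromosome and position are hypothetical placeholders, since the failing record's coordinates are only in the attached file, and `drop_position` is an illustrative helper name, not part of VEP.

```shell
#!/bin/sh
# Hypothetical coordinates: replace with the CHROM and POS of the failing record.
bad_chrom="1"
bad_pos="12345"

# drop_position CHROM POS: copy a VCF from stdin to stdout, keeping header
# lines (starting with '#') and every record except the one at CHROM:POS.
# The chromosome is compared as a string so names like "chr1" or "Z" work.
drop_position() {
    awk -F '\t' -v chrom="$1" -v pos="$2" \
        '/^#/ || !((($1 "") == (chrom "")) && ($2 == pos))'
}

# Demonstration on a tiny inline, tab-separated VCF. Real use would be:
#   drop_position "$bad_chrom" "$bad_pos" < input.vcf > filtered.vcf
filtered=$(printf '#CHROM\tPOS\tID\tREF\tALT\n1\t12344\t.\tA\tG\n1\t12345\t.\tC\tT\n2\t12345\t.\tG\tA\n' \
    | drop_position "$bad_chrom" "$bad_pos")
echo "$filtered"
```

Only the record on chromosome 1 at position 12345 is dropped; the record at the same position on chromosome 2 is kept.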

@dglemos
Contributor

dglemos commented Oct 29, 2024

For the transcript XM_040697338, the peptide sequence calculated here is incomplete. This causes a problem for this variant located at the end of the translation sequence.
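
As a rough illustration of that failure mode (a toy sketch, not the code at line 1329 of VariationEffect.pm): when the stored peptide is shorter than the coding sequence implies, looking up the residue for a variant in the last codons reaches past the end of the string. The sequence and lengths below are made up, and `residue_at` is a hypothetical helper.

```shell
#!/bin/sh
# Toy values, not taken from XM_040697338: a CDS of 5 codons (4 coding codons
# plus a stop) whose stored peptide is missing its last residue.
cds_codons=5
peptide="MKT"                      # a complete peptide would have 4 residues
last_pos=$((cds_codons - 1))       # last protein position implied by the CDS

# residue_at PEPTIDE POS: print the 1-based residue at POS, or fail when POS
# lies outside PEPTIDE (the case an unchecked substring lookup would hit).
residue_at() {
    pep=$1
    pos=$2
    if [ "$pos" -lt 1 ] || [ "$pos" -gt "${#pep}" ]; then
        echo "position $pos is outside peptide of length ${#pep}" >&2
        return 1
    fi
    printf '%s\n' "$pep" | cut -c "$pos"
}

residue_at "$peptide" 3            # within the peptide: prints T
residue_at "$peptide" "$last_pos" || echo "lookup failed at position $last_pos"
```

This would also explain why only one position out of tens of millions failed: only a variant falling in the missing tail of that one transcript's translation performs an out-of-range lookup.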

@bartgrantham
Author

Incredible catch! Congrats. Is it possible to have an automated integration test for catching these kinds of database errors?

@dglemos
Contributor

dglemos commented Nov 25, 2024

That is correct, improving the tests to catch these cases is essential.
We have a plan to improve the RefSeq annotation in our tests, but our time to focus on this is currently limited due to other priorities.
