Joint Junction Mutation and Repeat Expansion Interpretation #389

Open
zahraa992 opened this issue Dec 19, 2024 · 1 comment

Comments

@zahraa992

Hi,

A gene has acquired a joint junction mutation at position 1610595, as identified by breseq. According to the reference sequence of that gene, the sequence between 1610488 and 1610595 contains four of the seven repeats of the sequence 'GAAGATGGCTACTAAGGAAGACCTCCA'.
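As a quick sanity check on those coordinates, here is a minimal sketch that assumes both positions are 1-based and inclusive. Under that assumption the span is 108 bp, which holds exactly four copies of the 27-bp unit with nothing left over, consistent with the "four of seven" reading. `copies_in_span` is a hypothetical helper written for this illustration.

```python
REPEAT_UNIT = "GAAGATGGCTACTAAGGAAGACCTCCA"  # repeat unit quoted in the issue


def copies_in_span(start: int, end: int, unit: str) -> tuple[int, int]:
    """Return (whole copies, leftover bp) of `unit` fitting in [start, end].

    Assumes 1-based, inclusive coordinates (hypothetical helper).
    """
    span_length = end - start + 1
    return divmod(span_length, len(unit))


print(len(REPEAT_UNIT))                                 # 27
print(1610595 - 1610488 + 1)                            # 108
print(copies_in_span(1610488, 1610595, REPEAT_UNIT))   # (4, 0)
```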

I assumed that if a new junction had formed, I would find the remaining three of the seven repeats in the newly joined sequence. However, I could not find them there, nor could I match any part of the newly joined sequence to the reference genome.

My question is: Why did breseq report this mutation as "(GAAGATGGCTACTAAGGAAGACCTCCA)4→8"? Could this be a misinterpretation by breseq? And what might have actually happened to the remaining repeats?

The mutation:

[Image: the mutation as reported by breseq]

Part of the newly joined sequence:

[Image: part of the newly joined sequence]

Thank you in advance,

@jeffreybarrick
Contributor

This junction is consistent with 4 additional copies of the 27-bp repeat being added. Its frequency is 57%, but it lies in some kind of repeat region, so it seems plausible that one of the two copies in the reference has mutated in this way.

It looks like breseq did not recognize that the reference already contains three more copies in addition to the four it reports as changing. I agree that the count should start at 7 copies instead of 4.

Therefore, I think the mutation should probably be output as (GAAGATGGCTACTAAGGAAGACCTCCA)7→11
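The undercount can be illustrated with a small sketch. It assumes a toy reference (arbitrary synthetic flanks, not the real genome) containing seven uninterrupted tandem copies of the unit. Counting only inside a window covering four of them gives 4→8, while extending left and right across the whole array gives 7→11. The helpers `maximal_tandem_array` and `repeat_notation` are hypothetical and are not breseq's code; this does not claim to show where breseq's logic actually goes wrong.

```python
REPEAT_UNIT = "GAAGATGGCTACTAAGGAAGACCTCCA"  # 27-bp unit from the issue


def maximal_tandem_array(seq: str, pos: int, unit: str) -> tuple[int, int]:
    """Extend left and right from a copy of `unit` starting at 0-based `pos`.

    Returns (array_start, n_copies) for the whole uninterrupted tandem array.
    """
    k = len(unit)
    if seq[pos:pos + k] != unit:
        raise ValueError(f"no copy of the repeat unit starts at index {pos}")
    start = pos
    while start >= k and seq[start - k:start] == unit:
        start -= k  # step left one whole unit at a time
    end = pos + k
    while seq[end:end + k] == unit:
        end += k  # step right one whole unit at a time
    return start, (end - start) // k


def repeat_notation(unit: str, ref_copies: int, added: int) -> str:
    """Format a repeat-count change in the style of the issue, e.g. (UNIT)7→11."""
    return f"({unit}){ref_copies}\u2192{ref_copies + added}"


# Toy reference: synthetic flanks around seven tandem copies (NOT real data).
toy_ref = "TTTTT" + REPEAT_UNIT * 7 + "CCCCC"
k = len(REPEAT_UNIT)

# Counting only inside a window that covers the last four copies undercounts.
window_start = 5 + 3 * k
window_copies = toy_ref[window_start:window_start + 4 * k].count(REPEAT_UNIT)

# Extending across the whole array recovers all seven reference copies.
array_start, full_copies = maximal_tandem_array(toy_ref, window_start, REPEAT_UNIT)

print(window_copies, full_copies)                      # 4 7
print(repeat_notation(REPEAT_UNIT, window_copies, 4))  # (GAAGATGGCTACTAAGGAAGACCTCCA)4→8
print(repeat_notation(REPEAT_UNIT, full_copies, 4))    # (GAAGATGGCTACTAAGGAAGACCTCCA)7→11
```

Both strings describe the same insertion of four units; only the baseline copy count differs.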

Can you share the reads and reference genome with me by emailing the address in the breseq header? (I can set up a shared folder for upload if helpful.) That will let me investigate where the logic is off in how it is counting the repeat units.
