Identify linked mutations from mapped reads #357

Open
spleonard1 opened this issue Sep 8, 2023 · 2 comments

Comments

@spleonard1

I'm butchering breseq's intended use case: I'm identifying gene mutants that arose during high-throughput gene variant synthesis and tracking their abundance over a short, selective time course (< 48 hours). In many cases there are multiple sets of linked mutations, which can clearly be seen in the read mapping evidence.

Is it possible to identify which mutations occur together on a single read? Does breseq keep track of which unique reads support particular mutation calls? Right now I am using some frequency correlations to loosely link mutations, but it would be nice to parse which reads support which mutations to confidently link them.

I have attached a couple representative pictures. Not a bug, just a discussion / feature request. Thanks!

image

image

@jeffreybarrick
Contributor

breseq does not track linkage of mutations by read, not even in simple cases where base substitutions are side by side (which is annoying). I can imagine a post-processing step that could go back and do this, at least in simple, clear-cut cases like this one.

If someone wanted to add this to breseq, they could pilot the step with a program that parses the output reference.bam file, looks at the read alignment columns referred to by the RA evidence items that are within one read length of one another, and counts how many times the mutations do and do not occur within the same read. A new field in the output GD file, such as "haplotype=XXXX", could then be used to group linked mutations.
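
A minimal sketch of that pilot step, under several assumptions: it uses the third-party pysam library to read the BAM file, it handles base substitutions only, and it expects an indexed BAM. The function names (`alleles_per_read`, `count_linkage`), the `variants` mapping, and the example positions are hypothetical and not part of breseq. Positions here are 0-based as in pysam; GD file positions appear to be 1-based, so they would need to be shifted by one before use.

```python
from collections import Counter, defaultdict
from itertools import combinations


def alleles_per_read(bam_path, contig, variants):
    """Return one {position: carries_mutant_base} dict per read name.

    `variants` maps a 0-based reference position to the mutant base.
    Mates of a pair share a read name, so their calls are merged here.
    """
    import pysam  # third-party; imported lazily so count_linkage runs without it

    calls = defaultdict(dict)
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # fetch() needs a BAM index next to the file
        for read in bam.fetch(contig, min(variants), max(variants) + 1):
            if read.is_unmapped or read.is_secondary or read.is_supplementary:
                continue
            seq = read.query_sequence
            if seq is None:
                continue
            # matches_only=True skips insertions, deletions and soft clips
            for qpos, rpos in read.get_aligned_pairs(matches_only=True):
                if rpos in variants:
                    is_mutant = seq[qpos].upper() == variants[rpos]
                    calls[read.query_name][rpos] = is_mutant
    return list(calls.values())


def count_linkage(read_calls, max_distance=150):
    """Tally, for each pair of sites covered by the same read, which alleles co-occur.

    `max_distance` stands in for "one read length" and is an assumed default.
    """
    counts = defaultdict(Counter)
    for calls in read_calls:
        for a, b in combinations(sorted(calls), 2):
            if b - a > max_distance:
                continue
            if calls[a] and calls[b]:
                key = "both"
            elif calls[a]:
                key = "first_only"
            elif calls[b]:
                key = "second_only"
            else:
                key = "neither"
            counts[(a, b)][key] += 1
    return counts


# Toy input in the shape alleles_per_read() returns (made-up positions)
example_reads = [
    {100: True, 130: True},
    {100: True, 130: True},
    {100: False, 130: False},
    {100: True, 130: False},
    {130: True},  # covers only one site, so it contributes no pair
]
linkage = count_linkage(example_reads)
print(dict(linkage[(100, 130)]))
```

With real data, the call would look something like `count_linkage(alleles_per_read("data/reference.bam", "REL606", {100: "T", 130: "G"}))`, where the path, contig name, and variants are placeholders. Pairs dominated by "both" and "neither" counts are candidates for sharing a haplotype label; many "first_only" or "second_only" reads suggest the mutations sit on different molecules.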

Since this is unlikely to happen in the near future, maybe you could look into haplotype reconstruction programs used for virus genomes (and mixtures of those) to see if any of them can give you this kind of output?

@spleonard1
Author

Oooh that’s a good idea re virus haplotyping approaches, thanks!
