question: increase visibility of SNPs at wider views? #1551

kevfengler227 · 2024-08-28T15:52:32Z

Is there a way to increase the visibility of SNPs at wider views? In the example below, I can see SNP differences between alignments in a 56 kb window, but not in a 140 kb window, which encompasses the entire region I want to display.

kevfengler227 · 2024-08-28T15:58:31Z

I should add that the coverage track has the desired SNP visibility; only the alignments do not.

jrobinso · 2024-08-28T23:10:34Z

Probably not at the moment, but I will look into it. When zoomed in all mismatches are shown, not just those deemed significant for the coverage track. My vague recollection is we stop doing this at some resolution as it becomes too cluttered, but this should be revisited.

jrobinso · 2024-08-28T23:35:29Z

Just a note here -- this only seems to happen with long read (3rd gen) data.

kevfengler227 · 2024-08-28T23:39:14Z

Indeed. These are actually 140 kb genomic segments aligned as HiFi reads. But I am trying to show the SNP variation and haplotypes in each genome. This is one way to turn IGV into a pangenome viewer!

jrobinso · 2024-08-28T23:41:47Z

Interesting. I'll have this fixed soon. If the dataset you are using or creating is public let me know, it would be an interesting test case to add.

jrobinso · 2024-08-29T02:01:50Z

One issue that arises as you zoom out is that many bases land on the same pixel. At 100 kb there are approximately 100 bases per pixel. For typical reads this means that nearly every pixel of the alignment will have a mismatch, often multiple mismatches. We might need some user options for how to handle this.
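To make the crowding concrete, here is a minimal Python sketch (not IGV code) that bins mismatch positions into pixel columns. The 1,000-pixel track width, the evenly spaced 1% mismatch rate, and the function name are all assumptions chosen for illustration.

```python
def pixels_with_mismatch(mismatch_positions, window_start, window_bp, width_px):
    """Return the set of pixel columns that contain at least one mismatch."""
    hit = set()
    for pos in mismatch_positions:
        offset = pos - window_start
        if 0 <= offset < window_bp:
            # Integer arithmetic maps a base offset to its pixel column.
            hit.add(offset * width_px // window_bp)
    return hit


# Assumed numbers: a 100 kb window drawn across 1,000 pixels (100 bp per pixel).
window_bp, width_px = 100_000, 1_000

# One mismatch every 100 bp, a stand-in for a noisy long read.
noisy = range(50, window_bp, 100)
# Thirteen SNPs spread over the window, as in a low-diversity genome comparison.
sparse = range(5_000, window_bp, 7_500)

print(len(pixels_with_mismatch(noisy, 0, window_bp, width_px)))   # 1000: every pixel is hit
print(len(pixels_with_mismatch(sparse, 0, window_bp, width_px)))  # 13: isolated marks
```

Under these assumptions the noisy read colors every pixel, while the sparse case leaves 13 separate marks that would still be readable.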

jrobinso · 2024-08-29T06:03:13Z

As an illustration, here is a 247 kb window of PacBio alignments with mismatches drawn. It's not usable, and rendering is extremely slow, so some preferences or a special mode are needed here.

kevfengler227 · 2024-08-29T14:25:57Z

yes, I did not intend to use this capability for PacBio reads or ONT reads, but rather genomes with relatively few differences. So a "genome" mode would be ideal. I can send you an example public dataset.

Of course, the user needs to do some upfront work to create the ideal input data, but reducing each genome to 1x is extremely powerful, rather than 30x PacBio, and the only practical way to view a large pangenome.

jrobinso · 2024-08-29T17:17:58Z

A public dataset would be helpful.

jrobinso · 2024-08-29T17:19:46Z

There will still be limits on zooming out, as at a minimum the sequence for the entire region needs to be loaded, not to mention the read sequence in every alignment. We could not view an entire chromosome with read sequences, for example.

kevfengler227 · 2024-08-29T19:26:17Z

Admittedly, this will probably only work well for low-diversity applications like my initial request. In that case there are only ~13 SNPs in a handful of genomes in a 140 kb range, which was just beyond the visibility limit, so I was hoping for a way to crank up the SNP visibility. That wouldn't make sense if there were a ton of variation, which is often the case for plant pangenomes.

It seems that 114 kb is the visibility max for SNPs, but INDELs are visible at much wider ranges.

This real-world example from the maize pangenome probably has too many SNPs to display nicely at wider ranges, but in some specific cases it would still be useful.

kevfengler227 · 2024-08-29T19:27:06Z

here genomes were aligned in 100 kb consecutive chunks

kevfengler227 · 2024-08-29T19:48:39Z

Here is a test dataset of mock data, with a few SNPs over 245 kb

10genomes.fasta.gz

test.fasta.gz

kevfengler227 · 2024-08-29T19:49:07Z

minimap2 -ax map-hifi -t4 test.fasta 10genomes.fasta | samtools view -b -1 - | samtools sort --write-index -o 10genomes.bam

kevfengler227 · 2024-08-29T19:52:03Z

So basically trying to use IGV as a haplotype viewer

jrobinso · 2024-08-29T21:27:58Z

I've never used minimap2 but that's o.k. I think the simplest resolution of this issue would be to just make the max window for showing mismatches user settable, probably as a preference. A new display mode is a bigger topic that deserves its own issue, and would be longer term and prioritized vs other bigger topics.

I will also make SNP display subject to the limit. BTW, currently the limit is not on the genomic window, which can vary by display size, but on the resolution in bp / pixel.
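A small Python sketch of why a bp / pixel limit translates into different genomic windows on different displays. This is not IGV code: the limit of 100 bp / pixel, the track widths, the `<=` comparison, and the function names are assumptions for illustration only.

```python
def bp_per_pixel(window_bp, width_px):
    """Resolution of the view: how many bases share one pixel column."""
    return window_bp / width_px


def max_window_bp(limit_bp_per_pixel, width_px):
    """Widest genomic window that still satisfies a bp / pixel limit."""
    return limit_bp_per_pixel * width_px


def mismatches_drawn(window_bp, width_px, limit_bp_per_pixel):
    """True if the view is at or below the (hypothetical) resolution limit."""
    return bp_per_pixel(window_bp, width_px) <= limit_bp_per_pixel


LIMIT = 100  # hypothetical value; the real threshold is not stated in this thread

for width in (1_000, 2_000):
    print(width, max_window_bp(LIMIT, width), mismatches_drawn(140_000, width, LIMIT))
```

With these made-up numbers, the same 140 kb region is over the limit on a 1,000-pixel track (140 bp / pixel) and under it on a 2,000-pixel track (70 bp / pixel), which is why a single "maximum window" in kb cannot be quoted independently of display size.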

kevfengler227 · 2024-08-29T21:33:24Z

sounds great. thanks!

baozg · 2024-09-05T11:55:16Z

Related question: when loading IGV with more than 100 genomes (whole-genome alignment by minimap2 -x asm20), it is very slow. Is there any way to speed it up?

kevfengler227 · 2024-09-05T13:30:28Z

Rather than performing whole-genome alignments, I typically align genomes chunked into consecutive 10 kb pieces, which is faster for alignment, and the alignments can be toggled by mapping quality and alignment score. If you add the genome name to the read group when running minimap2 and merge the resulting BAM files, 100 genomes is essentially the same as 100x Illumina coverage and is quite rapid to view in IGV.
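As a rough sketch of the read-group step, the Python below only builds the command lines (it does not run them). It assumes minimap2's `-R` option for the SAM read group line and positional `samtools merge out.bam in1.bam in2.bam`; the file names, genome names, and helper names are hypothetical, and each per-genome BAM is assumed to be coordinate-sorted before merging.

```python
import shlex


def minimap2_command(reference, chunks_fasta, genome_name, threads=4):
    """Build a minimap2 call that tags every alignment with the genome name."""
    # minimap2 expects the tab separators written as a literal backslash-t.
    read_group = f"@RG\\tID:{genome_name}\\tSM:{genome_name}"
    return ["minimap2", "-ax", "map-hifi", "-t", str(threads),
            "-R", read_group, reference, chunks_fasta]


def merge_command(output_bam, sorted_bams):
    """Build a samtools merge call for the per-genome, coordinate-sorted BAMs."""
    return ["samtools", "merge", output_bam, *sorted_bams]


genomes = ["genomeA", "genomeB"]  # hypothetical genome names
for name in genomes:
    print(shlex.join(minimap2_command("ref.fasta", f"{name}_chunks.fasta", name)))
print(shlex.join(merge_command("pangenome.bam", [f"{g}.sorted.bam" for g in genomes])))
```

Once the merged BAM is loaded, each genome carries its own read group, so it can be colored and grouped separately in IGV.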

kevfengler227 · 2024-09-05T13:31:49Z

If you zoom out you can see the PAV in the genomes well, just not the SNPs

kevfengler227 · 2024-09-05T13:32:14Z

coloring and grouping by read group is key

kevfengler227 · 2024-09-05T13:36:14Z

finally, if you number the chunks consecutively you know exactly where each one came from in the query, which is much better than using k-mers or other methods where coordinates are lost. Then you know you are looking at syntenic alignments when you mouse over a chunk and see that its chunk # (position) is similar to the reference
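A minimal Python sketch of consecutive, numbered chunking, assuming a naming scheme of `<genome>_chunk<number>`. The naming scheme, the zero-padding, and both function names are illustrative choices, not something specified in this thread.

```python
def chunk_sequence(name, sequence, chunk_size=10_000):
    """Yield (record_id, chunk) pairs for consecutive, numbered chunks."""
    starts = range(0, len(sequence), chunk_size)
    for number, start in enumerate(starts, start=1):
        yield f"{name}_chunk{number:06d}", sequence[start:start + chunk_size]


def chunk_origin(record_id, chunk_size=10_000):
    """Recover the 0-based query start from an id made by chunk_sequence."""
    number = int(record_id.rsplit("_chunk", 1)[1])
    return (number - 1) * chunk_size


# Toy example: a 24 bp "genome" cut into 10 bp chunks.
for record_id, chunk in chunk_sequence("genomeA", "ACGT" * 6, chunk_size=10):
    print(f">{record_id}\n{chunk}")

print(chunk_origin("genomeA_chunk000003", chunk_size=10))  # 20
```

Because the chunk number encodes the query offset, the name shown on mouse-over is enough to tell whether an alignment sits roughly where synteny would predict.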

baozg · 2024-09-05T13:37:10Z

Thanks for sharing! Chunking could be a good idea, but it also loses the ability to detect variation longer than the chunk length and can introduce ambiguous alignments (TEs). It is more like doing the chaining yourself, since you know the coordinates. I think it would be better if IGV used chunks in the browser but with more contiguous alignments. Actually, I use AnchorWave and wfmash more often, which produce nearly end-to-end alignments in A. thaliana (easier than maize). As an alternative to IGV, I use https://github.com/cmdcolin/jbrowse-plugin-mafviewer after converting my PAF to a pseudo-MAF (which can only present SNPs or DELs)

kevfengler227 · 2024-09-05T13:50:20Z

you can use whatever chunk size you want for a given application depending on the level of similarity in the pangenome, typically 1-100 kb (aligned with map-hifi). With that you can see quite large INDELs. Again, you can control what is displayed by changing the visualization parameters in IGV more so than with whole-genome alignments. Also, the directionality of chunks is indicative of inversions. For major differences, the lack of an aligned chunk is also informative.

so the real beauty of the chunked alignment approach is that it is highly parallelizable and rapid. One can do an all-by-all comparison in minutes, so that all/any reference(s) can be viewed in IGV with all queries on a whim. If you want to get fancy you can group your queries into various sub-groups, rather than 1 big one.
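The two signals mentioned above (chunk strand and missing chunks) can be summarized with a few lines of Python. This is a sketch under simplifying assumptions: one primary alignment per chunk, strand already extracted as '+' or '-', and a query assembled in the same orientation as the reference. The function name and input format are hypothetical.

```python
def summarize_chunks(total_chunks, alignments):
    """Report unaligned and reverse-strand chunk numbers.

    alignments: iterable of (chunk_number, strand) with strand '+' or '-'.
    """
    strands = dict(alignments)
    # Chunks with no alignment hint at presence/absence variation.
    missing = [n for n in range(1, total_chunks + 1) if n not in strands]
    # Reverse-strand chunks are candidate inversions.
    reverse = [n for n in sorted(strands) if strands[n] == "-"]
    return missing, reverse


missing, reverse = summarize_chunks(
    6, [(1, "+"), (2, "+"), (3, "-"), (4, "-"), (6, "+")]
)
print(missing)  # [5]
print(reverse)  # [3, 4]
```

If an entire contig is reverse-complemented relative to the reference, every chunk will be reported as reverse, so the list only flags candidates to inspect in IGV.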

jrobinso added this to the 2.19.0 milestone Aug 30, 2024

baozg mentioned this issue Sep 9, 2024

Slow loading for assembly-assembly alignment bams #1520

Open
