question: increase visibility of SNPs at wider views? #1551
Probably not at the moment, but I will look into it. When zoomed in, all mismatches are shown, not just those deemed significant for the coverage track. My vague recollection is that we stop doing this at some resolution because it becomes too cluttered, but this should be revisited.
Just a note here: this only seems to happen with long-read (3rd gen) data.
Indeed. These are actually 140 kb genomic segments aligned as HiFi reads. But I am trying to show the SNP variation and haplotypes in each genome. This is one way to turn IGV into a pangenome viewer!
Interesting. I'll have this fixed soon. If the dataset you are using or creating is public, let me know; it would be an interesting test case to add.
One issue that arises as you zoom out is that many bases land on the same pixel. At 100 kb, that is approximately 100 bases per pixel. For typical reads this means that nearly every pixel of the alignment will have a mismatch, often multiple mismatches. We might need some user options for how to handle this.
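To make the pixel-collision arithmetic concrete, here is a minimal sketch (not IGV's rendering code). It assumes a 1,000-pixel-wide alignment panel, which is what "100 kb ≈ 100 bases per pixel" implies, and bins mismatch positions into pixel columns so you can see several SNPs collapsing onto one pixel. The function names are hypothetical.

```python
def bases_per_pixel(window_bp: int, width_px: int) -> float:
    """Genomic bases that share one horizontal pixel at the current zoom."""
    return window_bp / width_px


def mismatch_pixels(positions, window_start: int, window_bp: int, width_px: int) -> dict:
    """Bin mismatch positions into pixel columns.

    Returns {pixel_column: mismatches_in_that_column}; positions outside
    the window are ignored.
    """
    bpp = bases_per_pixel(window_bp, width_px)
    counts: dict = {}
    for pos in positions:
        if window_start <= pos < window_start + window_bp:
            col = int((pos - window_start) // bpp)
            counts[col] = counts.get(col, 0) + 1
    return counts


if __name__ == "__main__":
    # A 100 kb window drawn across an assumed 1,000-pixel-wide panel.
    print(bases_per_pixel(100_000, 1_000))  # 100.0
    # Three mismatches within the same 100 bp bin collapse onto one pixel.
    print(mismatch_pixels([10_005, 10_050, 10_090, 55_000], 0, 100_000, 1_000))
    # {100: 3, 550: 1}
```

With sparse SNPs (a few per genome) most pixels stay empty, but with typical long reads nearly every column gets a hit, which is the clutter problem described above.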
Yes, I did not intend to use this capability for PacBio or ONT reads, but rather for genomes with relatively few differences, so a "genome" mode would be ideal. I can send you an example public dataset. Of course, the user needs to do some upfront work to create the ideal input data, but reducing each genome to 1x coverage, rather than 30x PacBio, is extremely powerful and the only practical way to view a large pangenome.
A public dataset would be helpful.
There will still be limits on zooming out since, at a minimum, the sequence for the entire region needs to be loaded, not to mention the read sequence in every alignment. We could not view an entire chromosome with read sequences, for example.
Admittedly, this will probably only work well for low-diversity applications like my initial request. In that case there are only ~13 SNPs across a handful of genomes in a 140 kb range, which was just beyond the visibility limit, so I was hoping for a way to crank up SNP visibility. That wouldn't make sense if there were a ton of variation, which is often the case for plant pangenomes. It seems that 114 kb is the visibility maximum for SNPs, whereas INDELs are visible at much wider ranges. This real-world example from the maize pangenome probably has too many SNPs to display nicely at wider ranges, but in some specific cases it would still be useful.
Here, genomes were aligned in consecutive 100 kb chunks.
Here is a test dataset of mock data with a few SNPs over 245 kb:
```shell
# test.fasta = reference; 10genomes.fasta = query genomes aligned as HiFi reads
minimap2 -ax map-hifi -t4 test.fasta 10genomes.fasta \
  | samtools view -b -1 - \
  | samtools sort --write-index -o 10genomes.bam
```
So basically I'm trying to use IGV as a haplotype viewer.
I've never used minimap2, but that's OK. I think the simplest resolution of this issue would be to make the maximum window for showing mismatches user-settable, probably as a preference. A new display mode is a bigger topic that deserves its own issue; it would be longer term and would have to be prioritized against other big topics. I will also make SNP display subject to the limit. BTW, the limit is currently applied not to the genomic window, whose size at a given zoom varies with display width, but to the resolution in bp/pixel.
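A small sketch of why a bp/pixel limit behaves differently from a fixed window limit. This is not IGV's code: `MismatchPolicy` and the 100 bp/pixel threshold are hypothetical stand-ins for the user-settable preference proposed above. The point is that the same 140 kb region can fall on either side of the limit depending on panel width.

```python
from dataclasses import dataclass


@dataclass
class MismatchPolicy:
    # Hypothetical user preference: the maximum bases per pixel at which
    # per-base mismatches (SNPs) are still drawn. Not IGV's actual value.
    max_bp_per_px: float

    def visible(self, window_bp: int, width_px: int) -> bool:
        """Resolution-based rule: compare bp/pixel, not the raw window size."""
        return window_bp / width_px <= self.max_bp_per_px

    def max_window_bp(self, width_px: int) -> float:
        """Widest genomic window that still shows mismatches at this panel width."""
        return self.max_bp_per_px * width_px


if __name__ == "__main__":
    policy = MismatchPolicy(max_bp_per_px=100.0)
    print(policy.visible(56_000, 1_000))   # True  (56 bp/px)
    print(policy.visible(140_000, 1_000))  # False (140 bp/px)
    print(policy.visible(140_000, 1_500))  # True  (~93 bp/px on a wider panel)
    print(policy.max_window_bp(1_000))     # 100000.0
```

Under this rule, the "maximum visible window" a user observes is just the threshold multiplied by the panel width in pixels.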
Sounds great, thanks!
Related question: When loading more than 100 genomes (whole-genome alignments from minimap2 -x asm20) into IGV, it is very slow. Is there any way to speed it up?
Rather than performing whole-genome alignments, I typically align genomes split into consecutive 10 kb chunks. This is faster to align, and the alignments can be shown or hidden based on mapping quality and alignment score. If you add the genome name to the read group when running minimap2 and merge the resulting BAM files, 100 genomes are essentially the same as 100x Illumina coverage and are quite rapid to view in IGV.
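A minimal sketch of the chunking step, assuming uncompressed FASTA input. The naming scheme `<contig>_chunk<index>` is hypothetical; any scheme works as long as the index can be turned back into a query coordinate. The genome identity is left to the read group rather than the read name.

```python
from typing import Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) records from a plain (uncompressed) FASTA file."""
    name, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(parts)
                name, parts = line[1:].split()[0], []
            elif line:
                parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def chunk_sequence(contig: str, seq: str, chunk_size: int = 10_000):
    """Split one sequence into consecutive, non-overlapping, numbered chunks.

    Hypothetical naming: <contig>_chunk<index>, zero-padded so names sort
    in genomic order. The last chunk may be shorter than chunk_size.
    """
    for index, start in enumerate(range(0, len(seq), chunk_size)):
        yield f"{contig}_chunk{index:05d}", seq[start:start + chunk_size]


def write_chunks(in_fasta: str, out_fasta: str, chunk_size: int = 10_000) -> None:
    """Write every contig of one genome as numbered chunks to a new FASTA."""
    with open(out_fasta, "w") as out:
        for contig, seq in read_fasta(in_fasta):
            for name, chunk in chunk_sequence(contig, seq, chunk_size):
                out.write(f">{name}\n{chunk}\n")
```

Each genome's chunk FASTA could then be aligned separately with minimap2, passing a read-group line through its `-R` option (for example `-R '@RG\tID:genomeA\tSM:genomeA'`). The per-genome BAMs could then be combined with `samtools merge`, so IGV can color and group by read group.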
Coloring and grouping by read group is key.
Finally, if you number the chunks consecutively, you know exactly where each one came from in the query, which is much better than using k-mers or other methods where coordinates are lost. Then you know you are looking at syntenic alignments when you mouse over a chunk and see that its chunk number (position) is similar to the reference position.
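The "chunk number ≈ reference position" check can be sketched as follows. This assumes the hypothetical `<contig>_chunk<index>` naming with a fixed chunk size. The 50 kb tolerance is an illustrative knob, not a recommended value, because diverged genomes drift in coordinates.

```python
import re

# Matches the hypothetical "<contig>_chunk<index>" read names.
CHUNK_NAME = re.compile(r"^(?P<contig>.+)_chunk(?P<index>\d+)$")


def chunk_origin(read_name: str, chunk_size: int = 10_000):
    """Recover (query contig, 0-based query start) from a numbered chunk name."""
    match = CHUNK_NAME.match(read_name)
    if match is None:
        raise ValueError(f"not a chunk name: {read_name!r}")
    return match.group("contig"), int(match.group("index")) * chunk_size


def looks_syntenic(read_name: str, ref_pos: int,
                   chunk_size: int = 10_000, tolerance: int = 50_000) -> bool:
    """True if the chunk's query start is within `tolerance` bp of its alignment start."""
    _, query_start = chunk_origin(read_name, chunk_size)
    return abs(query_start - ref_pos) <= tolerance


if __name__ == "__main__":
    print(chunk_origin("chr1_chunk00003"))             # ('chr1', 30000)
    print(looks_syntenic("chr1_chunk00003", 41_200))   # True
    print(looks_syntenic("chr1_chunk00003", 900_000))  # False
```

In practice the query-to-reference offset accumulates along a chromosome, so comparing neighboring chunks to each other is often more robust than comparing each one to a fixed tolerance.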
Thanks for sharing! Chunking could be a good idea, but it also loses the ability to detect variation longer than the chunk length, and it can introduce ambiguous alignments (TEs). It is more like doing the chaining yourself, since you know the coordinates. I think it would be better if IGV used chunks in the browser but with more contiguous alignments. Actually, I use AnchorWave and wfmash more often, which produce nearly end-to-end alignments in A. thaliana (easier than maize). As an alternative to IGV, I use https://github.com/cmdcolin/jbrowse-plugin-mafviewer and convert my PAF to pseudo-MAF (which can only represent SNPs or deletions).
You can use whatever chunk size you want for a given application, depending on the level of similarity in the pangenome, typically 1-100 kb (aligned with map-hifi). With that you can see quite large INDELs. Again, you have more control over what is displayed by changing the visualization parameters in IGV than with whole-genome alignments. Also, the orientation of chunks is indicative of inversions, and for major differences the lack of an aligned chunk is also informative. So the real beauty of the chunked alignment approach is that it is highly parallelizable and rapid. One can do an all-by-all comparison in minutes, so that any reference can be viewed in IGV with all queries on a whim. If you want to get fancy, you can group your queries into various subgroups rather than one big group.
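As a rough illustration of "chunk orientation indicates inversions", here is a sketch that finds runs of consecutive reverse-strand chunks. It assumes one primary alignment per chunk, with the strand taken from the SAM FLAG reverse-complement bit (0x10), and it sorts by chunk index. The `min_chunks` cutoff is hypothetical. A missing chunk index (an unaligned chunk) breaks a run, which mirrors the point that absent chunks are themselves informative.

```python
from typing import List, Tuple


def strand_from_flag(flag: int) -> str:
    """SAM FLAG bit 0x10 marks a reverse-complemented alignment."""
    return "-" if flag & 0x10 else "+"


def inverted_runs(alignments: List[Tuple[int, str]], min_chunks: int = 2):
    """Return (first_chunk, last_chunk) for runs of consecutive '-' chunks.

    `alignments` holds (chunk_index, strand) pairs for one query genome
    against one reference contig. Runs shorter than `min_chunks` are dropped.
    """
    runs: List[Tuple[int, int]] = []
    current: List[int] = []

    def close_run() -> None:
        if len(current) >= min_chunks:
            runs.append((current[0], current[-1]))
        current.clear()

    for index, strand in sorted(alignments):
        if strand == "-" and (not current or index == current[-1] + 1):
            current.append(index)      # start or extend a reverse-strand run
        elif strand == "-":
            close_run()                # gap in chunk numbering: new run
            current.append(index)
        else:
            close_run()                # forward-strand chunk ends the run
    close_run()
    return runs


if __name__ == "__main__":
    alns = [(0, "+"), (1, "+"), (2, "-"), (3, "-"), (4, "-"), (5, "+"), (7, "-"), (8, "+")]
    print(inverted_runs(alns))  # [(2, 4)]
```

Multiplying the run's chunk indices by the chunk size gives approximate query coordinates for the candidate inversion. The resolution is therefore limited to the chunk size chosen above.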
Is there a way to increase the visibility of SNPs at wider views? In my example, I can see SNP differences between alignments in a 56 kb window, but not in a 140 kb window, which encompasses the entire region I want to display.