Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phase #53

Open
duhuipeng opened this issue Jun 26, 2021 · 9 comments
Open

Phase #53

duhuipeng opened this issue Jun 26, 2021 · 9 comments
Labels
documentation Improvements or additions to documentation

Comments

@duhuipeng
Copy link

duhuipeng commented Jun 26, 2021

Dear author
I would like to consult where the color is inconsistent in the picture, (the changing contig),What is the generating file that lets me know which contig it is?
image
image
I personally feel, looking from the picture, this switch error is somewhat serious
Last question
Why in the generated switches.txt,Show only the switch error results of one phase, where the other phase is to see?
image

@arangrhie
Copy link
Contributor

Hi @duhuipeng ,

  1. The changing color
    What you are seeing is the phase block.
    You can track the switched blocks from *.phased_block.bed.
    Each column in phased_block.bed is:
<scaff> <start> <end> <haplotype> <switch_error> <num. switches> <num. markers>

So in your case, search for Mat_hap2 in the 4th column. That is the locus where it got picked up as hap2.

  1. N* plot
    The N* plot is showing the phase block N* value relative to the conting N* value. The closer the block line (red) goes to contigs (blue), it means the phase block size is closer to the contig size. I think yours is very well phased. You can see how it looks on the absolute relativeness compared to the expected assembly size; the NG* scale.
    Use Rscript $MERQURY/plot/plot_block_N.R --help to see how to generate the NG* plots.

  2. switch error rate in one file?
    You can find both switch error rates with

cat *.phased_block.stats

See https://github.com/marbl/merqury/wiki/3.-Phasing-assessment-with-hap-mers#5-phased-block-statistics-and-switch-error-rates for more details.

@arangrhie arangrhie added the documentation Improvements or additions to documentation label Jun 26, 2021
@duhuipeng
Copy link
Author

Dear author
Thanks for your reply
According to your advices, I looked at the phased_block.bed file,But I now have a point not to understand ,Hope for your help(as follow)
image
I found hap2 in hap1 but it switch_error and num switches are 0 but why would it divide called hap2?
It is based on what divides the hap2, instead of the hap1?

@arangrhie
Copy link
Contributor

The last column here are the num. kmers found from the haplotype defined as in col. 4. So in your example, 1582 hap2 kmers were found, with no hap1 kmers, and 2234 hap2's, with no hap1s.
It looks to me the phasing did failed on these smaller pieces and took the wrong path.

-Arang

@duhuipeng
Copy link
Author

Dear author
I still don't understand some, according to your said
The num. markers is the total number of people from the hap2 and not the hap1
But why is the num. switches a 0?
Does num. switches refer to the number from hap1 when the num. switches is 0? If so, I would understand .
But another point that I don't understand came out again:

If hap2, appears in hap1 and num. switches is 0, then in theory, these hap2 should appear in the second phase, not in the first
Do you feel I understand that right?
What is the cause for this?

@arangrhie
Copy link
Contributor

arangrhie commented Jun 26, 2021

Merqury is defining a phase block based on how many hap1 or hap2 kmers are found.
Once the conditions are met to be decided that a block belongs to hap1, the block becomes 'hap1 block' (col. 4).
The switches are defined from the occurrence of the other haplotype's kmer.
So, if a block was defined as 'hap1', the last col. shows how many kmers were seen from 'hap1'.
If there were any kmers found from 'hap2', then that will become a 'switch'.

Note the switch here is defined based on a phased block, not the assembly. In a lot of assemblies, especially pseudo-haplotype assemblies, it is hard to make an assumption that a given fasta was assembled from one haplotype. Therefore, Merqury is reporting switch error rate based on what has been defined as a phase block, given how many haplotype-specific kmers are seen.

As your assembly seems already completely phased, it may be an interest to report the hamming error rate, which measures the occurrence of the k-mers from the unexpected haplotype to the other in the entire assembly, not the phase blocks.

It's easy to calculate that from *.hapmers.count.
Each columns are:

<fasta> <scaffold/contig> <hap1> <hap2> <total>

So you can grep all lines from the hap1 assembly (given the fasta name in the 1st col.) and count how many switches occurred with SUM(hap2) / SUM(total).
For example, if you were running merqury.sh with merqury.sh read.meryl mat.hapmers.meryl pat.hapmers.meryl mat.fasta pat.fasta output,

awk '$1=="mat" {mat_kmer+=$3; pat_kmer+=$4; total+=$NF} END {print mat_kmer"\t"pat_kmer"\t"total"\t"pat_kmer/total}' *.hapmers.count

Will give you maternal-mers paternal-mers total-mers switch-rate.

In most cases though, I feel per-block switch error rate is more important. As shown in your case, there were only two small blocks that were entirely switched. The hamming error rate does not account for 'where' the switches are. It could be more problematic when it is spread all over the assembly versus locally happening in a block. If it's locally happening, as a block, the chance of corrupting a gene from chimeric haplotype joins goes down.

@duhuipeng
Copy link
Author

Dear author
I still don't quite understand the swtich error in this result(as follow),This is the result of mat phase
For example,
image,By truth, it belongs to the mother phase, the hap1
Some of it contig were divided to hap2, so at this time should be a switch,
in the 1 ,The num. markers of the contig is 1,175,The num. markers of this contig is 1175, this 1175 should belong to hap2, and at this time 59 belongs to hap1, so, 59 / 1175 is it switcher error,
Similarly, 2 It is also very good to understand this
But 3 doesn't understand .in the 3,The num. markers of the contig is 68,this 68 should belong to hap2,But the num. switches is 0,That is, his switch error should be 0,
Why did this fragment of contig appear in hap1, not in hap2? Similarly, the pat phase is the same result
image

@arangrhie
Copy link
Contributor

What you understood is correct.

Why did this fragment of contig appear in hap1, not in hap2?

What you are asking is beyond what Merqury does.
It could be a base error that accidentally overlapped with an actual haplotype specific kmer. However, the chance for this is very low.
Alternatively, it could be a phasing error in the assembly method. Ask the developers of the assembler used why you see these haplotype switches.

Merqury is a tool designed to detect these kinds of switch errors and helps you localize where the error is.

@duhuipeng
Copy link
Author

I got it
Thanks

1 similar comment
@duhuipeng
Copy link
Author

I got it
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

2 participants