Don't have maternal and paternal sequence data, how to build hap-kmer #111

socialhang · 2023-09-03T13:39:17Z

Hi,

Due to the nature of our study system, we are unable to obtain the parents of our sequenced individuals; we only have sequencing data from many children.
How can I build hap-kmers using only these children's NGS data?

best wishes!
hang

arangrhie · 2023-10-02T19:38:21Z

Hello,

This is an interesting question, though not an easy one to answer.

Using imputed phased variants
If you have a collapsed assembly from the child (or children), you could potentially re-map the reads to the assembly (assuming it is also well polished and near-T2T), and phase the variants called from short reads using long reads (HiFi or ONT) or long-range data (Strand-Seq or Hi-C). Several tools can help with the phasing step.
Once the heterozygous variants are phased, you could generate two haplotype consensus sequences using bcftools consensus.
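To make the consensus step concrete, here is a toy Python sketch of what building two haplotype sequences from phased heterozygous variants amounts to. It is not bcftools: it assumes SNVs only, 0-based positions, and phased genotypes given as (hap1 allele, hap2 allele) with 0 = REF and 1 = ALT, and the function name haplotype_consensus is hypothetical. In practice you would run bcftools consensus on the phased VCF once per haplotype.

```python
from typing import List, Tuple

# (0-based position, REF base, ALT base, phased genotype as (hap1 allele, hap2 allele))
# Toy format assumed for illustration: SNVs only, 0 = REF, 1 = ALT.
Variant = Tuple[int, str, str, Tuple[int, int]]


def haplotype_consensus(reference: str, variants: List[Variant], hap: int) -> str:
    """Return the reference with the ALT alleles carried by haplotype `hap` (0 or 1) applied."""
    seq = list(reference)
    for pos, ref, alt, gt in variants:
        # Sanity check that the variant matches the reference it was called against.
        if seq[pos] != ref:
            raise ValueError(f"REF mismatch at {pos}: {seq[pos]!r} != {ref!r}")
        if gt[hap] == 1:
            seq[pos] = alt
    return "".join(seq)


reference = "ACGTACGTAC"
phased = [
    (2, "G", "T", (1, 0)),  # ALT on haplotype 1 only
    (7, "T", "C", (0, 1)),  # ALT on haplotype 2 only
]
hap1 = haplotype_consensus(reference, phased, 0)
hap2 = haplotype_consensus(reference, phased, 1)
print(hap1)  # ACTTACGTAC
print(hap2)  # ACGTACGCAC
```

Homozygous ALT sites would carry (1, 1) and be applied to both haplotypes; indels are left out here because they shift coordinates, which bcftools handles for you.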

Then, you could collect kmers from each haplotype consensus assembly and run $MERQURY/trio/hapmers.sh with the -no-filter option to get pseudo-haplotype-specific kmers, and use those to evaluate the original collapsed assembly.
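A minimal sketch of the hapmer idea, assuming canonical k-mers (as meryl counts by default) and exact set operations: kmers present in one haplotype consensus but absent from the other are treated as that haplotype's pseudo-hapmers. The function names are hypothetical, and unlike hapmers.sh this does no count-based filtering; it only shows the set logic.

```python
from typing import Set, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_set(seq: str, k: int) -> Set[str]:
    """All canonical k-mers in `seq` (counts are ignored)."""
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def hapmers(hap1: str, hap2: str, k: int) -> Tuple[Set[str], Set[str]]:
    """k-mers found in one haplotype consensus but not the other."""
    a, b = kmer_set(hap1, k), kmer_set(hap2, k)
    return a - b, b - a


h1, h2 = hapmers("ACTTACGTAC", "ACGTACGCAC", k=3)
print(sorted(h1))  # ['AAG', 'ACT', 'TAA']
print(sorted(h2))  # ['CAC', 'CGC', 'GCA']
```

Real runs use a much larger k and meryl databases instead of in-memory sets; the point is only that shared kmers (here ACG and GTA) drop out and the remainder mark each pseudo-haplotype.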

It's not guaranteed that a 'hapmer' is consistent across all chromosomes, though, and hapmers are likely to be inconsistent between chromosome arms if long-range information is lacking. Insufficient Hi-C coverage could cause a few true heterozygous variants to be missed, especially those between blocks separated by long stretches of homozygosity.
Also, many assumptions are made here. There may be regions that are hard to map with short reads, so variant-calling accuracy will be poor in those regions. I'd only use the phase block and switch error rate results from Merqury, and not put too much weight on the blob plots.
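To illustrate what phase blocks and switches mean, the sketch below walks hapmer hits along one sequence in position order and starts a new block whenever the haplotype label changes. This is a deliberately naive, hypothetical version: Merqury's own phase block and switch error calculations are more involved, so use it only to build intuition, not to reproduce Merqury's numbers.

```python
from itertools import groupby
from typing import List, Tuple

Marker = Tuple[int, str]  # (position on the sequence, "hap1" or "hap2")


def naive_phase_blocks(markers: List[Marker]) -> Tuple[List[Tuple[str, int, int]], float]:
    """Split position-sorted hapmer hits into runs of the same label.

    Returns (blocks, switch_rate), where each block is (label, first_pos, last_pos)
    and switch_rate = label changes / adjacent marker pairs. Simplified; not Merqury's method.
    """
    blocks = []
    for label, group in groupby(markers, key=lambda m: m[1]):
        positions = [pos for pos, _ in group]
        blocks.append((label, positions[0], positions[-1]))
    switches = max(len(blocks) - 1, 0)
    pairs = max(len(markers) - 1, 1)  # avoid division by zero for 0 or 1 marker
    return blocks, switches / pairs


hits = [(100, "hap1"), (250, "hap1"), (400, "hap2"), (900, "hap2"), (1200, "hap2")]
blocks, switch_rate = naive_phase_blocks(hits)
print(blocks)       # [('hap1', 100, 250), ('hap2', 400, 1200)]
print(switch_rate)  # 0.25
```

With pseudo-hapmers from imputed phasing, a switch like the one at position 400 may reflect a phasing error in the hapmers themselves rather than in the assembly, which is one more reason to read these numbers loosely.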

If you already have a diploid assembly, it would be counterintuitive to re-use kmers derived from that same assembly to evaluate it, so unfortunately, no, I don't have a good answer for that case.

Using population specific kmers
If the parents come from different populations (e.g. subspecies), and genome sequencing reads are available for those populations, you could try using the population kmers as the 'parental' kmer databases. Merge the kmers from each population's reads with meryl union-sum, then run $MERQURY/trio/hapmers.sh with the two merged databases.
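The merge-then-compare logic can be sketched with Python Counters standing in for meryl databases. union_sum mimics what meryl union-sum does (keep every kmer seen in any input and sum its counts); the sample dictionaries and the population_specific helper are hypothetical toy data and names, and no error-kmer filtering is applied.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def union_sum(dbs: Iterable[Dict[str, int]]) -> Counter:
    """Keep every k-mer seen in any input and sum its counts (like meryl union-sum)."""
    merged: Counter = Counter()
    for db in dbs:
        merged.update(db)  # Counter.update adds counts rather than replacing them
    return merged


def population_specific(pop: Counter, other: Counter) -> Set[str]:
    """k-mers present in `pop` but absent from `other` (no count thresholds)."""
    return set(pop) - set(other)


# Hypothetical toy k-mer counts for reads from two samples per population.
pop_a = union_sum([{"AAC": 3, "ACG": 5}, {"AAC": 2, "GTT": 4}])
pop_b = union_sum([{"ACG": 6, "CCA": 2}, {"CCA": 1, "TTG": 7}])
a_only = population_specific(pop_a, pop_b)
b_only = population_specific(pop_b, pop_a)
print(sorted(a_only))  # ['AAC', 'GTT']
print(sorted(b_only))  # ['CCA', 'TTG']
```

Because population kmers include variation the actual parents may not carry, the resulting 'hapmers' are a proxy at best.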

Sex-chromosome-limited kmers
If your species has an XY or ZW sex-determination system, you could compare homogametic (e.g. XX) and heterogametic (e.g. XY) samples to extract kmers that are present only in the heterogametic samples. This is neither perfect nor ideal for evaluating the whole genome, but it should allow you to estimate the Y- (or W-) specific sequences.
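A toy sketch of the sex-limited kmer idea: keep kmers that appear in at least min_present heterogametic samples and in none of the homogametic ones. The min_present threshold is an assumption added here to discard kmers private to a single individual (or arising from sequencing errors); the sample sets and function name are hypothetical.

```python
from collections import Counter
from typing import List, Set


def sex_limited_kmers(heterogametic: List[Set[str]], homogametic: List[Set[str]],
                      min_present: int) -> Set[str]:
    """k-mers seen in >= min_present heterogametic samples and in no homogametic sample."""
    presence: Counter = Counter()
    for sample in heterogametic:
        presence.update(sample)  # each sample contributes at most 1 per k-mer (sets)
    in_homogametic = set().union(*homogametic)
    return {k for k, n in presence.items() if n >= min_present and k not in in_homogametic}


# Hypothetical toy k-mer sets (e.g. two XY and two XX individuals).
xy = [{"AAA", "CCC", "GGT"}, {"AAA", "CCC", "TTA"}]
xx = [{"AAA", "GGT"}, {"AAA"}]
print(sex_limited_kmers(xy, xx, min_present=2))  # {'CCC'}
```

Raising min_present trades sensitivity for specificity: with min_present=1, TTA (seen in only one XY sample) would also be kept.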

Also linking #6

Best,
Arang
