How to get haplotype-specific reads? #52

DayTimeMouse · 2024-09-26T11:42:36Z

Hi,

Thanks for developing this awesome tools. I'd like to get haplotype-specific HiFi reads via haplotype-specific kmers.

I run script to get haplotype-specific kmers, meryl version is 1.3:
meryl count k=21 hap1.fa output hap1.meryl
meryl count k=21 hap2.fa output hap2.meryl
meryl difference hap1.meryl hap2.meryl output hap1.only.meryl
meryl difference hap2.meryl hap1.meryl output hap2.only.meryl
meryl intersect hap1.meryl hap2.meryl output hap1_hap2_intersect.meryl

get read-meryl:
meryl count k=21 hifi_reads.fq.gz output hifi.meryl

I want to partition HiFi reads(each read) into two haplotypes. Now, I do not know how to do next? meryl can do this? Or, do you have some ideas for this?

Thanks in advance.

The text was updated successfully, but these errors were encountered:

arangrhie · 2024-09-26T15:55:10Z

Hi @DayTimeMouse,

One fundamental question is, how confident you are with your hap1.fa and hap2.fa.
Looks like your hap1.meryl and hap2.meryl are reflecting what's unique in each assembly, not the biologically inherited parental specific kmers.

Anyways, you could try meryl-lookup -existence -sequence hifi_reads.fq.gz -mers hap1.meryl hap2.meryl > hifi_reads.hapmer.count.
This will provide count of kmers shared (found) in each read sequence.

Small explanation from the help message:

...
     The output file has format:
         <sequence_name> <tab> <kmers_in_sequence> <tab> <kmers_in_db> <tab> <kmers_shared> [...]

     With one input database:
         sequence1 <tab> 8415 <tab> 12856825 <tab> 8145

     With two input databases:
         sequence1 <tab> 8415 <tab> 12856825 <tab> 8145 <tab> 575757256 <tab> 8354

You could bin the reads based on the hap1 and hap2 mers found in each fastq sequence.
Usually, it's expected to find kmers from one haplotype and a few from the other haplotype, due to possible noises in the reads. There may be reads with no hapmers if the two haplotypes share a large homozygous block.
If there are any switch errors in the assembly, you'd see reads with substantial kmers with both hap kmers.

Alternate option is to use trio-binning implemented in Canu, but this might fit better if you are actually using hapmers inferred from the parental genomes as it normalizes based on the kmers found in the dbs.

/path/to/canu/build/bin/splitHaplotype \
 -cl 1000 -memory 48 -threads 24 \
 -H ../hap1.meryl 1 hap1.fasta.gz \
 -H ../hap2.meryl 1 hap2.fasta.gz \
 -A ./unk.fasta.gz \
 hifi_reads.fq.gz

-cl 1000 will filter out reads < 1kb.

DayTimeMouse · 2024-09-27T12:00:55Z

But I do not have real parental genomes, I used hifiasm(HiFi + ONT +HiC) to get hap1.fa and hap2.fa.
In this case, do you still recommand me to use Canu(trio-binning splitHaplotype)? Canu(trio-binning) and count hapmers to partition reads via meryl-lookup, which method is fit better for me?

arangrhie · 2024-09-27T17:20:45Z

Then using meryl-lookup might work the best. Use meryl v1.4.1 by the way, there were some improvements / fixes made particularly in meryl-lookup.

DayTimeMouse · 2024-09-28T12:29:51Z

Many thanks!

DayTimeMouse · 2024-10-20T10:46:33Z

Hi，

I have another question. How to get haplotype-specific HiC reads(paired-end)? Using haplotype-specific kmers?

Best regards.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to get haplotype-specific reads? #52

How to get haplotype-specific reads? #52

DayTimeMouse commented Sep 26, 2024 •

edited

Loading

arangrhie commented Sep 26, 2024

DayTimeMouse commented Sep 27, 2024

arangrhie commented Sep 27, 2024

DayTimeMouse commented Sep 28, 2024

DayTimeMouse commented Oct 20, 2024

How to get haplotype-specific reads? #52

How to get haplotype-specific reads? #52

Comments

DayTimeMouse commented Sep 26, 2024 • edited Loading

arangrhie commented Sep 26, 2024

DayTimeMouse commented Sep 27, 2024

arangrhie commented Sep 27, 2024

DayTimeMouse commented Sep 28, 2024

DayTimeMouse commented Oct 20, 2024

DayTimeMouse commented Sep 26, 2024 •

edited

Loading