Complete T2T reconstruction of a human genome. Changes from v1.0 include filled rDNA gaps and improved polishing within telomeres. One rare heterozygous variant causing a premature stop codon was changed at chr9:134589924 to the more common allele. Also available at NCBI GCA_009914755.3. Changes made from v1.0 to v1.1 are available as a VCF.
Complete T2T reconstruction of a human genome, with the exception of 5 known gaps within the rDNA arrays. Polished assembly based on v0.9. Introduces 4 structural corrections and 993 small variant corrections, including a 4 kb telomere extension on chr18. Polishing was performed using a conservative custom pipeline based on DeepVariant calls, and structural corrections were manually curated. Consensus quality exceeds Q60. Until a preprint is drafted, a brief summary can be found at this blog post. Also available at NCBI GCA_009914755.2. Changes made from v0.9 to v1.0 are available as a VCF.
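As a reading aid for the quality figure above, here is a minimal sketch assuming the Q value is on the usual Phred scale, where Q = -10 * log10(p) and p is the per-base error probability. The function names are illustrative and are not part of any T2T tooling.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled quality value into a per-base error probability."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Convert a per-base error probability into a Phred-scaled quality value."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Under the Phred assumption, Q60 corresponds to roughly one error per million bases.
    print(phred_to_error_rate(60))
```

Under this assumption, "exceeds Q60" means fewer than about one consensus error per 1,000,000 bases.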
T2T reconstruction of all 23 chromosomes of CHM13 based on a custom assembly pipeline, briefly featuring:
- Homopolymer compression and self-correction of PacBio HiFi reads
- Rescoring of overlaps to account for recurrent PacBio HiFi errors
- Construction and custom pruning of a string graph built over 100% identical overlaps
- Manual reconstruction of chromosomal paths through the graph, aided by ultra-long Nanopore reads where necessary
- Layout/consensus of original HiFi reads, corresponding to the resulting paths
- Patching of regions absent from HiFi data with v0.7 draft sequences
Consensus quality exceeds Q60. The mitochondrial DNA sequence is included. Centers of the 5 rDNA arrays are represented by N-gaps.
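To illustrate the first step of the pipeline above, here is a minimal sketch of homopolymer compression, which collapses every run of identical bases into a single base. It is not the authors' implementation; the function names are illustrative.

```python
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse each run of identical bases into a single base."""
    return "".join(base for base, _run in groupby(seq))


def run_lengths(seq: str) -> list[tuple[str, int]]:
    """Return (base, run length) pairs, enough to restore the original sequence."""
    return [(base, sum(1 for _ in run)) for base, run in groupby(seq)]


if __name__ == "__main__":
    read = "AAACCGTTTT"
    print(homopolymer_compress(read))  # ACGT
    print(run_lengths(read))
```

Because compression shortens sequences, coordinates and overlap lengths measured in compressed space will generally not match those of the uncompressed sequence.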
Assembly draft v0.7 was generated with Canu v1.7.1 including rel1 data up to 2018/11/15 and incorporating the previously released PacBio data. Two gaps on the X plus the centromere were manually resolved. Contigs with low coverage support were split and the assembly was scaffolded with BioNano. The assembly was polished with two rounds of nanopolish and two rounds of arrow. The X polishing was done using unique markers matched between the assembly and the raw read data; the rest of the genome used traditional polishing. Finally, the assembly was polished with 10X Genomics data. We validated the assembly using independent BACs. The overall QV is estimated to be Q37 (Q42 in unique regions) and the assembly resolves over 80% of available CHM13 BACs (280/341). The assembly is 2.94 Gbp in size with 359 scaffolds (448 contigs) and an NG50 of 83 Mbp (70 Mbp). Outside of Chr8 and ChrX, this should be considered a draft and likely has mis-assemblies. Older unpolished assemblies are available for benchmarking purposes, but are of lower quality and should not be used for analyses. Also available at NCBI GCA_009914755.1.
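For readers unfamiliar with the NG50 statistic quoted above, here is a minimal sketch assuming the conventional definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of an estimated genome size. The function name and the toy lengths are illustrative and are not taken from the assembly.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a set of sequence lengths, or None if they
    cover less than half of the given genome size."""
    half = genome_size / 2
    covered = 0
    # Walk from the longest sequence down until half the genome is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return None


if __name__ == "__main__":
    # Toy example: four sequences measured against a genome size of 120.
    print(ng50([50, 30, 20, 10], 120))
```

Unlike N50, which is measured against the total assembly length, NG50 is measured against a genome size estimate, so it stays comparable across assemblies of different total size.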
- Assembly draft v1.1 (md5: 1cab2b2776005cdf339ec9f283ba2c70)
- Annotation from CAT and Liftoff
- annotation gff3 file (md5: 14865ece7fe6367b8e2b06776a3d522f)
- Telomere identified by the VGP pipeline
- telomere bed file (md5: d6b148d16bf303e25552e381cddff9df)
- Liftover from v1.0 to v1.1
- chain file (md5: 804d2a81dbf79199fa637f6bbed9a1a8)
- Liftover from v1.1 to v1.0
- chain file (md5: 03180ca0210957e85affc72bb7083b2b)
- Alignments (the index bai file is available under the same name as the bam with .bai appended, e.g. chm13.draft_v1.1.hifi_20k.wm_2.0.1.pri.bam has a chm13.draft_v1.1.hifi_20k.wm_2.0.1.pri.bam.bai)
- PacBio HiFi alignments (generated via Winnowmap v2.01 -x map-pb) (md5: ab6b38cb00efa919f6d93bc89787a121)
- Oxford Nanopore Guppy alignments (generated via Winnowmap v2.01 -x map-ont) (md5: 5cb543ac85513995893015a3709806f4)
- PCRFree Illumina alignments (generated via bwa mem v0.7.15) (md5: bb41008d0f5de787d26896fb49027420)
- Assembly draft v1.0 (md5: 6d827b6512562630137008830c46e1ac)
- Annotation from CAT and Liftoff
- annotation gff3 file (md5: a39f18f553d5a426eaef9cfd4f858bf6)
- Telomere identified by the VGP pipeline
- telomere bed file (md5: 5cdca0c8b563b87f7a624d61ae0b5497)
- Liftover from hg38 to v1.0 (all files from UCSC Genome Browser)
- chain file (md5: ade08feeb01b75644cb1da383ebaa607)
- Liftover from v1.0 to hg38
- chain file (md5: 9edff5e020cc3f170350ff78fbe01d5c)
- Alignments (the index bai file is available under the same name as the bam with .bai appended, e.g. chm13.draft_v1.0.wm_2.01.hifi.pri.bam has a chm13.draft_v1.0.wm_2.01.hifi.pri.bam.bai)
- PacBio CLR alignments (generated via Winnowmap v2.01 -x map-pb-clr) (md5: 235e23c72676279714a091fb226f3b1a)
- PacBio HiFi alignments (generated via Winnowmap v2.01 -x map-pb) (md5: 2380bee4c3544d179b51cf22846e33ab)
- Oxford Nanopore Guppy alignments (generated via Winnowmap v2.01 -x map-ont) (md5: 5a012ae791f48678b829da6770216f5d)
- Oxford Nanopore Bonito alignments (generated via Winnowmap v2.01 -x map-ont) (md5: 84b0b9d5935140ead1d032b0a1610c39)
- PCRFree Illumina alignments (generated via bwa mem v0.7.15) (md5: 9143c6d6dc3e8f537c49f43f9e6cbedd)
- Assembly draft v0.9 (md5: 05fd40ffc5d68a9b6754773a56381db8)
- Regions patched by non-HiFi data & rDNA loci (md5: a754f98d5e960b3d1e9029cba4414cf2)
- v0.9 assembly graph in GFA format (built over homopolymer-compressed HiFi reads) (md5: df2218db9ebbcd239d07d2544372cfa5)
- Consensus sequences for individual nodes of the v0.9 assembly graph (since the sequence is not homopolymer compressed, the lengths and overlap sizes will not match the GFA!) (md5: 086d3d968b2c8cbc8c4be891e56ad177)
- Genomic paths through the v0.9 graph (the part of chr9 that was reconstructed by a different assembly method is excluded) (md5: 913205d75f5f9c49e5269eb4363fbf16)
- Alignments (the index bai file is available under the same name as the bam with .bai appended, e.g. chm13.draft_v0.9.clr.bam has a chm13.draft_v0.9.clr.bam.bai)
- PacBio CLR alignments (generated via Winnowmap v1.11 -x map-pb-clr) (md5: 7cd9c812e4398db6ed318969fe7080f9)
- PacBio HiFi alignments (generated via Winnowmap v1.11 -x map-pb) (md5: 7527b44aba07d9acbed597fbc445b61a)
- Oxford Nanopore alignments (generated via Winnowmap v1.11 -x map-ont) (md5: 4a5bbf70193e65c35a287a70099bb99c)
- PCRFree Illumina alignments (generated via bwa mem v0.7.15) (md5: 7c13fd36ae404eb41697ec5d54ba608f)
- Chromosome X v0.7 (md5: 89b3dd61db66177dd830527b920956fa)
- Chromosome X v0.7 Nanopore rel1 unique k-mer anchored mappings (md5: ada12a00d4781f6b0101a09be19abe93)
- Chromosome X v0.7 PacBio HiFi unique k-mer anchored mappings (md5: bd22daaf6d4a2cd775f109a853a911a9)
- Chromosome X v0.7 PacBio CLR unique k-mer anchored mappings (md5: 69be7bd105ee590bf57853c249e1f8d8)
- Chromosome 8 v9 (md5: cc33037728ab1f743d3e79f85e8c10ac)
- Chromosome 8 v9 Nanopore rel5 unique k-mer anchored mappings (md5: e953525b097c98d8485a3a7b152da897)
- Assembly draft v0.7 (md5: b9777540aaa0251c7dbb4974fb0a69d6)
- Assembly draft v0.6 (md5: c3e3318e82ba5dc64b74f458f4989b85)
- Assembly draft v0.4 (md5: 7e3c2fff9479ba45f7916fa1eee1310b)
- HG002 chrX draft v0.7 (not T2T, missing p-arm PAR region) (md5: 1d79ac022424fc5671135e2ac362d91d)
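The md5 values listed above can be used to check that a download arrived intact. Here is a minimal sketch using only the Python standard library; the function names and the file path in the usage example are hypothetical placeholders, not actual release file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks so that
    multi-gigabyte assemblies or BAM files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True if the file's md5 matches the expected hex digest."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify_md5.py <downloaded file> <md5 from the list above>
    local_path, expected_md5 = sys.argv[1], sys.argv[2]
    print("OK" if verify_md5(local_path, expected_md5) else "MISMATCH")
```

For example, a local copy of assembly draft v1.1 would be checked against `1cab2b2776005cdf339ec9f283ba2c70`. A mismatch usually points to a truncated or corrupted download, so re-downloading is the first thing to try.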