Commit 6060978

Warn against using 'ovl' for correction; remove 'overlapper=' parameter alias. Issue 1924.
1 parent e925d5f commit 6060978

5 files changed: +68 -28 lines changed

documentation/source/parameter-reference.rst

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -440,7 +440,7 @@ Overlapper Configuration, ovl Algorithm
440440
* :ref:`obtOvlErrorRate <obtOvlErrorRate>` applied to overlaps generated for trimming reads;
441441
* :ref:`utgOvlErrorRate <utgOvlErrorRate>` applies to overlaps generated for assembling reads.
442442
These limits apply to the 'ovl' overlap algorithm and when alignments are computed for mhap
443-
overlaps with :ref:`mhapReAlign <mhapReAlign>`.
443+
overlaps with :ref:`reAlign <reAlign>`.
444444

445445
{prefix}OvlFrequentMers <string=undefined>
446446
Do not seed overlaps with these kmers, or, for mhap, do not seed with these kmers unless necessary (down-weight them).

documentation/source/tutorial.rst

Lines changed: 27 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -66,8 +66,7 @@ way becomes a 'library' of reads. The reads should have been (physically) gener
6666
time using the same steps, but perhaps sequenced in multiple batches. In canu, each library has a
6767
set of options setting various algorithmic parameters, for example, how aggressively to trim. To
6868
explicitly set library parameters, a text 'gkp' file describing the library and the input files must
69-
be created. Don't worry too much about this yet, it's an advanced feature, fully described in
70-
Section :ref:`gkp-files`.
69+
be created. Don't worry too much about this; it's an advanced feature and not described anywhere.
7170

7271
The read-files contain sequence data in either FASTA or FASTQ format (or both! A quirk of the
7372
implementation allows files that contain both FASTA and FASTQ format reads). The files can be
@@ -139,7 +138,7 @@ The tags are:
139138
|utgovl | the standard overlapper, as used in the assembly phase |
140139
+--------+-------------------------------------------------------------------+
141140
+--------+-------------------------------------------------------------------+
142-
|mhap | the mhap overlapper |
141+
|mhap | the `mhap <https://github.com/marbl/MHAP>`_ overlapper |
143142
+--------+-------------------------------------------------------------------+
144143
|cormhap | the mhap overlapper, as used in the correction phase |
145144
+--------+-------------------------------------------------------------------+
@@ -148,13 +147,13 @@ The tags are:
148147
|utgmhap | the mhap overlapper, as used in the assembly phase |
149148
+--------+-------------------------------------------------------------------+
150149
+--------+-------------------------------------------------------------------+
151-
|mmap | the `minimap <https://github.com/lh3/minimap>`_ overlapper |
150+
|mmap | the `minimap2 <https://github.com/lh3/minimap2>`_ overlapper |
152151
+--------+-------------------------------------------------------------------+
153-
|cormmap | the minimap overlapper, as used in the correction phase |
152+
|cormmap | the minimap2 overlapper, as used in the correction phase |
154153
+--------+-------------------------------------------------------------------+
155-
|obtmmap | the minimap overlapper, as used in the trimming phase |
154+
|obtmmap | the minimap2 overlapper, as used in the trimming phase |
156155
+--------+-------------------------------------------------------------------+
157-
|utgmmap | the minimap overlapper, as used in the assembly phase |
156+
|utgmmap | the minimap2 overlapper, as used in the assembly phase |
158157
+--------+-------------------------------------------------------------------+
159158
+--------+-------------------------------------------------------------------+
160159
|ovb | the bucketizing phase of overlap store building |
@@ -269,7 +268,7 @@ correctedErrorRate 0.045 0.144
269268
In practice, only :ref:`correctedErrorRate <correctedErrorRate>` is usually changed. The :ref:`faq`
270269
has :ref:`specific suggestions <tweak>` on when to change this.
271270

272-
Canu v1.4 and earlier used the :ref:`errorRate <errorRate>` parameter, which set the expected
271+
Canu v1.4 and earlier used the ``errorRate`` parameter, which set the expected
273272
rate of error in a single corrected read.
274273

275274
.. _minimum-lengths:
@@ -302,14 +301,29 @@ For example:
302301
- To change the k-mer size for just the ovl overlapper used during correction, 'corMerSize=16' would be used.
303302
- To change the mhap k-mer size for all instances, 'mhapMerSize=18' would be used.
304303
- To change the mhap k-mer size just during correction, 'corMhapMerSize=15' would be used.
305-
- To use minimap for overlap computation just during correction, 'corOverlapper=minimap' would be used. The minimap2 executable must be symlinked from the Canu binary folder ('Linux-amd64/bin' or 'Darwin-amd64/bin' depending on your system).
304+
- To use minimap2 for overlap computation just during correction, 'corOverlapper=minimap' would be used. The minimap2 executable must be symlinked from the Canu binary folder ('Linux-amd64/bin' or 'Darwin-amd64/bin' depending on your system).
306305

307306
Ovl Overlapper Configuration
308307
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
309308

310309
<tag>Overlapper
311-
select the overlap algorithm to use, 'ovl' or 'mhap'.
312-
310+
select the overlap algorithm to use, 'ovl', 'mhap' or 'minimap'.
311+
312+
For correction:
313+
* ``corOverlapper=ovl`` is not supported and is very very slow.
314+
* ``corOverlapper=mhap`` is the default.
315+
* ``corOverlapper=minimap`` is not widely tested.
316+
317+
For trimming:
318+
* ``obtOverlapper=ovl`` is the default.
319+
* ``obtOverlapper=mhap`` is significantly faster, but assemblies are slightly worse.
320+
* ``obtOverlapper=minimap`` is significantly faster, but assemblies are slightly worse and has not been widely tested.
321+
322+
For contig creation (unitigging):
323+
* ``utgOverlapper=ovl`` is the default.
324+
* ``utgOverlapper=mhap`` is significantly faster, but assemblies are slightly worse.
325+
* ``utgOverlapper=minimap`` is significantly faster, but assemblies are slightly worse and has not been widely tested.
326+
313327
Ovl Overlapper Parameters
314328
~~~~~~~~~~~~~~~~~~~~~~~~~~~
315329

@@ -366,6 +380,8 @@ and 8 'distinct' kmers.
366380
<tag>FrequentMers
367381
don't compute frequent kmers, use those listed in this file
368382

383+
.. _reAlign:
384+
369385
Mhap Overlapper Parameters
370386
~~~~~~~~~~~~~~~~~~~~~~~~~~
371387
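
The per-stage guidance added to tutorial.rst above can be restated as a small lookup. This is a minimal sketch in Python, not part of Canu: the function name `check_overlapper` and the wording of the returned notes are hypothetical, and only the defaults and caveats stated in the documentation text are encoded.

```python
# Defaults per stage tag, as listed in the tutorial text in this commit.
DEFAULTS = {"cor": "mhap", "obt": "ovl", "utg": "ovl"}
ALGORITHMS = ("ovl", "mhap", "minimap")


def check_overlapper(stage, algorithm):
    """Return advisory notes for a <stage>Overlapper=<algorithm> choice.

    Hypothetical helper for illustration; Canu itself does not expose this.
    """
    if stage not in DEFAULTS:
        raise ValueError("unknown stage tag: %r" % (stage,))
    if algorithm not in ALGORITHMS:
        raise ValueError("unknown overlap algorithm: %r" % (algorithm,))

    notes = []
    if algorithm == DEFAULTS[stage]:
        notes.append("default")
    if stage == "cor" and algorithm == "ovl":
        # This is the combination the new warning in canu.pl targets.
        notes.append("not supported; very slow")
    if stage in ("obt", "utg") and algorithm in ("mhap", "minimap"):
        notes.append("significantly faster, but assemblies are slightly worse")
    if algorithm == "minimap":
        notes.append("not widely tested")
    return notes
```

For example, `check_overlapper("cor", "ovl")` returns the single note that the combination is unsupported, which mirrors the condition tested by the warning block added to canu.pl.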

src/pipelines/canu.pl

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -706,6 +706,20 @@
706706
printf STDERR "-- utgErrorRate %6.4f (%6.2f%%)\n", getGlobal("utgErrorRate"), getGlobal("utgErrorRate") * 100.0;
707707
printf STDERR "-- cnsErrorRate %6.4f (%6.2f%%)\n", getGlobal("cnsErrorRate"), getGlobal("cnsErrorRate") * 100.0;
708708
print STDERR "--\n";
709+
if (getGlobal("corOverlapper") eq "ovl") {
710+
print STDERR "-- WARNING--DO-NOT-USE--DO-NOT-USE--DO-NOT-USE--DO-NOT-USE--WARNING\n";
711+
print STDERR "-- WARNING WARNING\n";
712+
print STDERR "-- WARNING corOverlapper=ovl is NOT SUPPORTED WARNING\n";
713+
print STDERR "-- WARNING is MISCONFIGURED WARNING\n";
714+
print STDERR "-- WARNING is LUDICROUSLY SLOW WARNING\n";
715+
print STDERR "-- WARNING has LOUSY SENSITIVITY WARNING\n";
716+
print STDERR "-- WARNING and WORSE SPECIFICITY WARNING\n";
717+
print STDERR "-- WARNING WARNING\n";
718+
print STDERR "-- WARNING USE THE DEFAULT corOverlapper=mhap INSTEAD WARNING\n";
719+
print STDERR "-- WARNING WARNING\n";
720+
print STDERR "-- WARNING--DO-NOT-USE--DO-NOT-USE--DO-NOT-USE--DO-NOT-USE--WARNING\n";
721+
print STDERR "--\n"; sleep(10);
722+
}
709723
print STDERR "-- Stages to run:\n";
710724
print STDERR "-- separate reads into haplotypes.\n" if (($mode eq "run") && (scalar(keys %haplotypeReads) > 0));
711725
print STDERR "-- correct raw reads.\n" if (($mode eq "run"));

src/pipelines/canu/Configure.pm

Lines changed: 15 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -581,10 +581,10 @@ sub configureAssembler ($$$) {
581581
my $hx = 1.25 * 1000000;
582582

583583
if (getGlobal("genomeSize") < adjustGenomeSize("40m")) {
584-
setGlobalIfUndef("corOvlHashBlockLength", 2500000); setGlobalIfUndef("obtOvlHashBlockLength", 64 * $hx); setGlobalIfUndef("utgOvlHashBlockLength", 64 * $hx);
585-
setGlobalIfUndef("corOvlRefBlockLength", 2000000); setGlobalIfUndef("obtOvlRefBlockLength", 1000000000); setGlobalIfUndef("utgOvlRefBlockLength", 1000000000); # 1 Gbp
584+
setGlobalIfUndef("corOvlHashBlockLength", 250000000); setGlobalIfUndef("obtOvlHashBlockLength", 64 * $hx); setGlobalIfUndef("utgOvlHashBlockLength", 64 * $hx);
585+
setGlobalIfUndef("corOvlRefBlockLength", 250000000); setGlobalIfUndef("obtOvlRefBlockLength", 1000000000); setGlobalIfUndef("utgOvlRefBlockLength", 1000000000); # 1 Gbp
586586

587-
setGlobalIfUndef("corOvlMemory", "2"); setGlobalIfUndef("corOvlThreads", "1"); setGlobalIfUndef("corOvlHashBits", 22);
587+
setGlobalIfUndef("corOvlMemory", "24"); setGlobalIfUndef("corOvlThreads", "4-16"); setGlobalIfUndef("corOvlHashBits", 25);
588588
setGlobalIfUndef("obtOvlMemory", "4"); setGlobalIfUndef("obtOvlThreads", "2-8"); setGlobalIfUndef("obtOvlHashBits", 22);
589589
setGlobalIfUndef("utgOvlMemory", "4"); setGlobalIfUndef("utgOvlThreads", "2-8"); setGlobalIfUndef("utgOvlHashBits", 22);
590590

@@ -597,10 +597,10 @@ sub configureAssembler ($$$) {
597597
setGlobalIfUndef("utgMMapMemory", "4-6"); setGlobalIfUndef("utgMMapThreads", "1-16");
598598

599599
} elsif (getGlobal("genomeSize") < adjustGenomeSize("500m")) {
600-
setGlobalIfUndef("corOvlHashBlockLength", 2500000); setGlobalIfUndef("obtOvlHashBlockLength", 128 * $hx); setGlobalIfUndef("utgOvlHashBlockLength", 128 * $hx);
601-
setGlobalIfUndef("corOvlRefBlockLength", 2000000); setGlobalIfUndef("obtOvlRefBlockLength", 5000000000); setGlobalIfUndef("utgOvlRefBlockLength", 5000000000); # 5 Gbp
600+
setGlobalIfUndef("corOvlHashBlockLength", 250000000); setGlobalIfUndef("obtOvlHashBlockLength", 128 * $hx); setGlobalIfUndef("utgOvlHashBlockLength", 128 * $hx);
601+
setGlobalIfUndef("corOvlRefBlockLength", 1000000000); setGlobalIfUndef("obtOvlRefBlockLength", 5000000000); setGlobalIfUndef("utgOvlRefBlockLength", 5000000000); # 5 Gbp
602602

603-
setGlobalIfUndef("corOvlMemory", "2"); setGlobalIfUndef("corOvlThreads", "1"); setGlobalIfUndef("corOvlHashBits", 23);
603+
setGlobalIfUndef("corOvlMemory", "24"); setGlobalIfUndef("corOvlThreads", "4-16"); setGlobalIfUndef("corOvlHashBits", 25);
604604
setGlobalIfUndef("obtOvlMemory", "8"); setGlobalIfUndef("obtOvlThreads", "2-8"); setGlobalIfUndef("obtOvlHashBits", 23);
605605
setGlobalIfUndef("utgOvlMemory", "8"); setGlobalIfUndef("utgOvlThreads", "2-8"); setGlobalIfUndef("utgOvlHashBits", 23);
606606

@@ -613,10 +613,10 @@ sub configureAssembler ($$$) {
613613
setGlobalIfUndef("utgMMapMemory", "8-13"); setGlobalIfUndef("utgMMapThreads", "1-16");
614614

615615
} elsif (getGlobal("genomeSize") < adjustGenomeSize("2g")) {
616-
setGlobalIfUndef("corOvlHashBlockLength", 2500000); setGlobalIfUndef("obtOvlHashBlockLength", 256 * $hx); setGlobalIfUndef("utgOvlHashBlockLength", 256 * $hx);
617-
setGlobalIfUndef("corOvlRefBlockLength", 2000000); setGlobalIfUndef("obtOvlRefBlockLength", 15000000000); setGlobalIfUndef("utgOvlRefBlockLength", 15000000000); # 15 Gbp
616+
setGlobalIfUndef("corOvlHashBlockLength", 500000000); setGlobalIfUndef("obtOvlHashBlockLength", 256 * $hx); setGlobalIfUndef("utgOvlHashBlockLength", 256 * $hx);
617+
setGlobalIfUndef("corOvlRefBlockLength", 4000000000); setGlobalIfUndef("obtOvlRefBlockLength", 15000000000); setGlobalIfUndef("utgOvlRefBlockLength", 15000000000); # 15 Gbp
618618

619-
setGlobalIfUndef("corOvlMemory", "8"); setGlobalIfUndef("corOvlThreads", "1"); setGlobalIfUndef("corOvlHashBits", 24);
619+
setGlobalIfUndef("corOvlMemory", "24"); setGlobalIfUndef("corOvlThreads", "4-16"); setGlobalIfUndef("corOvlHashBits", 26);
620620
setGlobalIfUndef("obtOvlMemory", "16"); setGlobalIfUndef("obtOvlThreads", "4-16"); setGlobalIfUndef("obtOvlHashBits", 24);
621621
setGlobalIfUndef("utgOvlMemory", "16"); setGlobalIfUndef("utgOvlThreads", "4-16"); setGlobalIfUndef("utgOvlHashBits", 24);
622622

@@ -629,10 +629,10 @@ sub configureAssembler ($$$) {
629629
setGlobalIfUndef("utgMMapMemory", "16-32"); setGlobalIfUndef("utgMMapThreads", "1-16");
630630

631631
} elsif (getGlobal("genomeSize") < adjustGenomeSize("5g")) {
632-
setGlobalIfUndef("corOvlHashBlockLength", 2500000); setGlobalIfUndef("obtOvlHashBlockLength", 512 * $hx); setGlobalIfUndef("utgOvlHashBlockLength", 512 * $hx);
633-
setGlobalIfUndef("corOvlRefBlockLength", 2000000); setGlobalIfUndef("obtOvlRefBlockLength", 20000000000); setGlobalIfUndef("utgOvlRefBlockLength", 20000000000); # 20 Gbp
632+
setGlobalIfUndef("corOvlHashBlockLength", 500000000); setGlobalIfUndef("obtOvlHashBlockLength", 512 * $hx); setGlobalIfUndef("utgOvlHashBlockLength", 512 * $hx);
633+
setGlobalIfUndef("corOvlRefBlockLength", 10000000000); setGlobalIfUndef("obtOvlRefBlockLength", 20000000000); setGlobalIfUndef("utgOvlRefBlockLength", 20000000000); # 20 Gbp
634634

635-
setGlobalIfUndef("corOvlMemory", "8"); setGlobalIfUndef("corOvlThreads", "1"); setGlobalIfUndef("corOvlHashBits", 25);
635+
setGlobalIfUndef("corOvlMemory", "32"); setGlobalIfUndef("corOvlThreads", "4-16"); setGlobalIfUndef("corOvlHashBits", 26);
636636
setGlobalIfUndef("obtOvlMemory", "24"); setGlobalIfUndef("obtOvlThreads", "4-16"); setGlobalIfUndef("obtOvlHashBits", 25);
637637
setGlobalIfUndef("utgOvlMemory", "24"); setGlobalIfUndef("utgOvlThreads", "4-16"); setGlobalIfUndef("utgOvlHashBits", 25);
638638

@@ -645,10 +645,10 @@ sub configureAssembler ($$$) {
645645
setGlobalIfUndef("utgMMapMemory", "16-48"); setGlobalIfUndef("utgMMapThreads", "1-16");
646646

647647
} else {
648-
setGlobalIfUndef("corOvlHashBlockLength", 2500000); setGlobalIfUndef("obtOvlHashBlockLength", 512 * $hx); setGlobalIfUndef("utgOvlHashBlockLength", 512 * $hx);
649-
setGlobalIfUndef("corOvlRefBlockLength", 2000000); setGlobalIfUndef("obtOvlRefBlockLength", 30000000000); setGlobalIfUndef("utgOvlRefBlockLength", 30000000000); # 30 Gbp
648+
setGlobalIfUndef("corOvlHashBlockLength", 500000000); setGlobalIfUndef("obtOvlHashBlockLength", 512 * $hx); setGlobalIfUndef("utgOvlHashBlockLength", 512 * $hx);
649+
setGlobalIfUndef("corOvlRefBlockLength", 20000000000); setGlobalIfUndef("obtOvlRefBlockLength", 30000000000); setGlobalIfUndef("utgOvlRefBlockLength", 30000000000); # 30 Gbp
650650

651-
setGlobalIfUndef("corOvlMemory", "8"); setGlobalIfUndef("corOvlThreads", "1"); setGlobalIfUndef("corOvlHashBits", 25);
651+
setGlobalIfUndef("corOvlMemory", "32"); setGlobalIfUndef("corOvlThreads", "4-16"); setGlobalIfUndef("corOvlHashBits", 26);
652652
setGlobalIfUndef("obtOvlMemory", "24"); setGlobalIfUndef("obtOvlThreads", "4-16"); setGlobalIfUndef("obtOvlHashBits", 25);
653653
setGlobalIfUndef("utgOvlMemory", "24"); setGlobalIfUndef("utgOvlThreads", "4-16"); setGlobalIfUndef("utgOvlHashBits", 25);
654654
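
To get a feel for why the correction block lengths were raised, the arithmetic below compares the old and new values for the smallest genome-size tier. It is a rough sketch in Python under a loud assumption: that the number of 'ovl' jobs grows with the product of the number of hash blocks and the number of reference blocks. The diff does not show how Canu actually partitions or schedules jobs, and the 4 Gbp read total is a made-up input (roughly 100x of a 40 Mbp genome).

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def block_pairs(total_bases, hash_block_length, ref_block_length):
    """Hash blocks times reference blocks.

    ASSUMPTION: used here only as a proxy for job count; the real
    partitioning logic is not part of this diff.
    """
    hash_blocks = ceil_div(total_bases, hash_block_length)
    ref_blocks = ceil_div(total_bases, ref_block_length)
    return hash_blocks * ref_blocks


TOTAL_BASES = 4_000_000_000  # hypothetical amount of read data

# Old corOvl defaults for every tier: 2500000 / 2000000.
old_pairs = block_pairs(TOTAL_BASES, 2_500_000, 2_000_000)

# New corOvl defaults for genomes under 40m: 250000000 / 250000000.
new_pairs = block_pairs(TOTAL_BASES, 250_000_000, 250_000_000)

print(old_pairs, new_pairs)
```

Under that assumption, the old values give 1600 x 2000 block pairs and the new values give 16 x 16, which is in line with the "makes a billion jobs" remark in the Defaults.pm comment for larger inputs.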

src/pipelines/canu/Defaults.pm

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -152,10 +152,20 @@ sub setGlobal ($$) {
152152
if ($var eq "gridoptionsmhap") { setGlobal("gridOptionsCORmhap", $val); setGlobal("gridOptionsOBTmhap", $val); setGlobal("gridOptionsUTGmhap", $val); return; }
153153
if ($var eq "gridoptionsmmap") { setGlobal("gridOptionsCORmmap", $val); setGlobal("gridOptionsOBTmmap", $val); setGlobal("gridOptionsUTGmmap", $val); return; }
154154

155+
#
156+
# Cycle through some aliases that will set parameters for all primary
157+
# stages (correction, trimming, and unitigging), primarily overlapper
158+
# options.
159+
#
160+
# We used to allow 'overlapper=ovl' here, but that would enable ovl for
161+
# correction, and that isn't supported without MUCH fiddling of
162+
# parameters (it's super slow, and by default makes a billion jobs - see
163+
# https://github.com/marbl/canu/issues/1924).
164+
#
165+
155166
foreach my $opt ("ovlmemory", "mhapmemory", "mmapmemory", # Execution options
156167
"ovlthreads", "mhapthreads", "mmapthreads",
157168
"ovlconcurrency", "mhapconcurrency", "mmapconcurrency",
158-
"overlapper", # Overlap algorithm selection
159169
"realign",
160170
"ovlerrorrate", # Overlapper options
161171
"ovlhashblocklength",
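
The effect of dropping "overlapper" from this alias list can be modelled as follows. This is a sketch in Python rather than the Perl in Defaults.pm, and it rests on assumptions: the loop body is not visible in the hunk, so the expansion to cor/obt/utg prefixed names is inferred from the comment, the lower-casing of names is inferred from the comparisons above it, and the alias tuple contains only the entries visible in the diff (the real list continues past the end of the hunk). The name `expand_alias` is hypothetical.

```python
STAGE_TAGS = ("cor", "obt", "utg")

# Only the aliases visible in the hunk; the real list is longer.
# Note that "overlapper" is no longer present after this commit.
ALIASES = (
    "ovlmemory", "mhapmemory", "mmapmemory",
    "ovlthreads", "mhapthreads", "mmapthreads",
    "ovlconcurrency", "mhapconcurrency", "mmapconcurrency",
    "realign",
    "ovlerrorrate",
    "ovlhashblocklength",
)


def expand_alias(var):
    """Return the parameter names that setting `var` would touch.

    ASSUMPTION: an alias fans out to one name per stage tag; a name that
    is not an alias is returned unchanged. What Canu does with a name
    that is neither an alias nor a known parameter is not shown here.
    """
    key = var.lower()
    if key in ALIASES:
        return [tag + key for tag in STAGE_TAGS]
    return [key]
```

With this model, `expand_alias("reAlign")` fans out to three per-stage names, while `expand_alias("overlapper")` no longer does, so a single setting can no longer switch correction to 'ovl' as a side effect.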
