Commit 2aec84e

Add pcre2_code_copy_with_tables().
1 parent 43e541a commit 2aec84e

30 files changed

+2018
-1442
lines changed

ChangeLog

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -181,6 +181,9 @@ wrong name.
181181

182182
27. In pcre2test, give some offset information for errors in hex patterns.
183183

184+
28. Implemented pcre2_code_copy_with_tables(), and added pushtablescopy to
185+
pcre2test for testing it.
186+
184187

185188
Version 10.22 29-July-2016
186189
--------------------------
@@ -250,7 +253,7 @@ a report of compiler warnings from Visual Studio 2013 and a few tests with
250253
gcc's -Wconversion (which still throws up a lot).
251254

252255
15. Implemented pcre2_code_copy(), and added pushcopy and #popcopy to pcre2test
253-
for testing it.
256+
for testing it.
254257

255258
16. Change 66 for 10.21 introduced the use of snprintf() in PCRE2's version of
256259
regerror(). When the error buffer is too small, my version of snprintf() puts a

Makefile.am

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,7 @@ dist_html_DATA = \
2525
doc/html/pcre2.html \
2626
doc/html/pcre2_callout_enumerate.html \
2727
doc/html/pcre2_code_copy.html \
28+
doc/html/pcre2_code_copy_with_tables.html \
2829
doc/html/pcre2_code_free.html \
2930
doc/html/pcre2_compile.html \
3031
doc/html/pcre2_compile_context_copy.html \
@@ -107,6 +108,7 @@ dist_man_MANS = \
107108
doc/pcre2.3 \
108109
doc/pcre2_callout_enumerate.3 \
109110
doc/pcre2_code_copy.3 \
111+
doc/pcre2_code_copy_with_tables.3 \
110112
doc/pcre2_code_free.3 \
111113
doc/pcre2_compile.3 \
112114
doc/pcre2_compile_context_copy.3 \

doc/html/NON-AUTOTOOLS-BUILD.txt

Lines changed: 6 additions & 2 deletions
@@ -174,7 +174,11 @@ can skip ahead to the CMake section.
174174

175175
(11) If you want to use the pcre2grep command, compile and link
176176
src/pcre2grep.c; it uses only the basic 8-bit PCRE2 library (it does not
177-
need the pcre2posix library).
177+
need the pcre2posix library). If you have built the PCRE2 library with JIT
178+
support by defining SUPPORT_JIT in src/config.h, you can also define
179+
SUPPORT_PCRE2GREP_JIT, which causes pcre2grep to make use of JIT (unless
180+
it is run with --no-jit). If you define SUPPORT_PCRE2GREP_JIT without
181+
defining SUPPORT_JIT, pcre2grep does not try to make use of JIT.
178182

179183

180184
STACK SIZE IN WINDOWS ENVIRONMENTS
@@ -389,4 +393,4 @@ and executable, is in EBCDIC and native z/OS file formats and this is the
389393
recommended download site.
390394

391395
=============================
392-
Last Updated: 16 July 2015
396+
Last Updated: 13 October 2016
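
As a rough illustration of the rule added to step (11) of NON-AUTOTOOLS-BUILD, the following C sketch models when pcre2grep ends up using JIT. The function name and its boolean parameters are hypothetical stand-ins for the SUPPORT_JIT and SUPPORT_PCRE2GREP_JIT macros and the --no-jit option; this is not code taken from pcre2grep itself.

```c
#include <stdbool.h>

/* Hypothetical model of the rule in step (11):
 *   support_jit            mirrors SUPPORT_JIT in src/config.h
 *   support_pcre2grep_jit  mirrors SUPPORT_PCRE2GREP_JIT
 *   no_jit_option          is true when pcre2grep is run with --no-jit
 * JIT is used only when both macros are defined and --no-jit is absent. */
bool pcre2grep_uses_jit(bool support_jit, bool support_pcre2grep_jit,
                        bool no_jit_option)
{
    return support_jit && support_pcre2grep_jit && !no_jit_option;
}
```

In this model, defining SUPPORT_PCRE2GREP_JIT without SUPPORT_JIT simply yields false, which matches the statement that pcre2grep then does not try to use JIT.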

doc/html/README.txt

Lines changed: 44 additions & 23 deletions
@@ -44,7 +44,7 @@ wrappers.
4444

4545
The distribution does contain a set of C wrapper functions for the 8-bit
4646
library that are based on the POSIX regular expression API (see the pcre2posix
47-
man page). These can be found in a library called libpcre2posix. Note that this
47+
man page). These can be found in a library called libpcre2-posix. Note that this
4848
just provides a POSIX calling interface to PCRE2; the regular expressions
4949
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
5050
and does not give full access to all of PCRE2's facilities.
@@ -58,8 +58,8 @@ renamed or pointed at by a link.
5858
If you are using the POSIX interface to PCRE2 and there is already a POSIX
5959
regex library installed on your system, as well as worrying about the regex.h
6060
header file (as mentioned above), you must also take care when linking programs
61-
to ensure that they link with PCRE2's libpcre2posix library. Otherwise they may
62-
pick up the POSIX functions of the same name from the other library.
61+
to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they
62+
may pick up the POSIX functions of the same name from the other library.
6363

6464
One way of avoiding this confusion is to compile PCRE2 with the addition of
6565
-Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
@@ -204,13 +204,6 @@ library. They are also documented in the pcre2build man page.
204204
--enable-newline-is-crlf, --enable-newline-is-anycrlf, or
205205
--enable-newline-is-any to the "configure" command, respectively.
206206

207-
If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
208-
the standard tests will fail, because the lines in the test files end with
209-
LF. Even if the files are edited to change the line endings, there are likely
210-
to be some failures. With --enable-newline-is-anycrlf or
211-
--enable-newline-is-any, many tests should succeed, but there may be some
212-
failures.
213-
214207
. By default, the sequence \R in a pattern matches any Unicode line ending
215208
sequence. This is independent of the option specifying what PCRE2 considers
216209
to be the end of a line (see above). However, the caller of PCRE2 can
@@ -253,13 +246,13 @@ library. They are also documented in the pcre2build man page.
253246
sizes in the pcre2stack man page.
254247

255248
. In the 8-bit library, the default maximum compiled pattern size is around
256-
64K. You can increase this by adding --with-link-size=3 to the "configure"
257-
command. PCRE2 then uses three bytes instead of two for offsets to different
258-
parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
259-
the same as --with-link-size=4, which (in both libraries) uses four-byte
260-
offsets. Increasing the internal link size reduces performance in the 8-bit
261-
and 16-bit libraries. In the 32-bit library, the link size setting is
262-
ignored, as 4-byte offsets are always used.
249+
64K bytes. You can increase this by adding --with-link-size=3 to the
250+
"configure" command. PCRE2 then uses three bytes instead of two for offsets
251+
to different parts of the compiled pattern. In the 16-bit library,
252+
--with-link-size=3 is the same as --with-link-size=4, which (in both
253+
libraries) uses four-byte offsets. Increasing the internal link size reduces
254+
performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
255+
link size setting is ignored, as 4-byte offsets are always used.
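
A minimal sketch of the link-size rules just described, assuming that a link of n bytes can represent offsets up to 2^(8n) - 1. The helper names are hypothetical and are not part of PCRE2; the sketch only restates the text in code.

```c
#include <stdint.h>

/* Hypothetical helper: returns the link size (in bytes) that is effectively
 * used for a given library width (8, 16 or 32) and a requested
 * --with-link-size value (2, 3 or 4). Returns 0 for unsupported input. */
int effective_link_size(int code_unit_width, int requested)
{
    if (requested < 2 || requested > 4) return 0;
    switch (code_unit_width) {
    case 8:  return requested;                      /* 2, 3 or 4 bytes */
    case 16: return (requested == 3) ? 4 : requested; /* 3 is treated as 4 */
    case 32: return 4;                              /* setting is ignored */
    default: return 0;
    }
}

/* Assumption: the largest offset a link can hold is 2^(8 * bytes) - 1. */
uint64_t max_link_offset(int link_bytes)
{
    return ((uint64_t)1 << (8 * link_bytes)) - 1;
}
```

Under that assumption, a two-byte link gives the "around 64K" limit mentioned for the 8-bit library, and a three-byte link raises it to roughly 16M.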
263256

264257
. You can build PCRE2 so that its internal match() function that is called from
265258
pcre2_match() does not call itself recursively. Instead, it uses memory
@@ -339,12 +332,23 @@ library. They are also documented in the pcre2build man page.
339332

340333
Of course, the relevant libraries must be installed on your system.
341334

342-
. The default size (in bytes) of the internal buffer used by pcre2grep can be
343-
set by, for example:
335+
. The default starting size (in bytes) of the internal buffer used by pcre2grep
336+
can be set by, for example:
344337

345338
--with-pcre2grep-bufsize=51200
346339

347-
The value must be a plain integer. The default is 20480.
340+
The value must be a plain integer. The default is 20480. The amount of memory
341+
used by pcre2grep is actually three times this number, to allow for "before"
342+
and "after" lines. If very long lines are encountered, the buffer is
343+
automatically enlarged, up to a fixed maximum size.
344+
345+
. The default maximum size of pcre2grep's internal buffer can be set by, for
346+
example:
347+
348+
--with-pcre2grep-max-bufsize=2097152
349+
350+
The default is either 1048576 or the value of --with-pcre2grep-bufsize,
351+
whichever is the larger.
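
The sizing rules for pcre2grep's buffer amount to simple arithmetic, sketched here in C. The function names are hypothetical; only the value 1048576 and the factor of three come from the text above.

```c
#include <stddef.h>

/* Default for --with-pcre2grep-max-bufsize, as described in the text:
 * either 1048576 or the configured starting size, whichever is larger. */
size_t default_max_bufsize(size_t bufsize)
{
    const size_t floor_size = 1048576;
    return (bufsize > floor_size) ? bufsize : floor_size;
}

/* Memory used for a given buffer size: three times the size, to allow
 * for "before" and "after" lines. */
size_t buffer_memory_used(size_t bufsize)
{
    return 3 * bufsize;
}
```

With the default starting size of 20480, this gives 61440 bytes in use initially and a default maximum buffer size of 1048576.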
348352

349353
. It is possible to compile pcre2test so that it links with the libreadline
350354
or libedit libraries, by specifying, respectively,
@@ -368,6 +372,22 @@ library. They are also documented in the pcre2build man page.
368372
If you get error messages about missing functions tgetstr, tgetent, tputs,
369373
tgetflag, or tgoto, this is the problem, and linking with the ncurses library
370374
should fix it.
375+
376+
. There is a special option called --enable-fuzz-support for use by people who
377+
want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
378+
library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
379+
be built, but not installed. This contains a single function called
380+
LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
381+
length of the string. When called, this function tries to compile the string
382+
as a pattern, and if that succeeds, to match it. This is done both with no
383+
options and with some random options bits that are generated from the string.
384+
Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
385+
be created. This is normally run under valgrind or used when PCRE2 is
386+
compiled with address sanitizing enabled. It calls the fuzzing function and
387+
  outputs information about what it is doing. The input strings are specified by
388+
arguments: if an argument starts with "=" the rest of it is a literal input
389+
string. Otherwise, it is assumed to be a file name, and the contents of the
390+
file are the test string.
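
The argument convention described for pcre2fuzzcheck can be sketched as follows. This is an illustrative helper with a hypothetical name, not the program's actual source: it only shows how an argument might be classified as a literal input string or a file name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if the argument starts with "=", return a pointer
 * to the literal input string that follows it. Otherwise return NULL,
 * meaning the argument should be treated as a file name whose contents
 * are the test string. */
const char *fuzzcheck_literal_input(const char *arg)
{
    if (arg != NULL && arg[0] == '=') return arg + 1;
    return NULL;
}
```

A caller would pass the returned string, or the contents of the named file, to the fuzzing function together with its length.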
371391

372392
The "configure" script builds the following files for the basic C library:
373393

@@ -543,7 +563,7 @@ script creates the .txt and HTML forms of the documentation from the man pages.
543563

544564

545565
Testing PCRE2
546-
------------
566+
-------------
547567

548568
To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
549569
There is another script called RunGrepTest that tests the pcre2grep command.
@@ -757,6 +777,7 @@ The distribution should contain the files listed below.
757777
src/pcre2_xclass.c )
758778

759779
src/pcre2_printint.c debugging function that is used by pcre2test,
780+
src/pcre2_fuzzsupport.c function for (optional) fuzzing support
760781

761782
src/config.h.in template for config.h, when built by "configure"
762783
src/pcre2.h.in template for pcre2.h when built by "configure"
@@ -814,7 +835,7 @@ The distribution should contain the files listed below.
814835
libpcre2-8.pc.in template for libpcre2-8.pc for pkg-config
815836
libpcre2-16.pc.in template for libpcre2-16.pc for pkg-config
816837
libpcre2-32.pc.in template for libpcre2-32.pc for pkg-config
817-
libpcre2posix.pc.in template for libpcre2posix.pc for pkg-config
838+
libpcre2-posix.pc.in template for libpcre2-posix.pc for pkg-config
818839
ltmain.sh file used to build a libtool script
819840
missing ) common stub for a few missing GNU programs while
820841
) installing, generated by automake
@@ -845,4 +866,4 @@ The distribution should contain the files listed below.
845866
Philip Hazel
846867
Email local part: ph10
847868
Email domain: cam.ac.uk
848-
Last updated: 01 April 2016
869+
Last updated: 01 November 2016

doc/html/index.html

Lines changed: 3 additions & 0 deletions
@@ -94,6 +94,9 @@ <h1>Perl-compatible Regular Expressions (revised API: PCRE2)</h1>
9494
<tr><td><a href="pcre2_code_copy.html">pcre2_code_copy</a></td>
9595
<td>&nbsp;&nbsp;Copy a compiled pattern</td></tr>
9696

97+
<tr><td><a href="pcre2_code_copy_with_tables.html">pcre2_code_copy_with_tables</a></td>
98+
<td>&nbsp;&nbsp;Copy a compiled pattern and its character tables</td></tr>
99+
97100
<tr><td><a href="pcre2_code_free.html">pcre2_code_free</a></td>
98101
<td>&nbsp;&nbsp;Free a compiled pattern</td></tr>
99102

doc/html/pcre2_code_copy.html

Lines changed: 3 additions & 2 deletions
@@ -28,8 +28,9 @@ <h1>pcre2_code_copy man page</h1>
2828
This function makes a copy of the memory used for a compiled pattern, excluding
2929
any memory used by the JIT compiler. Without a subsequent call to
3030
<b>pcre2_jit_compile()</b>, the copy can be used only for non-JIT matching. The
31-
yield of the function is NULL if <i>code</i> is NULL or if sufficient memory
32-
cannot be obtained.
31+
pointer to the character tables is copied, not the tables themselves (see
32+
<b>pcre2_code_copy_with_tables()</b>). The yield of the function is NULL if
33+
<i>code</i> is NULL or if sufficient memory cannot be obtained.
3334
</P>
3435
<P>
3536
There is a complete description of the PCRE2 native API in the

doc/html/pcre2_code_copy_with_tables.html

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
1+
<html>
2+
<head>
3+
<title>pcre2_code_copy_with_tables specification</title>
4+
</head>
5+
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6+
<h1>pcre2_code_copy_with_tables man page</h1>
7+
<p>
8+
Return to the <a href="index.html">PCRE2 index page</a>.
9+
</p>
10+
<p>
11+
This page is part of the PCRE2 HTML documentation. It was generated
12+
automatically from the original man page. If there is any nonsense in it,
13+
please consult the man page, in case the conversion went wrong.
14+
<br>
15+
<br><b>
16+
SYNOPSIS
17+
</b><br>
18+
<P>
19+
<b>#include &#60;pcre2.h&#62;</b>
20+
</P>
21+
<P>
22+
<b>pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *<i>code</i>);</b>
23+
</P>
24+
<br><b>
25+
DESCRIPTION
26+
</b><br>
27+
<P>
28+
This function makes a copy of the memory used for a compiled pattern, excluding
29+
any memory used by the JIT compiler. Without a subsequent call to
30+
<b>pcre2_jit_compile()</b>, the copy can be used only for non-JIT matching.
31+
Unlike <b>pcre2_code_copy()</b>, a separate copy of the character tables is also
32+
made, with the new code pointing to it. This memory will be automatically freed
33+
when <b>pcre2_code_free()</b> is called. The yield of the function is NULL if
34+
<i>code</i> is NULL or if sufficient memory cannot be obtained.
35+
</P>
36+
<P>
37+
There is a complete description of the PCRE2 native API in the
38+
<a href="pcre2api.html"><b>pcre2api</b></a>
39+
page and a description of the POSIX API in the
40+
<a href="pcre2posix.html"><b>pcre2posix</b></a>
41+
page.
42+
<p>
43+
Return to the <a href="index.html">PCRE2 index page</a>.
44+
</p>
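
The difference between the two copy functions can be illustrated with a toy model in C. Everything here is an assumption made for illustration: the `toy_code` structure, its fields, the table size, and the function names are hypothetical and do not reflect PCRE2's internal layout. The sketch only shows a copy that shares the table pointer versus a copy that owns private tables, which are released when the code is freed.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum { TOY_TABLE_SIZE = 256 };    /* arbitrary size for this sketch */

typedef struct toy_code {
    const unsigned char *tables;  /* character tables used by the pattern */
    bool owns_tables;             /* true if freeing the code frees tables */
} toy_code;

/* Models pcre2_code_copy(): the pointer to the tables is copied,
 * not the tables themselves. Returns NULL on NULL input or no memory. */
toy_code *toy_code_copy(const toy_code *code)
{
    if (code == NULL) return NULL;
    toy_code *copy = malloc(sizeof *copy);
    if (copy == NULL) return NULL;
    copy->tables = code->tables;
    copy->owns_tables = false;
    return copy;
}

/* Models pcre2_code_copy_with_tables(): a separate copy of the tables is
 * made and the new code points to it. Assumes code->tables is non-NULL. */
toy_code *toy_code_copy_with_tables(const toy_code *code)
{
    if (code == NULL) return NULL;
    toy_code *copy = malloc(sizeof *copy);
    if (copy == NULL) return NULL;
    unsigned char *tables = malloc(TOY_TABLE_SIZE);
    if (tables == NULL) { free(copy); return NULL; }
    memcpy(tables, code->tables, TOY_TABLE_SIZE);
    copy->tables = tables;
    copy->owns_tables = true;
    return copy;
}

/* Models pcre2_code_free(): privately owned tables are freed as well. */
void toy_code_free(toy_code *code)
{
    if (code == NULL) return;
    if (code->owns_tables) free((void *)code->tables);
    free(code);
}
```

In this model, a later change to the original tables is visible through the shallow copy but not through the copy made with its own tables, which is the situation the new function is meant to address.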

doc/html/pcre2_set_max_pattern_length.html

Lines changed: 5 additions & 2 deletions
@@ -26,8 +26,11 @@ <h1>pcre2_set_max_pattern_length man page</h1>
2626
DESCRIPTION
2727
</b><br>
2828
<P>
29-
This function sets, in a compile context, the maximum length (in code units) of
30-
the pattern that can be compiled. The result is always zero.
29+
This function sets, in a compile context, the maximum text length (in code
30+
units) of the pattern that can be compiled. The result is always zero. If a
31+
longer pattern is passed to <b>pcre2_compile()</b> there is an immediate error
32+
return. The default is effectively unlimited, being the largest value a
33+
PCRE2_SIZE variable can hold.
3134
</P>
3235
<P>
3336
There is a complete description of the PCRE2 native API in the

doc/html/pcre2api.html

Lines changed: 38 additions & 10 deletions
@@ -294,6 +294,9 @@ <h1>pcre2api man page</h1>
294294
<b>pcre2_code *pcre2_code_copy(const pcre2_code *<i>code</i>);</b>
295295
<br>
296296
<br>
297+
<b>pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *<i>code</i>);</b>
298+
<br>
299+
<br>
297300
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
298301
<b> PCRE2_SIZE <i>bufflen</i>);</b>
299302
<br>
@@ -567,8 +570,9 @@ <h1>pcre2api man page</h1>
567570
(perhaps waiting to see if the pattern is used often enough) similar logic is
568571
required. JIT compilation updates a pointer within the compiled code block, so
569572
a thread must gain unique write access to the pointer before calling
570-
<b>pcre2_jit_compile()</b>. Alternatively, <b>pcre2_code_copy()</b> can be used
571-
to obtain a private copy of the compiled code.
573+
<b>pcre2_jit_compile()</b>. Alternatively, <b>pcre2_code_copy()</b> or
574+
<b>pcre2_code_copy_with_tables()</b> can be used to obtain a private copy of the
575+
compiled code.
572576
</P>
573577
<br><b>
574578
Context blocks
@@ -736,7 +740,8 @@ <h1>pcre2api man page</h1>
736740
<br>
737741
This parameter adjusts the limit, set when PCRE2 is built (default 250), on the
738742
depth of parenthesis nesting in a pattern. This limit stops rogue patterns
739-
using up too much system stack when being compiled.
743+
using up too much system stack when being compiled. The limit applies to
744+
parentheses of all kinds, not just capturing parentheses.
740745
<b>int pcre2_set_compile_recursion_guard(pcre2_compile_context *<i>ccontext</i>,</b>
741746
<b> int (*<i>guard_function</i>)(uint32_t, void *), void *<i>user_data</i>);</b>
742747
<br>
@@ -1058,6 +1063,9 @@ <h1>pcre2api man page</h1>
10581063
<br>
10591064
<br>
10601065
<b>pcre2_code *pcre2_code_copy(const pcre2_code *<i>code</i>);</b>
1066+
<br>
1067+
<br>
1068+
<b>pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *<i>code</i>);</b>
10611069
</P>
10621070
<P>
10631071
The <b>pcre2_compile()</b> function compiles a pattern into an internal form.
@@ -1079,9 +1087,22 @@ <h1>pcre2api man page</h1>
10791087
<a href="#jitcompiling">below),</a>
10801088
the JIT information cannot be copied (because it is position-dependent).
10811089
The new copy can initially be used only for non-JIT matching, though it can be
1082-
passed to <b>pcre2_jit_compile()</b> if required. The <b>pcre2_code_copy()</b>
1083-
function provides a way for individual threads in a multithreaded application
1084-
to acquire a private copy of shared compiled code.
1090+
passed to <b>pcre2_jit_compile()</b> if required.
1091+
</P>
1092+
<P>
1093+
The <b>pcre2_code_copy()</b> function provides a way for individual threads in a
1094+
multithreaded application to acquire a private copy of shared compiled code.
1095+
However, it does not make a copy of the character tables used by the compiled
1096+
pattern; the new pattern code points to the same tables as the original code.
1097+
(See
1098+
<a href="#jitcompiling">"Locale Support"</a>
1099+
below for details of these character tables.) In many applications the same
1100+
tables are used throughout, so this behaviour is appropriate. Nevertheless,
1101+
there are occasions when a copy of a compiled pattern and the relevant tables
1102+
are needed. The <b>pcre2_code_copy_with_tables()</b> function provides this facility.
1103+
Copies of both the code and the tables are made, with the new code pointing to
1104+
the new tables. The memory for the new tables is automatically freed when
1105+
<b>pcre2_code_free()</b> is called for the new copy of the compiled code.
10851106
</P>
10861107
<P>
10871108
NOTE: When one of the matching functions is called, pointers to the compiled
@@ -1119,7 +1140,14 @@ <h1>pcre2api man page</h1>
11191140
error code and an offset (number of code units) within the pattern,
11201141
respectively, when <b>pcre2_compile()</b> returns NULL because a compilation
11211142
error has occurred. The values are not defined when compilation is successful
1122-
and <b>pcre2_compile()</b> returns a non-NULL value.
1143+
and <b>pcre2_compile()</b> returns a non-NULL value.
1144+
</P>
1145+
<P>
1146+
The value returned in <i>erroroffset</i> is an indication of where in the
1147+
pattern the error occurred. It is not necessarily the furthest point in the
1148+
pattern that was read. For example, after the error "lookbehind assertion is
1149+
not fixed length", the error offset points to the start of the failing
1150+
assertion.
11231151
</P>
11241152
<P>
11251153
The <b>pcre2_get_error_message()</b> function (see "Obtaining a textual error
@@ -1215,8 +1243,8 @@ <h1>pcre2api man page</h1>
12151243
PCRE2_AUTO_CALLOUT
12161244
</pre>
12171245
If this bit is set, <b>pcre2_compile()</b> automatically inserts callout items,
1218-
all with number 255, before each pattern item. For discussion of the callout
1219-
facility, see the
1246+
all with number 255, before each pattern item, except immediately before or
1247+
after a callout in the pattern. For discussion of the callout facility, see the
12201248
<a href="pcre2callout.html"><b>pcre2callout</b></a>
12211249
documentation.
12221250
<pre>
@@ -3235,7 +3263,7 @@ <h1>pcre2api man page</h1>
32353263
</P>
32363264
<br><a name="SEC41" href="#TOC1">REVISION</a><br>
32373265
<P>
3238-
Last updated: 17 June 2016
3266+
Last updated: 22 November 2016
32393267
<br>
32403268
Copyright &copy; 1997-2016 University of Cambridge.
32413269
<br>
