@@ -46,13 +46,18 @@ <h1>pcre2compat man page</h1>
 any kind of quantifier on non-lookaround assertions.
 </P>
 <P>
-4. Capture groups that occur inside negative lookaround assertions are counted,
+4. If a quantifier appears where there is nothing to repeat (for example, at
+the start of a branch), PCRE2 raises an error whereas Perl treats the
+quantifier characters as literal.
+</P>
+<P>
+5. Capture groups that occur inside negative lookaround assertions are counted,
 but their entries in the offsets vector are set only when a negative assertion
 is a condition that has a matching branch (that is, the condition is false).
 Perl may set such capture groups in other circumstances.
 </P>
 <P>
-5. The following Perl escape sequences are not supported: \F, \l, \L, \u,
+6. The following Perl escape sequences are not supported: \F, \l, \L, \u,
 \U, and \N when followed by a character name. \N on its own, matching a
 non-newline character, and \N{U+dd..}, matching a Unicode code point, are
 supported. The escapes that modify the case of following letters are
@@ -63,7 +68,7 @@ <h1>pcre2compat man page</h1>
 interprets them.
 </P>
 <P>
-6. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
+7. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
 built with Unicode support (the default). The properties that can be tested
 with \p and \P are limited to the general category properties such as Lu and
 Nd, script names such as Greek or Han, Bidi_Class, Bidi_Control, and the
@@ -75,7 +80,7 @@ <h1>pcre2compat man page</h1>
 to prefix any of these properties with "Is".
 </P>
 <P>
-7. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
+8. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
 in between are treated as literals. However, this is slightly different from
 Perl in that $ and @ are also handled as literals inside the quotes. In Perl,
 they cause variable interpolation (PCRE2 does not have variables). Also, Perl
@@ -96,19 +101,19 @@ <h1>pcre2compat man page</h1>
 by both PCRE2 and Perl.
 </P>
 <P>
-8. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
+9. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
 constructions. However, PCRE2 does have a "callout" feature, which allows an
 external function to be called during pattern matching. See the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation for details.
 </P>
 <P>
-9. Subroutine calls (whether recursive or not) were treated as atomic groups up
+10. Subroutine calls (whether recursive or not) were treated as atomic groups up
 to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
 into subroutine calls is now supported, as in Perl.
 </P>
 <P>
-10. In PCRE2, if any of the backtracking control verbs are used in a group that
+11. In PCRE2, if any of the backtracking control verbs are used in a group that
 is called as a subroutine (whether or not recursively), their effect is
 confined to that group; it does not extend to the surrounding pattern. This is
 not always the case in Perl. In particular, if (*THEN) is present in a group
@@ -117,20 +122,20 @@ <h1>pcre2compat man page</h1>
 processed as anchored at the point where they are tested.
 </P>
 <P>
-11. If a pattern contains more than one backtracking control verb, the first
+12. If a pattern contains more than one backtracking control verb, the first
 one that is backtracked onto acts. For example, in the pattern
 A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C
 triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
 same as PCRE2, but there are cases where it differs.
 </P>
 <P>
-12. There are some differences that are concerned with the settings of captured
+13. There are some differences that are concerned with the settings of captured
 strings when part of a pattern is repeated. For example, matching "aba" against
 the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
 "b".
 </P>
 <P>
-13. PCRE2's handling of duplicate capture group numbers and names is not as
+14. PCRE2's handling of duplicate capture group numbers and names is not as
 general as Perl's. This is a consequence of the fact the PCRE2 works internally
 just with numbers, using an external table to translate between numbers and
 names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)), where the two
@@ -140,34 +145,34 @@ <h1>pcre2compat man page</h1>
 number 1. To avoid this confusing situation, an error is given at compile time.
 </P>
 <P>
-14. Perl used to recognize comments in some places that PCRE2 does not, for
+15. Perl used to recognize comments in some places that PCRE2 does not, for
 example, between the ( and ? at the start of a group. If the /x modifier is
 set, Perl allowed white space between ( and ? though the latest Perls give an
 error (for a while it was just deprecated). There may still be some cases where
 Perl behaves differently.
 </P>
 <P>
-15. Perl, when in warning mode, gives warnings for character classes such as
+16. Perl, when in warning mode, gives warnings for character classes such as
 [A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
 warning features, so it gives an error in these cases because they are almost
 certainly user mistakes.
 </P>
 <P>
-16. In PCRE2, the upper/lower case character properties Lu and Ll are not
+17. In PCRE2, the upper/lower case character properties Lu and Ll are not
 affected when case-independent matching is specified. For example, \p{Lu}
 always matches an upper case letter. I think Perl has changed in this respect;
 in the release at the time of writing (5.34), \p{Lu} and \p{Ll} match all
 letters, regardless of case, when case independence is specified.
 </P>
 <P>
-17. From release 5.32.0, Perl locks out the use of \K in lookaround
+18. From release 5.32.0, Perl locks out the use of \K in lookaround
 assertions. From release 10.38 PCRE2 does the same by default. However, there
 is an option for re-enabling the previous behaviour. When this option is set,
 \K is acted on when it occurs in positive assertions, but is ignored in
 negative assertions.
 </P>
 <P>
-18. PCRE2 provides some extensions to the Perl regular expression facilities.
+19. PCRE2 provides some extensions to the Perl regular expression facilities.
 Perl 5.10 included new features that were not in earlier versions of Perl, some
 of which (such as named parentheses) were in PCRE2 for some time before. This
 list is with respect to Perl 5.34:
@@ -219,7 +224,7 @@ <h1>pcre2compat man page</h1>
 lookarounds are atomic.
 </P>
 <P>
-19. Perl has different limits than PCRE2. See the
+20. Perl has different limits than PCRE2. See the
 <a href="pcre2limit.html"><b>pcre2limit</b></a>
 documentation for details. Perl went with 5.10 from recursion to iteration
 keeping the intermediate matches on the heap, which is ~10% slower but does not
@@ -241,7 +246,7 @@ <h1>pcre2compat man page</h1>
 REVISION
 </b><br>
 <P>
-Last updated: 11 August 2023
+Last updated: 13 September 2023
 <br>
 Copyright © 1997-2023 University of Cambridge.
 <br>