
Commit 8057c3c

Renamed dftables as pcre2_dftables and enable it to write the tables in binary.
Update documentation about character tables.
1 parent 953d4e9 commit 8057c3c

30 files changed, with 1062 additions and 694 deletions

CMakeLists.txt

Lines changed: 4 additions & 3 deletions
@@ -85,6 +85,7 @@
 # 2018-11-14 PH removed unnecessary checks for stdint.h and inttypes.h
 # 2018-11-16 PH added PCRE2GREP_SUPPORT_CALLOUT_FORK support and tidied
 # 2019-02-16 PH hacked to avoid CMP0026 policy issue (see comments below)
+# 2020-03-26 PH renamed dftables as pcre2_dftables (as elsewhere)
 
 PROJECT(PCRE2 C)
 
@@ -423,11 +424,11 @@ CONFIGURE_FILE(src/pcre2.h.in
 
 OPTION(PCRE2_REBUILD_CHARTABLES "Rebuild char tables" OFF)
 IF(PCRE2_REBUILD_CHARTABLES)
-ADD_EXECUTABLE(dftables src/dftables.c)
+ADD_EXECUTABLE(pcre2_dftables src/pcre2_dftables.c)
 ADD_CUSTOM_COMMAND(
 COMMENT "Generating character tables (pcre2_chartables.c) for current locale"
-DEPENDS dftables
-COMMAND dftables
+DEPENDS pcre2_dftables
+COMMAND pcre2_dftables
 ARGS ${PROJECT_BINARY_DIR}/pcre2_chartables.c
 OUTPUT ${PROJECT_BINARY_DIR}/pcre2_chartables.c
 )

ChangeLog

Lines changed: 12 additions & 0 deletions
@@ -82,6 +82,18 @@ could be mis-compiled and therefore not match correctly. This is the example
 that found this: /(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word/ which failed to
 match "word" because the "move back" value was set to zero.
 
+21. Following a request from a user, some extensions and tidies to the
+character tables handling have been done:
+
+(a) The dftables auxiliary program is renamed pcre2_dftables, but it is still
+not installed for public use.
+
+(b) There is now a -b option for pcre2_dftables, which causes the tables to
+be written in binary. There is also a -help option.
+
+(c) PCRE2_CONFIG_TABLES_LENGTH is added to pcre2_config() so that an
+application that wants to save tables in binary knows how long they are.
+
 
 Version 10.34 21-November-2019
 ------------------------------
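ChangeLog item 21(c) adds PCRE2_CONFIG_TABLES_LENGTH so that an application saving tables in binary knows how many bytes to write. A minimal sketch of the saving side follows. `save_tables_binary` is a hypothetical helper, not part of PCRE2, and the tables buffer and its length are simply passed in. In a real program the bytes would presumably come from `pcre2_maketables()` and the length from `pcre2_config(PCRE2_CONFIG_TABLES_LENGTH, &len)`. Neither call is made here, so the sketch compiles without PCRE2.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write `len` bytes of character tables to `path`
   exactly as they are held in memory. Returns 0 on success, -1 on error.
   In a PCRE2 application, `len` would presumably be obtained with
   pcre2_config(PCRE2_CONFIG_TABLES_LENGTH, &len) rather than hard-coded. */
static int save_tables_binary(const char *path, const uint8_t *tables,
                              size_t len)
{
  FILE *f = fopen(path, "wb");          /* binary mode: no newline mapping */
  if (f == NULL) return -1;
  size_t written = fwrite(tables, 1, len, f);
  int close_failed = fclose(f) != 0;    /* fclose flushes, so check it too */
  return (written == len && !close_failed) ? 0 : -1;
}
```

Writing the raw bytes with no header keeps the file the same size as the in-memory tables. A reader can then check the file against the length reported by pcre2_config() before using it.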

Makefile.am

Lines changed: 10 additions & 9 deletions
@@ -325,18 +325,18 @@ include_HEADERS = src/pcre2posix.h
 bin_SCRIPTS = pcre2-config
 
 ## ---------------------------------------------------------------
-## The dftables program is used to rebuild character tables before compiling
-## PCRE2, if --enable-rebuild-chartables is specified. It is not a user-visible
-## program. The default (when --enable-rebuild-chartables is not specified) is
-## to copy a distributed set of tables that are defined for ASCII code. In this
-## case, dftables is not needed.
+## The pcre2_dftables program is used to rebuild character tables before
+## compiling PCRE2, if --enable-rebuild-chartables is specified. It is not an
+## installed program. The default (when --enable-rebuild-chartables is not
+## specified) is to copy a distributed set of tables that are defined for ASCII
+## code. In this case, pcre2_dftables is not needed.
 
 if WITH_REBUILD_CHARTABLES
-noinst_PROGRAMS += dftables
-dftables_SOURCES = src/dftables.c
-src/pcre2_chartables.c: dftables$(EXEEXT)
+noinst_PROGRAMS += pcre2_dftables
+pcre2_dftables_SOURCES = src/pcre2_dftables.c
+src/pcre2_chartables.c: pcre2_dftables$(EXEEXT)
 rm -f $@
-./dftables$(EXEEXT) $@
+./pcre2_dftables$(EXEEXT) $@
 else
 src/pcre2_chartables.c: $(srcdir)/src/pcre2_chartables.c.dist
 rm -f $@
@@ -634,6 +634,7 @@ EXTRA_DIST += \
 testdata/grepoutputCN \
 testdata/grepoutputN \
 testdata/greppatN4 \
+testdata/testbtables \
 testdata/testinput1 \
 testdata/testinput2 \
 testdata/testinput3 \

NON-AUTOTOOLS-BUILD

Lines changed: 8 additions & 8 deletions
@@ -74,14 +74,14 @@ can skip ahead to the CMake section.
 src/pcre2_chartables.c.
 
 OR:
-Compile src/dftables.c as a stand-alone program (using -DHAVE_CONFIG_H
-if you have set up src/config.h), and then run it with the single
-argument "src/pcre2_chartables.c". This generates a set of standard
-character tables and writes them to that file. The tables are generated
-using the default C locale for your system. If you want to use a locale
-that is specified by LC_xxx environment variables, add the -L option to
-the dftables command. You must use this method if you are building on a
-system that uses EBCDIC code.
+Compile src/pcre2_dftables.c as a stand-alone program (using
+-DHAVE_CONFIG_H if you have set up src/config.h), and then run it with
+the single argument "src/pcre2_chartables.c". This generates a set of
+standard character tables and writes them to that file. The tables are
+generated using the default C locale for your system. If you want to use
+a locale that is specified by LC_xxx environment variables, add the -L
+option to the pcre2_dftables command. You must use this method if you
+are building on a system that uses EBCDIC code.
 
 The tables in src/pcre2_chartables.c are defaults. The caller of PCRE2 can
 specify alternative tables at run time.
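The instructions above run pcre2_dftables with a single file-name argument, optionally preceded by -L, and ChangeLog item 21(b) adds -b (binary output) and -help. A sketch of how such a command line could be parsed follows. The `dftables_options` struct and `parse_dftables_args` function are hypothetical. They are not the parsing code in src/pcre2_dftables.c, which this commit page does not show.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical summary of a pcre2_dftables-style command line. */
typedef struct {
  int use_env_locale;   /* -L: take the locale from LC_xxx variables */
  int binary;           /* -b: write the tables in binary, not C source */
  int help;             /* -help: print usage and stop */
  const char *outfile;  /* the single required output file name */
} dftables_options;

/* Parse argv. Returns 0 on success, -1 on an unknown option or a missing
   or surplus file name. Options must precede the file name. */
static int parse_dftables_args(int argc, char **argv, dftables_options *opt)
{
  int i;
  *opt = (dftables_options){0};        /* all flags off, outfile NULL */
  for (i = 1; i < argc && argv[i][0] == '-'; i++) {
    if (strcmp(argv[i], "-L") == 0) opt->use_env_locale = 1;
    else if (strcmp(argv[i], "-b") == 0) opt->binary = 1;
    else if (strcmp(argv[i], "-help") == 0) { opt->help = 1; return 0; }
    else return -1;                    /* unrecognized option */
  }
  if (argc - i != 1) return -1;        /* exactly one file name must remain */
  opt->outfile = argv[i];
  return 0;
}
```

With this shape, `pcre2_dftables src/pcre2_chartables.c` sets only `outfile`, and `pcre2_dftables -L -b tables.bin` sets both flags as well. Treating -help as terminal means no file name is needed when it is given.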

PrepareRelease

Lines changed: 1 addition & 1 deletion
@@ -190,7 +190,7 @@ files="\
 libpcre2-16.pc.in \
 libpcre2-32.pc.in \
 libpcre2-posix.pc.in \
-src/dftables.c \
+src/pcre2_dftables.c \
 src/pcre2.h.in \
 src/pcre2_auto_possess.c \
 src/pcre2_compile.c \

README

Lines changed: 34 additions & 34 deletions
@@ -269,9 +269,9 @@ library. They are also documented in the pcre2build man page.
 
 --enable-rebuild-chartables
 
-a program called dftables is compiled and run in the default C locale when
-you obey "make". It builds a source file called pcre2_chartables.c. If you do
-not specify this option, pcre2_chartables.c is created as a copy of
+a program called pcre2_dftables is compiled and run in the default C locale
+when you obey "make". It builds a source file called pcre2_chartables.c. If
+you do not specify this option, pcre2_chartables.c is created as a copy of
 pcre2_chartables.c.dist. See "Character tables" below for further
 information.
 
@@ -548,21 +548,22 @@ Cross-compiling using autotools
 
 You can specify CC and CFLAGS in the normal way to the "configure" command, in
 order to cross-compile PCRE2 for some other host. However, you should NOT
-specify --enable-rebuild-chartables, because if you do, the dftables.c source
-file is compiled and run on the local host, in order to generate the inbuilt
-character tables (the pcre2_chartables.c file). This will probably not work,
-because dftables.c needs to be compiled with the local compiler, not the cross
-compiler.
+specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c
+source file is compiled and run on the local host, in order to generate the
+inbuilt character tables (the pcre2_chartables.c file). This will probably not
+work, because pcre2_dftables.c needs to be compiled with the local compiler,
+not the cross compiler.
 
 When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
 created by making a copy of pcre2_chartables.c.dist, which is a default set of
 tables that assumes ASCII code. Cross-compiling with the default tables should
 not be a problem.
 
 If you need to modify the character tables when cross-compiling, you should
-move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
-and run it on the local host to make a new version of pcre2_chartables.c.dist.
-Then when you cross-compile PCRE2 this new version of the tables will be used.
+move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
+hand and run it on the local host to make a new version of
+pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
+at build time" for more details.
 
 
 Making new tarballs
@@ -721,8 +722,8 @@ compile context.
 The source file called pcre2_chartables.c contains the default set of tables.
 By default, this is created as a copy of pcre2_chartables.c.dist, which
 contains tables for ASCII coding. However, if --enable-rebuild-chartables is
-specified for ./configure, a different version of pcre2_chartables.c is built
-by the program dftables (compiled from dftables.c), which uses the ANSI C
+specified for ./configure, a new version of pcre2_chartables.c is built by the
+program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C
 character handling functions such as isalnum(), isalpha(), isupper(),
 islower(), etc. to build the table sources. This means that the default C
 locale that is set for your system will control the contents of these default
@@ -732,32 +733,31 @@ file does not get automatically re-generated. The best way to do this is to
 move pcre2_chartables.c.dist out of the way and replace it with your customized
 tables.
 
-When the dftables program is run as a result of --enable-rebuild-chartables,
-it uses the default C locale that is set on your system. It does not pay
-attention to the LC_xxx environment variables. In other words, it uses the
-system's default locale rather than whatever the compiling user happens to have
-set. If you really do want to build a source set of character tables in a
-locale that is specified by the LC_xxx variables, you can run the dftables
-program by hand with the -L option. For example:
+When the pcre2_dftables program is run as a result of specifying
+--enable-rebuild-chartables, it uses the default C locale that is set on your
+system. It does not pay attention to the LC_xxx environment variables. In other
+words, it uses the system's default locale rather than whatever the compiling
+user happens to have set. If you really do want to build a source set of
+character tables in a locale that is specified by the LC_xxx variables, you can
+run the pcre2_dftables program by hand with the -L option. For example:
 
-./dftables -L pcre2_chartables.c.special
+./pcre2_dftables -L pcre2_chartables.c.special
 
-The first two 256-byte tables provide lower casing and case flipping functions,
-respectively. The next table consists of three 32-byte bit maps which identify
-digits, "word" characters, and white space, respectively. These are used when
-building 32-byte bit maps that represent character classes for code points less
-than 256. The final 256-byte table has bits indicating various character types,
-as follows:
+The second argument names the file where the source code for the tables is
+written. The first two 256-byte tables provide lower casing and case flipping
+functions, respectively. The next table consists of a number of 32-byte bit
+maps which identify certain character classes such as digits, "word"
+characters, white space, etc. These are used when building 32-byte bit maps
+that represent character classes for code points less than 256. The final
+256-byte table has bits indicating various character types, as follows:
 
 1 white space character
 2 letter
-4 decimal digit
-8 hexadecimal digit
+4 lower case letter
+8 decimal digit
 16 alphanumeric or '_'
-128 regular expression metacharacter or binary zero
 
-You should not alter the set of characters that contain the 128 bit, as that
-will cause PCRE2 to malfunction.
+See also the pcre2build section "Creating character tables at build time".
 
 
 File manifest
@@ -768,7 +768,7 @@ The distribution should contain the files listed below.
 (A) Source files for the PCRE2 library functions and their headers are found in
 the src directory:
 
-src/dftables.c auxiliary program for building pcre2_chartables.c
+src/pcre2_dftables.c auxiliary program for building pcre2_chartables.c
 when --enable-rebuild-chartables is specified
 
 src/pcre2_chartables.c.dist a default set of character tables that assume
@@ -894,4 +894,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 16 April 2019
+Last updated: 20 March 2020
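The updated README lists the bits of the final 256-byte character-type table: 1 white space, 2 letter, 4 lower case letter, 8 decimal digit, 16 alphanumeric or '_'. It also says the tables are built with the ANSI C character handling functions. The sketch below fills such a table from <ctype.h> in the current locale. Only the bit values come from the README. The `ct_*` names and `build_ctypes` are hypothetical, this is not the code in pcre2_dftables.c, and it assumes that isspace() matches the intended notion of white space.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Bit values as listed in the README; the names are illustrative only. */
enum {
  ct_space    = 1,   /* white space character */
  ct_letter   = 2,   /* letter */
  ct_lcletter = 4,   /* lower case letter */
  ct_digit    = 8,   /* decimal digit */
  ct_word     = 16   /* alphanumeric or '_' */
};

/* Fill a 256-byte character-type table using the <ctype.h> functions,
   which consult the current locale (the "C" locale unless changed). */
static void build_ctypes(uint8_t ctypes[256])
{
  int c;
  for (c = 0; c < 256; c++) {
    uint8_t x = 0;
    if (isspace(c)) x |= ct_space;
    if (isalpha(c)) x |= ct_letter;
    if (islower(c)) x |= ct_lcletter;
    if (isdigit(c)) x |= ct_digit;
    if (isalnum(c) || c == '_') x |= ct_word;
    ctypes[c] = x;
  }
}
```

In the "C" locale this gives 'a' the bits 2, 4 and 16, gives '7' the bits 8 and 16, and gives '_' only bit 16. Because one byte carries several properties, a single lookup answers any of these class tests for code points below 256.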

doc/html/NON-AUTOTOOLS-BUILD.txt

Lines changed: 8 additions & 8 deletions
@@ -74,14 +74,14 @@ can skip ahead to the CMake section.
 src/pcre2_chartables.c.
 
 OR:
-Compile src/dftables.c as a stand-alone program (using -DHAVE_CONFIG_H
-if you have set up src/config.h), and then run it with the single
-argument "src/pcre2_chartables.c". This generates a set of standard
-character tables and writes them to that file. The tables are generated
-using the default C locale for your system. If you want to use a locale
-that is specified by LC_xxx environment variables, add the -L option to
-the dftables command. You must use this method if you are building on a
-system that uses EBCDIC code.
+Compile src/pcre2_dftables.c as a stand-alone program (using
+-DHAVE_CONFIG_H if you have set up src/config.h), and then run it with
+the single argument "src/pcre2_chartables.c". This generates a set of
+standard character tables and writes them to that file. The tables are
+generated using the default C locale for your system. If you want to use
+a locale that is specified by LC_xxx environment variables, add the -L
+option to the pcre2_dftables command. You must use this method if you
+are building on a system that uses EBCDIC code.
 
 The tables in src/pcre2_chartables.c are defaults. The caller of PCRE2 can
 specify alternative tables at run time.
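The -L option described above determines which locale the C library's classification functions consult while the tables are built. A sketch follows. It assumes, without confirmation from this commit, that the program calls `setlocale(LC_ALL, "")` only when -L is given and otherwise stays in the "C" locale that every C program starts in. `choose_table_locale` and `count_letters` are hypothetical helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper. A C program starts in the "C" locale; calling
   setlocale(LC_ALL, "") switches to the locale named by the LC_ALL,
   LC_CTYPE or LANG environment variables. Returns the name of the
   LC_CTYPE locale now in effect, or NULL if the request failed. */
static const char *choose_table_locale(int use_env_locale)
{
  if (setlocale(LC_ALL, use_env_locale ? "" : "C") == NULL) return NULL;
  return setlocale(LC_CTYPE, NULL);   /* query without changing anything */
}

/* Count the byte values 0-255 that isalpha() accepts under the current
   locale; in the "C" locale only A-Z and a-z qualify. */
static int count_letters(void)
{
  int c, n = 0;
  for (c = 0; c < 256; c++) if (isalpha(c)) n++;
  return n;
}
```

With the "C" locale the count is 52. Under an 8-bit locale such as ISO-8859-1 it would normally be larger, which is why tables built with -L can differ from the distributed ASCII tables.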

doc/html/README.txt

Lines changed: 34 additions & 34 deletions
@@ -269,9 +269,9 @@ library. They are also documented in the pcre2build man page.
 
 --enable-rebuild-chartables
 
-a program called dftables is compiled and run in the default C locale when
-you obey "make". It builds a source file called pcre2_chartables.c. If you do
-not specify this option, pcre2_chartables.c is created as a copy of
+a program called pcre2_dftables is compiled and run in the default C locale
+when you obey "make". It builds a source file called pcre2_chartables.c. If
+you do not specify this option, pcre2_chartables.c is created as a copy of
 pcre2_chartables.c.dist. See "Character tables" below for further
 information.
 
@@ -548,21 +548,22 @@ Cross-compiling using autotools
 
 You can specify CC and CFLAGS in the normal way to the "configure" command, in
 order to cross-compile PCRE2 for some other host. However, you should NOT
-specify --enable-rebuild-chartables, because if you do, the dftables.c source
-file is compiled and run on the local host, in order to generate the inbuilt
-character tables (the pcre2_chartables.c file). This will probably not work,
-because dftables.c needs to be compiled with the local compiler, not the cross
-compiler.
+specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c
+source file is compiled and run on the local host, in order to generate the
+inbuilt character tables (the pcre2_chartables.c file). This will probably not
+work, because pcre2_dftables.c needs to be compiled with the local compiler,
+not the cross compiler.
 
 When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
 created by making a copy of pcre2_chartables.c.dist, which is a default set of
 tables that assumes ASCII code. Cross-compiling with the default tables should
 not be a problem.
 
 If you need to modify the character tables when cross-compiling, you should
-move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
-and run it on the local host to make a new version of pcre2_chartables.c.dist.
-Then when you cross-compile PCRE2 this new version of the tables will be used.
+move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
+hand and run it on the local host to make a new version of
+pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
+at build time" for more details.
 
 
 Making new tarballs
@@ -721,8 +722,8 @@ compile context.
 The source file called pcre2_chartables.c contains the default set of tables.
 By default, this is created as a copy of pcre2_chartables.c.dist, which
 contains tables for ASCII coding. However, if --enable-rebuild-chartables is
-specified for ./configure, a different version of pcre2_chartables.c is built
-by the program dftables (compiled from dftables.c), which uses the ANSI C
+specified for ./configure, a new version of pcre2_chartables.c is built by the
+program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C
 character handling functions such as isalnum(), isalpha(), isupper(),
 islower(), etc. to build the table sources. This means that the default C
 locale that is set for your system will control the contents of these default
@@ -732,32 +733,31 @@ file does not get automatically re-generated. The best way to do this is to
 move pcre2_chartables.c.dist out of the way and replace it with your customized
 tables.
 
-When the dftables program is run as a result of --enable-rebuild-chartables,
-it uses the default C locale that is set on your system. It does not pay
-attention to the LC_xxx environment variables. In other words, it uses the
-system's default locale rather than whatever the compiling user happens to have
-set. If you really do want to build a source set of character tables in a
-locale that is specified by the LC_xxx variables, you can run the dftables
-program by hand with the -L option. For example:
+When the pcre2_dftables program is run as a result of specifying
+--enable-rebuild-chartables, it uses the default C locale that is set on your
+system. It does not pay attention to the LC_xxx environment variables. In other
+words, it uses the system's default locale rather than whatever the compiling
+user happens to have set. If you really do want to build a source set of
+character tables in a locale that is specified by the LC_xxx variables, you can
+run the pcre2_dftables program by hand with the -L option. For example:
 
-./dftables -L pcre2_chartables.c.special
+./pcre2_dftables -L pcre2_chartables.c.special
 
-The first two 256-byte tables provide lower casing and case flipping functions,
-respectively. The next table consists of three 32-byte bit maps which identify
-digits, "word" characters, and white space, respectively. These are used when
-building 32-byte bit maps that represent character classes for code points less
-than 256. The final 256-byte table has bits indicating various character types,
-as follows:
+The second argument names the file where the source code for the tables is
+written. The first two 256-byte tables provide lower casing and case flipping
+functions, respectively. The next table consists of a number of 32-byte bit
+maps which identify certain character classes such as digits, "word"
+characters, white space, etc. These are used when building 32-byte bit maps
+that represent character classes for code points less than 256. The final
+256-byte table has bits indicating various character types, as follows:
 
 1 white space character
 2 letter
-4 decimal digit
-8 hexadecimal digit
+4 lower case letter
+8 decimal digit
 16 alphanumeric or '_'
-128 regular expression metacharacter or binary zero
 
-You should not alter the set of characters that contain the 128 bit, as that
-will cause PCRE2 to malfunction.
+See also the pcre2build section "Creating character tables at build time".
 
 
 File manifest
@@ -768,7 +768,7 @@ The distribution should contain the files listed below.
 (A) Source files for the PCRE2 library functions and their headers are found in
 the src directory:
 
-src/dftables.c auxiliary program for building pcre2_chartables.c
+src/pcre2_dftables.c auxiliary program for building pcre2_chartables.c
 when --enable-rebuild-chartables is specified
 
 src/pcre2_chartables.c.dist a default set of character tables that assume
@@ -894,4 +894,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 16 April 2019
+Last updated: 20 March 2020
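The README also says that the first two 256-byte tables provide lower casing and case flipping, and that 32-byte bit maps identify classes such as digits for code points below 256. The sketch below derives such tables with tolower(), toupper() and isdigit(). The function names are hypothetical. The bit-map layout (bit c % 8 of byte c / 8, one bit per code point) is an assumption and is not taken from this commit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Lower-casing table: lcc[c] is the lower case form of c. */
static void build_lcc(uint8_t lcc[256])
{
  int c;
  for (c = 0; c < 256; c++) lcc[c] = (uint8_t)tolower(c);
}

/* Case-flipping table: lower case becomes upper case, upper case becomes
   lower case, and every other byte maps to itself. */
static void build_fcc(uint8_t fcc[256])
{
  int c;
  for (c = 0; c < 256; c++)
    fcc[c] = (uint8_t)(islower(c) ? toupper(c) : tolower(c));
}

/* 32-byte (256-bit) map: bit (c % 8) of byte (c / 8) is set when
   isdigit(c) is true in the current locale. */
static void build_digit_map(uint8_t map[32])
{
  int c;
  for (c = 0; c < 32; c++) map[c] = 0;
  for (c = 0; c < 256; c++)
    if (isdigit(c)) map[c / 8] |= (uint8_t)(1u << (c % 8));
}

/* Test whether code point c (0-255) is a member of a 32-byte bit map. */
static int in_map(const uint8_t map[32], int c)
{
  return (map[c / 8] >> (c % 8)) & 1;
}
```

In the "C" locale the digit map has byte 6 equal to 0xFF ('0' to '7') and byte 7 equal to 0x03 ('8' and '9'). A 32-byte map like this can be combined with others by bitwise OR or AND when building a character class.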

doc/html/pcre2_set_character_tables.html

Lines changed: 6 additions & 3 deletions
@@ -27,9 +27,12 @@ <h1>pcre2_set_character_tables man page</h1>
 </b><br>
 <P>
 This function sets a pointer to custom character tables within a compile
-context. The second argument must be the result of a call to
-<b>pcre2_maketables()</b> or NULL to request the default tables. The result is
-always zero.
+context. The second argument must point to a set of PCRE2 character tables or
+be NULL to request the default tables. The result is always zero. Character
+tables can be created by calling <b>pcre2_maketables()</b> or by running the
+<b>pcre2_dftables</b> maintenance command in binary mode (see the
+<a href="pcre2build.html"><b>pcre2build</b></a>
+documentation).
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the
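The revised man page allows the second argument of pcre2_set_character_tables() to point to tables written by pcre2_dftables in binary mode. Loading such a file is ordinary stdio work. The sketch below reads exactly `expected_len` bytes and rejects a file of any other size. `load_tables_binary` is a hypothetical helper. In a real program the expected length would presumably come from `pcre2_config(PCRE2_CONFIG_TABLES_LENGTH, ...)` and the buffer would then be passed to `pcre2_set_character_tables()`. Neither call is made here, so the sketch compiles without PCRE2.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a binary tables file that must be exactly
   expected_len bytes long. Returns a malloc'd buffer that the caller must
   free(), or NULL if the file is missing, too short, or too long. */
static uint8_t *load_tables_binary(const char *path, size_t expected_len)
{
  FILE *f = fopen(path, "rb");
  if (f == NULL) return NULL;
  uint8_t *buf = malloc(expected_len);
  if (buf == NULL) { fclose(f); return NULL; }
  size_t got = fread(buf, 1, expected_len, f);
  int extra = fgetc(f) != EOF;   /* trailing bytes mean a length mismatch */
  fclose(f);
  if (got != expected_len || extra) { free(buf); return NULL; }
  return buf;
}
```

The man page says the function stores a pointer to the tables, so the loaded buffer presumably has to stay allocated for as long as the compile context, and any patterns compiled with it, might use it.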
