
Commit 9cebee7

Tidies and updates to maintenance programs utf8 and ucptest.
1 parent c472f3f commit 9cebee7


7 files changed: 1082 additions, 790 deletions


maint/README

Lines changed: 11 additions & 8 deletions
@@ -54,18 +54,20 @@ Unicode.tables The files in this directory were downloaded from the Unicode
 ucptest.c A short C program for testing the Unicode property macros
 that do lookups in the pcre2_ucd.c data, mainly useful after
 rebuilding the Unicode property table. Compile and run this in
-the "maint" directory (see comments at its head).
+the "maint" directory (see comments at its head). This program
+can also be used to find characters with specific properties.
 
-ucptestdata A directory containing two files, testinput1 and testoutput1,
-to use in conjunction with the ucptest program.
+ucptestdata A directory containing four files, testinput{1,2} and
+testoutput{1,2}, for use in conjunction with the ucptest
+program.
 
 utf8.c A short, freestanding C program for converting a Unicode code
 point into a sequence of bytes in the UTF-8 encoding, and vice
 versa. If its argument is a hex number such as 0x1234, it
 outputs a list of the equivalent UTF-8 bytes. If its argument
 is a sequence of concatenated UTF-8 bytes (e.g. e188b4) it
 treats them as a UTF-8 character and outputs the equivalent
-code point in hex.
+code point in hex. See comments at its head for details.
 
 
 Updating to a new Unicode release
@@ -96,9 +98,10 @@ lists of scripts.
 
 The ucptest program can be compiled and used to check that the new tables in
 pcre2_ucd.c work properly, using the data files in ucptestdata to check a
-number of test characters. The source file ucptest.c should also be updated
-whenever new Unicode script names are added, and adding a few tests for new
-scripts is a good idea.
+number of test characters. It used to be necessary to update the source
+ucptest.c whenever new Unicode scripts were added, but this is no longer
+required because that program now uses the lists in the PCRE2 source. However,
+adding a few tests for new scripts to the files in ucptestdata is a good idea.
 
 
 Preparing for a PCRE2 release
@@ -437,4 +440,4 @@ very sensible; some are rather wacky. Some have been on this list for years.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 03 June 2019
+Last updated: 01 April 2020
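The utf8.c entry in this diff describes two conversions: a code point such as 0x1234 becomes UTF-8 bytes, and a concatenated byte string such as e188b4 becomes a code point again. The following is a minimal C sketch of both directions. It is not the code in utf8.c. The function names utf8_encode, utf8_decode and hex_to_bytes are hypothetical. The strict RFC 3629 checks (no overlong forms, no surrogates, nothing above 0x10FFFF) are assumptions of this sketch; this diff does not say whether utf8.c enforces them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code point cp as UTF-8 into buf using the
   RFC 3629 limits (at most 4 bytes, no surrogates, nothing above 0x10FFFF).
   Returns the number of bytes written, or 0 if cp cannot be encoded. */
static size_t utf8_encode(uint32_t cp, unsigned char buf[4])
{
  assert(buf != NULL);
  if (cp < 0x80) {
    buf[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    buf[0] = (unsigned char)(0xc0 | (cp >> 6));
    buf[1] = (unsigned char)(0x80 | (cp & 0x3f));
    return 2;
  }
  if (cp >= 0xd800 && cp <= 0xdfff) return 0;   /* UTF-16 surrogate range */
  if (cp < 0x10000) {
    buf[0] = (unsigned char)(0xe0 | (cp >> 12));
    buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
    buf[2] = (unsigned char)(0x80 | (cp & 0x3f));
    return 3;
  }
  if (cp <= 0x10ffff) {
    buf[0] = (unsigned char)(0xf0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3f));
    return 4;
  }
  return 0;
}

/* Hypothetical helper: decode exactly one UTF-8 character from len bytes.
   Returns the code point, or -1 if the bytes are not exactly one
   well-formed character (wrong length, bad continuation byte, overlong
   form, surrogate, or a value above 0x10FFFF). */
static long utf8_decode(const unsigned char *s, size_t len)
{
  /* Smallest code point that genuinely needs n bytes; lower is overlong. */
  static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t need, i;
  uint32_t cp;

  assert(s != NULL || len == 0);
  if (len == 0) return -1;
  if (s[0] < 0x80)                { need = 1; cp = s[0]; }
  else if ((s[0] & 0xe0) == 0xc0) { need = 2; cp = s[0] & 0x1f; }
  else if ((s[0] & 0xf0) == 0xe0) { need = 3; cp = s[0] & 0x0f; }
  else if ((s[0] & 0xf8) == 0xf0) { need = 4; cp = s[0] & 0x07; }
  else return -1;                 /* stray continuation byte or 0xf8..0xff */

  if (len != need) return -1;
  for (i = 1; i < need; i++) {
    if ((s[i] & 0xc0) != 0x80) return -1;   /* must be 10xxxxxx */
    cp = (cp << 6) | (s[i] & 0x3f);
  }
  if (need > 1 && cp < min_for_len[need]) return -1;
  if ((cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff) return -1;
  return (long)cp;
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper: turn concatenated hex pairs such as "e188b4" into
   bytes. Returns the byte count, or 0 for empty or malformed input. */
static size_t hex_to_bytes(const char *hex, unsigned char *out, size_t max)
{
  size_t n = 0;
  while (*hex != 0) {
    int hi = hex_digit(hex[0]);
    int lo = (hi < 0) ? -1 : hex_digit(hex[1]);
    if (lo < 0 || n >= max) return 0;   /* odd length, bad digit, or full */
    out[n++] = (unsigned char)((hi << 4) | lo);
    hex += 2;
  }
  return n;
}
```

Called with 0x1234, utf8_encode writes the bytes e1 88 b4. Passing "e188b4" through hex_to_bytes and then utf8_decode returns 0x1234, which matches the round trip the README describes. Rejecting malformed input is a design choice made for this sketch: it keeps the two functions exact inverses on valid data.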
