The TRE documentation (here in the repository) defines a range as:
"Two characters separated by -. This is shorthand for the full range of characters between those two (inclusive) in the collating sequence."
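However, testing with the Estonian locale (in R's imported version of TRE) shows that T is matched by [A-Z]. This contradicts the documentation: in the Estonian alphabet Z sorts before T, so a range based on the collating sequence should exclude T ... this comment says:
/* XXX - Should use collation order instead of encoding values in character ranges. */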
Would it be correct to change the documentation to say:
"The characters to include are determined by Unicode code point ordering."
as in the ICU documentation ... ?
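To make the difference concrete, here is a minimal sketch in Python (not TRE's actual code) comparing the two interpretations of a range. The helper names and the hard-coded `ESTONIAN_ORDER` string are assumptions for illustration only; a real implementation would consult the locale's collation rules rather than a fixed table.

```python
# Uppercase Estonian alphabet in dictionary order, hard-coded as an
# illustrative assumption. Note that Z and Ž come before T.
ESTONIAN_ORDER = "ABCDEFGHIJKLMNOPQRSŠZŽTUVWÕÄÖÜXY"


def in_range_by_code_point(ch: str, lo: str, hi: str) -> bool:
    """Range membership by Unicode code point (what the XXX comment describes)."""
    return ord(lo) <= ord(ch) <= ord(hi)


def in_range_by_order(ch: str, lo: str, hi: str, order: str) -> bool:
    """Range membership by position in a given collating sequence."""
    return order.index(lo) <= order.index(ch) <= order.index(hi)


# Code point ordering: T lies between A and Z, so [A-Z] matches it.
print(in_range_by_code_point("T", "A", "Z"))

# Estonian collating sequence: T comes after Z, so [A-Z] would not match it.
print(in_range_by_order("T", "A", "Z", ESTONIAN_ORDER))
```

Under these assumptions the first check succeeds and the second fails, which is the mismatch between the observed behaviour and the documented one.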