2 changes: 1 addition & 1 deletion icu4c/source/common/uchar.cpp
@@ -488,7 +488,7 @@ u_getUnicodeVersion(UVersionInfo versionArray) {
}
}

-U_CFUNC uint32_t
+U_CAPI uint32_t
u_getMainProperties(UChar32 c) {
uint32_t props;
GET_PROPS(c, props);
2 changes: 1 addition & 1 deletion icu4c/source/common/uprops.h
@@ -324,7 +324,7 @@ inline constexpr uint32_t UPROPS_MAX_BLOCK = 0x3ff;
* Gets the main properties value for a code point.
* Implemented in uchar.c for uprops.cpp.
*/
-U_CFUNC uint32_t
+U_CAPI uint32_t
u_getMainProperties(UChar32 c);

/**
47 changes: 47 additions & 0 deletions icu4c/source/tools/icuexportdata/icuexportdata.cpp
@@ -247,6 +247,48 @@ void dumpBidiMirroringGlyph(FILE* f) {
usrc_writeUCPTrie(f, shortPropName, utrie.getAlias(), UPRV_TARGET_SYNTAX_TOML);
}

/*
* Export Numeric_Value values in a similar way to how enumerated
* properties are dumped to file.
*/
void dumpNumericValue(FILE* f) {
IcuToolErrorCode status("icuexportdata: dumpNumericValue");
UProperty uproperty = UCHAR_NUMERIC_VALUE;
const char* fullPropName = u_getPropertyName(uproperty, U_LONG_PROPERTY_NAME);
const char* shortPropName = u_getPropertyName(uproperty, U_SHORT_PROPERTY_NAME);

UCPTrieValueWidth width = UCPTRIE_VALUE_BITS_32;
LocalUMutableCPTriePointer builder(umutablecptrie_open(0, 0, status));

for(UChar32 c = UCHAR_MIN_VALUE; c <= UCHAR_MAX_VALUE; c++) {
int32_t ntv = static_cast<int32_t>(GET_NUMERIC_TYPE_VALUE(u_getMainProperties(c)));
Member

GET_NUMERIC_TYPE_VALUE seems to be an internal macro. Could we stick with the public function u_getNumericValue?

Author

@m4rch3n1ng, Jan 27, 2026

yes, but that is lossy, and it would be impossible to "recover" the original lossless value on the icu4x side, if we ever wanted to.

like i said in the post:

floating point numbers cannot accurately represent some fractions and the highest number that unicode provides is higher than the max safe integer of a double

doing just the most basic "convert 1/3 to a float and then back to a fraction" (using the num-rational rust crate) gives me 6004799503160661/18014398509481984, and it would be nice if the actual 1/3 value could be extracted sometime in the future, for example for programming languages that natively support fractions or similar.

additionally, the current representation fits nicely into a 32-bit uint32_t, while the other would need a 64-bit double.

i am not sure if there is a better way to do this than relying on an internal function though.

Member

I see, ok
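
The round-trip loss described in the thread above can be reproduced without ICU. The following is a minimal sketch, assuming IEEE 754 doubles; `Fraction` and `exactFraction` are hypothetical names introduced for illustration and are not part of ICU or of this change. It recovers the exact binary fraction that a double actually stores, which for 1/3 is the 6004799503160661/18014398509481984 value quoted by the author rather than 1/3 itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper type: an exact fraction num/den.
struct Fraction {
    int64_t num;
    int64_t den;
};

// Returns the exact value stored in `value` as a reduced fraction whose
// denominator is a power of two. Sketch only: restricted to inputs in
// (0, 1] whose denominator fits in an int64_t.
Fraction exactFraction(double value) {
    assert(value > 0.0 && value <= 1.0);
    int exp = 0;
    // value == mantissa * 2^exp with 0.5 <= mantissa < 1.
    double mantissa = std::frexp(value, &exp);
    // Scale the 53-bit significand up to an integer; this step is exact.
    int64_t num = static_cast<int64_t>(std::ldexp(mantissa, 53));
    int shift = 53 - exp;  // value == num / 2^shift
    // Reduce by dividing out common factors of two.
    while (shift > 0 && (num & 1) == 0) {
        num >>= 1;
        --shift;
    }
    assert(shift <= 62);
    return {num, int64_t{1} << shift};
}
```

Calling `exactFraction(1.0 / 3.0)` yields a numerator of 6004799503160661 and a denominator of 2^54, so the original 1/3 cannot be read back from the double alone. This is the motivation given for exporting the packed 32-bit value instead of the result of u_getNumericValue.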


if (ntv != UPROPS_NTV_NONE) {
umutablecptrie_set(builder.getAlias(), c, ntv, status);
}
}

LocalUCPTriePointer utrie(umutablecptrie_buildImmutable(
builder.getAlias(),
trieType,
width,
status));
handleError(status, __LINE__, fullPropName);

fputs("[[enum_property]]\n", f);
fprintf(f, "long_name = \"%s\"\n", fullPropName);
if (shortPropName) fprintf(f, "short_name = \"%s\"\n", shortPropName);
fprintf(f, "uproperty_discr = 0x%X\n", uproperty);
dumpPropertyAliases(uproperty, f);

const UCPMap* umap = reinterpret_cast<UCPMap *>(utrie.getAlias());
usrc_writeUCPMap(f, umap, nullptr, UPRV_TARGET_SYNTAX_TOML);
fputs("\n", f);

fputs("[enum_property.code_point_trie]\n", f);
usrc_writeUCPTrie(f, shortPropName, utrie.getAlias(), UPRV_TARGET_SYNTAX_TOML);
}
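
A consumer of the exported trie would need to unpack the stored value to get an exact number back. The sketch below covers only the simple-fraction range and rests on an assumption: that the packed layout matches the comments next to the UPROPS_NTV_* constants in uprops.h, namely numerator `(ntv >> 4) - 12` and denominator `(ntv & 0xf) + 1` for values from 0xb0 up to, but not including, 0x1e0. The constants are restated from memory and should be checked against the header. `Rational`, `decodeSimpleFraction`, and the `kNtv...` names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to mirror UPROPS_NTV_FRACTION_START and UPROPS_NTV_LARGE_START
// in icu4c/source/common/uprops.h; verify before relying on them.
constexpr uint32_t kNtvFractionStart = 0xb0;
constexpr uint32_t kNtvLargeStart = 0x1e0;

// Hypothetical exact-fraction type for illustration.
struct Rational {
    int32_t num;
    int32_t den;
};

// Decodes a packed numeric type value that lies in the simple-fraction
// range. Returns false (leaving `out` untouched) for any other range.
bool decodeSimpleFraction(uint32_t ntv, Rational& out) {
    if (ntv < kNtvFractionStart || ntv >= kNtvLargeStart) {
        return false;
    }
    out.num = static_cast<int32_t>(ntv >> 4) - 12;   // assumed layout
    out.den = static_cast<int32_t>(ntv & 0xf) + 1;   // assumed layout
    return true;
}
```

Under that assumption, a packed value of 0xd2 decodes to exactly 1/3, with no floating-point step involved. The other ranges (digits, large integers, sexagesimal values, and so on) would each need their own branch.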

// After printing property value `v`, print `mask` if and only if `mask` comes immediately
// after the property in the listing
void maybeDumpMaskValue(UProperty uproperty, uint32_t v, uint32_t mask, FILE* f) {
@@ -1110,6 +1152,9 @@ int exportUprops(int argc, char* argv[]) {
i = UCHAR_SCRIPT_EXTENSIONS;
}
if (i == UCHAR_SCRIPT_EXTENSIONS + 1) {
i = UCHAR_NUMERIC_VALUE;
}
if (i == UCHAR_NUMERIC_VALUE + 1) {
break;
}
UProperty uprop = static_cast<UProperty>(i);
@@ -1196,6 +1241,8 @@ int exportUprops(int argc, char* argv[]) {
dumpBidiMirroringGlyph(f);
} else if (propEnum == UCHAR_SCRIPT_EXTENSIONS) {
dumpScriptExtensions(f);
} else if (propEnum == UCHAR_NUMERIC_VALUE) {
dumpNumericValue(f);
} else {
std::cerr << "Don't know how to write property: " << propEnum << std::endl;
return U_INTERNAL_PROGRAM_ERROR;