ICU-22284 dump Numeric_Value property in icuexportdata.cpp#3751
m4rch3n1ng wants to merge 1 commit into unicode-org:main from
Force-pushed e9154e1 to 3326b56.
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot
I noticed (a little late) that what I was doing here previously was essentially just what bmg already does, but using a new
@sffc @robertbastian @hsivonen, does this look like what you would want for ICU4X?
Force-pushed 3326b56 to ab0d451.
Hooray! The files in the branch are the same across the force-push. 😃
~ Your Friendly Jira-GitHub PR Checker Bot
sffc left a comment:
Thanks! Sorry I didn't see this sooner.
    LocalUMutableCPTriePointer builder(umutablecptrie_open(0, 0, status));

    for(UChar32 c = UCHAR_MIN_VALUE; c <= UCHAR_MAX_VALUE; c++) {
        int32_t ntv = static_cast<int32_t>(GET_NUMERIC_TYPE_VALUE(u_getMainProperties(c)));
GET_NUMERIC_TYPE_VALUE seems to be an internal function. Could we stick with the public function u_getNumericValue?
Yes, but that is lossy, and it would be impossible to "recover" the original lossless value on the ICU4X side if we ever wanted to.
Like I said in the post:
Floating-point numbers cannot accurately represent some fractions, and the highest number that Unicode provides is higher than the max safe integer of a double.
Doing just the most basic "convert 1/3 to a float and then back to a fraction" (using the num-rational Rust crate) gives me 6004799503160661/18014398509481984, and it would be nice if the actual 1/3 value could somehow be extracted at some point in the future, for example for programming languages with native support for fractions.
Additionally, the current representation fits nicely into a 32-bit uint32_t, while the other would need a 64-bit double.
I am not sure if there is a better way to do this than relying on an internal function, though.
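To make the lossiness argument concrete: every finite double is exactly some integer divided by a power of two, so once 1/3 has been rounded to a double, that dyadic fraction is all a consumer can ever recover. The C++ sketch below extracts that exact fraction with std::frexp and std::ldexp. DyadicFraction and exactFraction are hypothetical names made up for this illustration; they are not part of icuexportdata or ICU4X.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// The exact value of a double, written as num / 2^denLog2.
// Every finite double is such a dyadic fraction, so nothing is lost here.
struct DyadicFraction {
    uint64_t num;
    int denLog2;
};

// Hypothetical helper for illustration only.
// Precondition: x is finite and 0 < x < 2^53, which keeps denLog2 >= 0.
DyadicFraction exactFraction(double x) {
    assert(std::isfinite(x) && x > 0.0 && x < std::ldexp(1.0, 53));
    int e = 0;
    // frexp splits x into m * 2^e with m in [0.5, 1).
    double m = std::frexp(x, &e);
    // m has at most 53 significant bits, so m * 2^53 is an exact integer.
    uint64_t num = static_cast<uint64_t>(std::ldexp(m, 53));
    // x == num * 2^(e - 53) == num / 2^(53 - e)
    int denLog2 = 53 - e;
    // Cancel common factors of two to get the fraction in lowest terms.
    while (denLog2 > 0 && num % 2 == 0) {
        num /= 2;
        --denLog2;
    }
    return {num, denLog2};
}
```

For example, exactFraction(1.0 / 3.0) yields num == 6004799503160661 and denLog2 == 54, i.e. 6004799503160661/18014398509481984, the same value the num-rational round trip produced. The sketch returns the exponent rather than the denominator because denominators for very small doubles do not fit in a uint64_t.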
I'm really not sure why the CI is failing. The valgrind one seems spurious, but the rest are complaining about not finding
CI says: You need to export
Force-pushed ab0d451 to e7cf0a1.
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot
Force-pushed e7cf0a1 to 6ec53e6.
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot
I changed the
sffc left a comment:
This should be fine; there are already lots and lots of functions with U_EXPORT that aren't in the public headers. @markusicu or @richgillam can confirm.
Currently, there is no way to get the numeric value of a character from the icuexportdata, so this exports the values into an nv.toml file. This is a value that ICU4X would like to be able to provide (unicode-org/icu4x#3014).
Similar to the bmg.toml, this exports an [[enum_property]], but without the values and without the name field in each of the ranges. (Previously: for the new nv.toml export, I added a new type of property, a [[value_property]]. A value property is similar to an [[enum_property]], but it doesn't have the values key for the enum variants and it doesn't have a name field for each of the range maps.)
I was a little unsure of what value to export, as there were two options: exporting it as a double or exporting the raw numeric type value (via GET_NUMERIC_TYPE_VALUE(u_getMainProperties(c))). I have decided on the second, both for being smaller (a double vs an int32_t) and for being more accurate (floating-point numbers cannot accurately represent some fractions, and the highest number that Unicode provides is higher than the max safe integer of a double). It is also more flexible, potentially allowing languages with native support for fractions to actually consume them as fractions. This does put the burden of reinterpreting the value on the consumer side, but I think that is a fine tradeoff.
I have also made an ICU4X branch where I provide this new property: https://github.com/m4rch3n1ng/icu4x/tree/numeric-value. You can also see what the new nv.toml file looks like there.
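Because the exported number is ICU's internal encoding rather than the value itself, each consumer needs a small decoder. The C++ sketch below shows what one could look like, returning an exact fraction times a power of ten instead of a double. Its range boundaries and bit layouts are an assumption, based on the comments for the UPROPS_NTV_* constants in ICU's internal uprops.h; they are not public API, may differ between ICU versions, and should be checked against the ICU source that produced the nv.toml file. NtvKind, NumericValue, and decodeNtv are hypothetical names, and the base-60 and later fraction ranges are deliberately left undecoded.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: these boundaries mirror the UPROPS_NTV_* constants in ICU's
// internal uprops.h. They are not public API and may change between versions.
constexpr int32_t kNtvNone = 0;
constexpr int32_t kNtvDecimalStart = 1;      // decimal digits 0..9
constexpr int32_t kNtvDigitStart = 11;       // other digits 0..9
constexpr int32_t kNtvNumericStart = 21;     // small integers 0..154
constexpr int32_t kNtvFractionStart = 0xb0;  // small fractions
constexpr int32_t kNtvLargeStart = 0x1e0;    // one digit times a power of ten
constexpr int32_t kNtvBase60Start = 0x300;   // base-60 and later ranges

enum class NtvKind { kNone, kExact, kNotDecoded };

// When kind == NtvKind::kExact, the value is (num / den) * 10^exp10.
struct NumericValue {
    NtvKind kind;
    int64_t num;
    int64_t den;
    int32_t exp10;
};

// Hypothetical consumer-side decoder for the raw exported value.
NumericValue decodeNtv(int32_t ntv) {
    assert(ntv >= 0);  // assumed to be a raw, non-negative NTV
    if (ntv == kNtvNone) {
        return {NtvKind::kNone, 0, 1, 0};
    }
    if (ntv < kNtvDigitStart) {  // Numeric_Type=Decimal
        return {NtvKind::kExact, ntv - kNtvDecimalStart, 1, 0};
    }
    if (ntv < kNtvNumericStart) {  // Numeric_Type=Digit
        return {NtvKind::kExact, ntv - kNtvDigitStart, 1, 0};
    }
    if (ntv < kNtvFractionStart) {  // small integer
        return {NtvKind::kExact, ntv - kNtvNumericStart, 1, 0};
    }
    if (ntv < kNtvLargeStart) {
        // numerator + 12 above the low 4 bits, denominator - 1 in the low 4 bits
        return {NtvKind::kExact, (ntv >> 4) - 12, (ntv & 0xf) + 1, 0};
    }
    if (ntv < kNtvBase60Start) {
        // mantissa + 14 above the low 5 bits, exponent - 2 in the low 5 bits
        return {NtvKind::kExact, (ntv >> 5) - 14, 1, (ntv & 0x1f) + 2};
    }
    // Base-60 values and the later fraction ranges are not handled here.
    return {NtvKind::kNotDecoded, 0, 1, 0};
}
```

Under these assumptions, 0xd1 decodes to 1/2 and 0x1e0 to 1 × 10^2. Keeping the power of ten separate matters because the largest value this range can describe, 9 × 10^33, does not fit in an int64_t.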