globalize (v1.7.0) number formatting is incorrect for cldr-data (v36.0.0) when the CLDR numeric digits come from the Unicode supplementary planes (U+10000 to U+10FFFF), which UTF-16 encodes as surrogate pairs.
Short example, discussed below: 44.56 formatted in the ccp locale
Expected: "𑄺𑄺.𑄻𑄼" = [ '1113a', '1113a', '2e', '1113b', '1113c' ]
But returned by globalize: "��.��" = [ 'd804', 'd804', '2e', 'dd38', 'd804' ]
Based on the formatted value returned by globalize, I initially suspected that each digit is represented internally as a surrogate pair (two 16-bit code units), but only the first code unit of each pair is returned. There's a worked example below, but I now have some doubts about this theory: of the 4 digits involved, 3 of the values returned by globalize are the first half of a surrogate pair, but one isn't.
Example (no code)
For the "ccp" locale, the digits 0-9 are "𑄶𑄷𑄸𑄹𑄺𑄻𑄼𑄽𑄾𑄿", which have Unicode hex code points of ["11136", "11137", "11138", "11139", "1113a", "1113b", "1113c", "1113d", "1113e", "1113f"].
So the number 44.56 formatted in ccp should be "𑄺𑄺.𑄻𑄼" = ["1113a", "1113a", "2e", "1113b", "1113c"]
What is actually returned from globalize is "��.��" = [ 'd804', 'd804', '2e', 'dd38', 'd804' ]
Using the Surrogate Pair Calculator for the individual characters in "𑄺𑄺.𑄻𑄼" = ["1113a", "1113a", "2e", "1113b", "1113c"], each digit has the high surrogate D804; the low surrogates are DD3A, DD3B and DD3C respectively.
So maybe globalize is returning the first code unit (the high surrogate) from each surrogate pair? But for 1113b, dd38 is returned, not D804.
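The surrogate pairs for the three expected digits can also be computed directly, without an external calculator. The sketch below uses the standard UTF-16 encoding arithmetic; `toSurrogatePair` is a hypothetical helper name used only for illustration, not part of globalize.

```javascript
// Split a supplementary-plane code point (above 0xFFFF) into its
// UTF-16 high and low surrogate code units, returned as hex strings.
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;
  var high = 0xd800 + (offset >> 10);   // top 10 bits of the offset
  var low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits of the offset
  return [high.toString(16), low.toString(16)];
}

// Code points of the ccp digits 4, 5 and 6
var pairs = [0x1113a, 0x1113b, 0x1113c].map(toSurrogatePair);
console.log(pairs);
// => [ [ 'd804', 'dd3a' ], [ 'd804', 'dd3b' ], [ 'd804', 'dd3c' ] ]

// The unexpected 'dd38' is the low surrogate of U+11138, the ccp digit 2
console.log(toSurrogatePair(0x11138));
// => [ 'd804', 'dd38' ]
```

So dd38 is not half of any of the three expected digits; it belongs to the pair for the ccp digit 2.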
Example (code)
// Assumes Globalize has already been required and loaded with the CLDR data for each locale

// Output hex code point values for the characters of a JavaScript string
var asUnicodePoints = function(value) {
  return Array.from(value).map(function(character) {
    return character.codePointAt(0).toString(16);
  });
};

// For us locale, works fine
var result = Globalize('us').numberFormatter()(44.56);
console.log(result);
// => 44.56
console.log(asUnicodePoints(result));
// => [ '34', '34', '2e', '35', '36' ]

// For ccp locale, wrongly returns first hex value from each surrogate pair?
var result = Globalize('ccp').numberFormatter()(44.56);
console.log(result);
// => ��.��
console.log(asUnicodePoints(result));
// => [ 'd804', 'd804', '2e', 'dd38', 'd804' ]

// For ccp locale, the true hex values for formatted 44.56 should be..
console.log(asUnicodePoints("𑄺𑄺.𑄻𑄼"));
// => [ '1113a', '1113a', '2e', '1113b', '1113c' ]
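One possible explanation, not verified against the globalize source, is that the native digit string is indexed by UTF-16 code unit rather than by code point. The sketch below reproduces the reported output under that assumption; `mapDigits`, `byCodeUnit` and `byCodePoint` are hypothetical names for illustration only and are not globalize APIs.

```javascript
// Build the ccp digits 0-9 (U+11136 to U+1113F) as a single string
var ccpDigits = "";
for (var cp = 0x11136; cp <= 0x1113f; cp++) {
  ccpDigits += String.fromCodePoint(cp);
}

// Output hex code point values for the characters of a string
var asUnicodePoints = function(value) {
  return Array.from(value).map(function(character) {
    return character.codePointAt(0).toString(16);
  });
};

// Hypothetical helper: replace each ASCII digit using the given lookup
function mapDigits(formatted, lookup) {
  return formatted.replace(/[0-9]/g, function(digit) {
    return lookup(Number(digit));
  });
}

// Lookup by UTF-16 code unit: index 4 lands on a lone high surrogate
function byCodeUnit(i) {
  return ccpDigits.charAt(i);
}

// Lookup by code point: index 4 is the whole digit U+1113A
var digitsByCodePoint = Array.from(ccpDigits);
function byCodePoint(i) {
  return digitsByCodePoint[i];
}

var wrong = asUnicodePoints(mapDigits("44.56", byCodeUnit));
var right = asUnicodePoints(mapDigits("44.56", byCodePoint));
console.log(wrong);
// => [ 'd804', 'd804', '2e', 'dd38', 'd804' ]
console.log(right);
// => [ '1113a', '1113a', '2e', '1113b', '1113c' ]
```

The code-unit lookup yields exactly the values reported above, including the odd dd38: code unit 5 of the digit string is the low surrogate of the ccp digit 2. This would account for three high surrogates and one low surrogate, although the actual cause inside globalize may differ.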
OK, this issue isn't going to be my highest priority, though I will hopefully get round to it at some point. I believe the issue only affects 4 locales, all related to the base ccp locale: ccp, ccp-u-nu-native, ccp-IN and ccp-IN-u-nu-native.