Fix unicode.SimpleFold(r) #30

Open
wallclockbuilder opened this issue Jun 29, 2015 · 5 comments

Comments

@wallclockbuilder
Owner

Right now
unicode.SimpleFold('k') == '\u212A'
* '\u212A' is 'K', the Kelvin sign.
This is not intuitive for ASCII simple folding.
We expect the plain ASCII letter instead. Fix it so that
unicode.SimpleFold('k') == 'K'
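
To make the current behavior concrete, here is a minimal sketch that walks the fold orbit of 'k'. It assumes this project mirrors Go's standard `unicode.SimpleFold`; the helper name `foldOrbit` is made up for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// foldOrbit is a hypothetical helper: it collects the runes reached by
// repeatedly applying unicode.SimpleFold until the orbit returns to r.
func foldOrbit(r rune) []rune {
	orbit := []rune{r}
	for next := unicode.SimpleFold(r); next != r; next = unicode.SimpleFold(next) {
		orbit = append(orbit, next)
	}
	return orbit
}

func main() {
	// Print each rune in the orbit with its code point.
	for _, r := range foldOrbit('k') {
		fmt.Printf("%U %c\n", r, r)
	}
}
```

With Go's standard library the orbit visits U+006B, then U+212A, then U+004B, which is why the Kelvin sign shows up before 'K'.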

@wallclockbuilder
Owner Author

lowercase letter 'k' is \u006B
uppercase letter 'K' is \u004B

@wallclockbuilder
Owner Author

Simply mapping [a-z] to [A-Z] should work for most simple ASCII-only text documents.
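
One possible shape for that fix, as a sketch only: handle ASCII letters directly and defer everything else to the existing fold. The wrapper name `simpleFoldASCII` is hypothetical, and Go's `unicode.SimpleFold` stands in for whatever the project uses for non-ASCII runes.

```go
package main

import (
	"fmt"
	"unicode"
)

// simpleFoldASCII is a hypothetical wrapper: ASCII letters fold only to
// their ASCII counterpart; every other rune defers to unicode.SimpleFold.
func simpleFoldASCII(r rune) rune {
	switch {
	case 'a' <= r && r <= 'z':
		return r - ('a' - 'A') // lowercase -> uppercase
	case 'A' <= r && r <= 'Z':
		return r + ('a' - 'A') // uppercase -> lowercase
	}
	return unicode.SimpleFold(r)
}

func main() {
	fmt.Printf("%c %c\n", simpleFoldASCII('k'), simpleFoldASCII('K'))
}
```

One trade-off to keep in mind: with this wrapper U+212A still folds to 'K', but 'K' and 'k' never fold back to U+212A, so the result is no longer a closed orbit over all equivalent runes.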

@wallclockbuilder
Owner Author

The Unicode 6.0 spec has this to say about U+212A (KELVIN SIGN):

Three letterlike symbols have been given canonical equivalence to regular letters: U+2126 OHM SIGN, U+212A KELVIN SIGN, and U+212B ANGSTROM SIGN. In all three instances, the regular letter should be used. If text is normalized according to Unicode Standard Annex #15, “Unicode Normalization Forms,” these three characters will be replaced by their regular equivalents.

In other words, you shouldn't really be using U+212A, you should be using U+004B (LATIN CAPITAL LETTER K) instead, and if you normalize your Unicode text, U+212A should be replaced with U+004B.
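
As a rough illustration of that replacement, the sketch below maps only the three letterlike symbols named in the quote to their regular letters. It is a hand-written stand-in, not full Unicode normalization (a real implementation would more likely use a normalization library such as `golang.org/x/text/unicode/norm`); the names `letterlike` and `replaceLetterlike` are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// letterlike maps the three symbols from the quoted passage to the
// regular letters they are canonically equivalent to.
var letterlike = map[rune]rune{
	'\u2126': '\u03A9', // OHM SIGN -> GREEK CAPITAL LETTER OMEGA
	'\u212A': '\u004B', // KELVIN SIGN -> LATIN CAPITAL LETTER K
	'\u212B': '\u00C5', // ANGSTROM SIGN -> LATIN CAPITAL LETTER A WITH RING ABOVE
}

// replaceLetterlike is a hypothetical helper that rewrites only those
// three runes and leaves every other rune untouched.
func replaceLetterlike(s string) string {
	return strings.Map(func(r rune) rune {
		if regular, ok := letterlike[r]; ok {
			return regular
		}
		return r
	}, s)
}

func main() {
	// The input ends in U+212A; the comparison is against ASCII 'K'.
	fmt.Println(replaceLetterlike("273 \u212A") == "273 K")
}
```

Running input through a step like this before folding would mean the fold never needs to produce or consume U+212A for text that has been cleaned up this way.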

@wallclockbuilder
Owner Author

Three letterlike symbols have been given canonical equivalence to regular letters:
U+2126 ohm sign,
U+212A kelvin sign, and
U+212B angstrom sign.

In all three instances, the regular letter should be used. If text is normalized according to Unicode Standard Annex #15, “Unicode Normalization Forms,” these three characters will be replaced by their regular equivalents.

http://www.unicode.org/versions/Unicode6.0.0/ch15.pdf

@wallclockbuilder
Owner Author

Unicode 8.0 Character Code Charts
The most current code chart containing U+212A is:

http://www.unicode.org/charts/PDF/U2100.pdf 

It specifies that the Kelvin sign is equivalent to Latin capital letter K (U+004B).
Here's a snapshot.
[image: kelvin sign]
