Dedupe location names against script-root #5966
Do we not do similar deduplication in the other time zone keys as well?
So the problem with |
Hmm. An approach that would work with our framework would be to have two keys:
Every locale has an entry in each key, but |
Requesting changes because this code won't work with data slicing.
I do not want to change this to script fallback, because it should definitely do region fallback. The two-key approach is pretty complex: the base keys would have to contain the intersection of all data structs that are deduped against the locale, which is hard to generate in our current framework (it's also not clear whether grouping by script leads to the most deduping). This currently has a data size improvement of <10%, so I'm not sure it's worth pursuing at the moment.
Hm? Right now it is language fallback. Script fallback is identical except that it adds
Hm? I don't understand this, either. Generating the base key is trivial. It is equal to the language data for the likely language of the locale's script. We could do something fancier but that's not what I suggested above
1'407'337 B to 1'297'285 B, or 110 kB. It's about the raw data size improvement, yes, but also about language equity and not privileging English.
@robertbastian Can you clarify your comment about why you don't want to use script fallback?
I guess that's right.
Fixes #5901
I'm not introducing script fallback (because time zone names are not meant to fall back by script), and I'm not introducing `und-Xxxx` locales, as those would increase data size (due to the other fields in the struct). Instead, I store a `dedupe_locale` in each data struct, which contains the locale that was deduplicated against. So for `sr`, this would be `ru`, because `sr` -> `und-Cyrl` -> `ru`, and for `ru` it'd be `ru` as well, and `ru` is complete. This way, we also directly know which dedupe locale to load; if we didn't store it we'd have to load a fallbacker and step through it until we hit `und` (and then use the previous value).

Depends on #5967
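To make the `dedupe_locale` idea from the description concrete, here is a rough sketch of how it could work. It is not the code in this PR: the `LocationNames` struct, the `dedupe`, `lookup`, and `sample_store` functions, and the placeholder time zone IDs and names are all hypothetical, and the script root (`sr` -> `und-Cyrl` -> `ru`) is assumed to be computed elsewhere and passed in.

```rust
use std::collections::HashMap;

type Names = HashMap<String, String>;

/// Hypothetical, simplified stand-in for the real data struct.
#[derive(Debug, Clone, PartialEq)]
pub struct LocationNames {
    /// The locale this payload was deduplicated against.
    pub dedupe_locale: String,
    /// Time zone ID -> display name. For non-root locales this holds
    /// only the entries that differ from the dedupe locale's payload.
    pub names: Names,
}

/// Datagen side: drop every name that equals the dedupe locale's name.
/// `root` is assumed to be precomputed (e.g. sr -> und-Cyrl -> ru).
/// Panics if `locale` or `root` is missing from `full` (sketch only).
pub fn dedupe(locale: &str, root: &str, full: &HashMap<String, Names>) -> LocationNames {
    let own = &full[locale];
    let names = if locale == root {
        own.clone() // the root itself stays complete
    } else {
        let base = &full[root];
        own.iter()
            .filter(|(tz, name)| base.get(*tz) != Some(*name))
            .map(|(tz, name)| (tz.clone(), name.clone()))
            .collect()
    };
    LocationNames { dedupe_locale: root.to_string(), names }
}

/// Runtime side: check the locale's own payload first, then load the
/// stored dedupe locale directly. No fallbacker iteration is needed.
pub fn lookup<'a>(
    store: &'a HashMap<String, LocationNames>,
    locale: &str,
    tz: &str,
) -> Option<&'a str> {
    let payload = store.get(locale)?;
    if let Some(name) = payload.names.get(tz) {
        return Some(name.as_str());
    }
    let base = store.get(payload.dedupe_locale.as_str())?;
    base.names.get(tz).map(String::as_str)
}

/// Helper to build a name map from string pairs.
fn names(pairs: &[(&str, &str)]) -> Names {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Placeholder data (not real CLDR names): `sr` shares one name with `ru`.
pub fn sample_store() -> HashMap<String, LocationNames> {
    let mut full = HashMap::new();
    full.insert("ru".to_string(), names(&[("tz-a", "Alpha"), ("tz-b", "Beta")]));
    full.insert("sr".to_string(), names(&[("tz-a", "Alpha"), ("tz-b", "Beta-sr")]));
    ["ru", "sr"]
        .iter()
        .map(|l| (l.to_string(), dedupe(l, "ru", &full)))
        .collect()
}
```

With the sample data, the `sr` payload keeps only the one name that differs from `ru`, and a lookup for a shared name resolves through the stored `dedupe_locale` in a single extra load. The `ru` payload points at itself and stays complete.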