3-letter language & country codes #40

Open
rwd opened this issue Jan 21, 2019 · 2 comments

@rwd
Contributor

rwd commented Jan 21, 2019

I would like to implement this todo from the README:

include other language/country code formats (3-letter codes...) ?

In the process, I would also like to propose some refactoring. What I have in mind here is:

  1. Use 3-letter language and country codes in filenames and for lookup keys in data caches, because all languages have a 3-letter code, but not all have a 2-letter code.
  2. Use codes in the case they are specified in the relevant ISO standard, i.e. lower-case for language codes, upper-case for country codes.
  3. Change the storage format of the cached data to JSON, e.g.
    Languages in English:
    {
      "eng": "English",
      "fra": "French",
      "spa": "Spanish; Castilian"
    }
  4. Rename the cache files thus:
    • Countries: /cache/countries/eng.json, /cache/countries/fra.json, etc
    • Languages: /cache/languages/eng.json, /cache/languages/fra.json, etc
  5. Make the live data provider add to the cache two JSON dictionaries of alternate codes (2-letter/bibliographic/numeric) mapping them to their 3-letter equivalent, e.g.
    Language code dictionary:
    {
      "en": "eng",
      "es": "spa",
      "fr": "fra",
      "chi": "zho"
    }
    Country code dictionary:
    {
      "GM": "GMB",
      "887": "YEM"
    }
  6. When performing lookups by code, if the supplied code is unknown, i.e. has no matching cache file, look up its 3-letter equivalent in the relevant dictionary and use that.
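
A minimal sketch of how steps 3 to 6 might fit together for languages, written in Python purely for illustration. The file name `language_codes.json` and the function names `write_sample_cache`, `resolve_language_code` and `languages_in` are hypothetical and not part of the project; the sample data is the data from the examples above.

```python
import json
import tempfile
from pathlib import Path


def write_sample_cache(root: Path) -> None:
    """Create a tiny cache in the proposed layout (sample data only)."""
    languages = root / "languages"
    languages.mkdir(parents=True, exist_ok=True)
    # Language names in English, keyed by 3-letter code (steps 3 and 4).
    (languages / "eng.json").write_text(
        json.dumps({"eng": "English", "fra": "French", "spa": "Spanish; Castilian"}),
        encoding="utf-8",
    )
    # Alternate codes mapped to their 3-letter equivalent (step 5).
    # "language_codes.json" is an assumed name for this dictionary.
    (root / "language_codes.json").write_text(
        json.dumps({"en": "eng", "es": "spa", "fr": "fra", "chi": "zho"}),
        encoding="utf-8",
    )


def resolve_language_code(root: Path, code: str) -> str:
    """Return the 3-letter code to use for `code` (step 6)."""
    code = code.lower()  # language codes are lower-case (step 2)
    if (root / "languages" / f"{code}.json").exists():
        return code  # already a known 3-letter code
    alternates = json.loads(
        (root / "language_codes.json").read_text(encoding="utf-8")
    )
    if code not in alternates:
        raise KeyError(f"unknown language code: {code!r}")
    return alternates[code]


def languages_in(root: Path, code: str) -> dict:
    """Load the language names as written in the language given by `code`."""
    resolved = resolve_language_code(root, code)
    path = root / "languages" / f"{resolved}.json"
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache_root = Path(tmp)
        write_sample_cache(cache_root)
        print(resolve_language_code(cache_root, "en"))
        print(languages_in(cache_root, "en")["spa"])
```

Note that the dictionary is only consulted when no cache file matches, so lookups by 3-letter code never touch it.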

Before I get started working on a PR for the above, is there general support for some/any/all of this proposal?

@grosser
Owner

grosser commented Jan 21, 2019

Sounds good.

An alternative might be keeping 2-letter codes around, adding 3-letter files where necessary, and then doing a 3-letter->2-letter lookup on a miss, but that might get more complicated than just doing 3-letter lookup.
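
For comparison, a rough sketch of what that alternative could look like, in Python for illustration only. `CACHE`, `THREE_TO_TWO` and `find_cache_key` are hypothetical names, and in-memory dicts stand in for the cache files.

```python
# Stand-in for cache files: keyed by 2-letter code where one exists,
# and by 3-letter code only where there is no 2-letter code.
CACHE = {
    "en": {"en": "English", "fr": "French", "fil": "Filipino"},
    "fil": {},  # Filipino has no 2-letter code, so it needs a 3-letter file
}

# 3-letter -> 2-letter table, consulted only when there is no direct match.
THREE_TO_TWO = {"eng": "en", "fra": "fr", "spa": "es"}


def find_cache_key(code: str) -> str:
    """Return the key of the cache entry to use for `code`."""
    if code in CACHE:
        return code  # direct hit on a 2-letter or 3-letter entry
    two_letter = THREE_TO_TWO.get(code)
    if two_letter in CACHE:
        return two_letter  # 3-letter code translated back to 2 letters
    raise KeyError(f"unknown language code: {code!r}")
```

The extra complication is that keys come in two lengths, so callers cannot tell from a key alone which kind of code they are holding.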

@grosser
Owner

grosser commented Jan 21, 2019

Make sure to keep the 2-letter -> 3-letter lookup table in memory so we don't load that file on every lookup ;)
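
One way to keep the table in memory, sketched in Python for illustration: memoize the loader so the file is read once per path. `load_code_table` and the file name used in the demo are hypothetical.

```python
import json
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_code_table(path: str) -> dict:
    """Read the alternate-code dictionary from disk once per path.

    Later calls with the same path return the same dict object from
    memory, so callers should treat it as read-only.
    """
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table_path = Path(tmp) / "language_codes.json"  # assumed file name
        table_path.write_text(json.dumps({"en": "eng"}), encoding="utf-8")
        first = load_code_table(str(table_path))   # reads the file
        second = load_code_table(str(table_path))  # served from memory
        print(first is second)
```

The trade-off is that changes to the file on disk are not picked up until the process restarts or `load_code_table.cache_clear()` is called.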
