Non unique codes - standard/codelists /geoCodeType.csv #391

Open
michaelwood opened this issue Jan 13, 2025 · 6 comments

@michaelwood
Member

Unlike other codelists, the list of geoCodeTypes is non-unique: the same code appears several times with differing titles/descriptions. I'm not sure how to choose the correct one when trying to display the human-readable title instead of the code when looking at grant data.

https://github.com/ThreeSixtyGiving/standard/blob/main/codelists/geoCodeType.csv

@mariongalley
Contributor

@michaelwood There are a couple of issues:

  1. The geoCode type CSV needs to be updated
  2. The geoCode type codelist is not currently validated and so contains data that isn't drawn from the codelist, which means the codes can't always be looked up.
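As a rough illustration of the second point, the sketch below checks a list of geoCodeType values against a codelist and reports the ones that cannot be looked up. It is only a sketch under assumptions: the CSV excerpt, its `Code` and `Title` column names, and the sample values are hypothetical and not taken from the real codelist or from published data.

```python
import csv
import io

# Hypothetical excerpt of a codelist CSV; the "Code" and "Title" column
# names are an assumption, not confirmed against geoCodeType.csv.
SAMPLE_CODELIST_CSV = """Code,Title
WD,Electoral Wards
CMLAD,Census Merged Local Authority Districts
"""


def load_codes(csv_text):
    """Return the set of codes found in the 'Code' column of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Code"].strip() for row in reader}


def unknown_types(values, codes):
    """Return distinct values that are not in the codelist, in first-seen order."""
    seen = []
    for value in values:
        if value not in codes and value not in seen:
            seen.append(value)
    return seen


if __name__ == "__main__":
    codes = load_codes(SAMPLE_CODELIST_CSV)
    # Made-up geoCodeType values as a publisher might have entered them.
    print(unknown_types(["WD", "Ward", "ward", "Ward", "CMLAD"], codes))
```

Because the field accepts any string, free-text values such as the made-up "Ward" above would be reported as not found.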

neelima-j self-assigned this Jan 20, 2025
@neelima-j
Contributor

Why does geoCodeType have duplicates?

geoCodeType is not constrained in the schema and can contain any string.

History

#29 is how we chose this codelist.
The duplication question was raised then (exactly 10 years ago) in #29 (comment)

However, I fear this has:
(a) Duplication - I see 'WD' used as the abbreviation for a number of different ONS Codesets - so not sure the abbreviations are appropriate to use as our codes;

This was not addressed and the codelist was added.

Looking at the megalist from the Register of Geographic Codes, uploaded to Drive as RGC_DECEMBER_2020_UK v2, and the GSS Wikipedia page, it appears that we have used a non-unique 'Entity abbreviation' as the code. The source uses 185 unique Entity codes. The current codelist is a subset of the source, except for 6 Codes / Entity abbreviations.

Duplicates

There are 46 unique values with 18 duplicates.

Duplicate Codes / Entity abbreviations occur because the same abbreviation is repeated for different Entity coverages (one of nine: Wales, Channel Islands, Isle of Man, Scotland, England and Wales, England, United Kingdom, Northern Ireland, Great Britain).

Here is a table showing the 3-character GSS code prefix, the code the codelist uses (which is a non-unique entity abbreviation), the geographic coverage of the code, and the title of the code (name of the entity):

| Code | Entity abbrev. / Code in codelist | Entity coverage | Entity Name / Title in codelist |
|---|---|---|---|
| E41 | CMLAD | England | Census Merged Local Authority Districts |
| W40 | CMLAD | Wales | Census Merged Local Authority Districts |
| J03 | WD | England and Wales | 1961 Census Wards |
| N08 | WD | Northern Ireland | Electoral Wards |
| S13 | WD | Scotland | Electoral Wards |
| W05 | WD | Wales | Electoral Wards |
| E05 | WD | England | Electoral Wards/Divisions |
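
The ambiguity in the table above can be shown with a short sketch. It groups the seven rows by entity abbreviation and lists which abbreviations map to more than one entity. The rows are copied from the table; the function names are made up for this illustration.

```python
from collections import defaultdict

# Rows copied from the table above:
# (GSS code prefix, entity abbreviation, entity coverage, entity name).
ROWS = [
    ("E41", "CMLAD", "England", "Census Merged Local Authority Districts"),
    ("W40", "CMLAD", "Wales", "Census Merged Local Authority Districts"),
    ("J03", "WD", "England and Wales", "1961 Census Wards"),
    ("N08", "WD", "Northern Ireland", "Electoral Wards"),
    ("S13", "WD", "Scotland", "Electoral Wards"),
    ("W05", "WD", "Wales", "Electoral Wards"),
    ("E05", "WD", "England", "Electoral Wards/Divisions"),
]


def group_by_abbreviation(rows):
    """Map each entity abbreviation to the list of entities that share it."""
    groups = defaultdict(list)
    for prefix, abbrev, coverage, name in rows:
        groups[abbrev].append((prefix, coverage, name))
    return dict(groups)


def ambiguous_abbreviations(rows):
    """Return only the abbreviations that refer to more than one entity."""
    return {
        abbrev: entries
        for abbrev, entries in group_by_abbreviation(rows).items()
        if len(entries) > 1
    }


if __name__ == "__main__":
    for abbrev, entries in ambiguous_abbreviations(ROWS).items():
        print(abbrev, [prefix for prefix, _, _ in entries])
```

In these rows every 3-character prefix appears once, while both abbreviations are shared, which is why a lookup keyed on the abbreviation cannot return a single title.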

Deduplicating the codelist

If we deduplicate the codelist, there should be no issue, because the geoCode field is unique in itself within the GSS system.
However, if publishers have used other geoTypes and populated the geoCode field as they see fit, it is conceivable that there may be duplicates.

Governance - As per https://standard.threesixtygiving.org/en/latest/about/governance/#versions this will be a PATCH

Updating the codelist

Ideally, the codelist would use what the GSS calls 'code', the unique 3-character prefix. However, making this change would be a MAJOR change. Updating the codelist within the realm of PATCH means only cleaning up the list.

Work to be done before we know how to update the codelist:

  • Why was only a subset of the codes included in this codelist?
  • What codes are currently used?
  • How many publishers use the field?
  • What is a comprehensive source?
  • These codes have changed with time, and may change again - how does this affect the tools using this field? How are publishers getting this data - from the ONS?

@mariongalley
Contributor

It seems to me that we have a few options moving forward:

  • Update/maintain geocode types list - deduplicate terms, add new ones (on what basis?), remove out-of-date ones
    • If we do this, we can look up the human-readable names associated with the codes in GrantNav, as we do for the other codelists
    • We could also validate the codes in this case (backwards-incompatible) - although this would only guarantee that they are drawn from the codelist, not necessarily that they match the geocode provided
  • Stop maintaining geocode types list and refer to ONS list instead (as we do with ISO country codes)
    • This also would allow us to validate the codelist, with the same caveats as above
  • Deprecate the field entirely
    • If the geocodes themselves allow us to unambiguously determine the type, there's an argument that this field adds no value
    • This also resolves any possible ambiguity from geocode types that do not match the codes provided
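
To make the deprecation argument concrete, here is a minimal sketch of deriving the entity from the geocode itself and of flagging a declared type that does not match it. The assumptions are stated loudly: the mapping covers only the seven prefixes listed earlier in this thread, the nine-character pattern is a loose approximation of GSS codes, and the example geocodes and function names are invented for illustration.

```python
import re

# Prefix -> (entity abbreviation, coverage, entity name), limited to the
# seven rows quoted earlier in this thread. A real lookup would need the
# full Register of Geographic Codes.
PREFIX_TO_ENTITY = {
    "E41": ("CMLAD", "England", "Census Merged Local Authority Districts"),
    "W40": ("CMLAD", "Wales", "Census Merged Local Authority Districts"),
    "J03": ("WD", "England and Wales", "1961 Census Wards"),
    "N08": ("WD", "Northern Ireland", "Electoral Wards"),
    "S13": ("WD", "Scotland", "Electoral Wards"),
    "W05": ("WD", "Wales", "Electoral Wards"),
    "E05": ("WD", "England", "Electoral Wards/Divisions"),
}

# Assumed shape of a GSS code: one letter, two digits, six more characters.
GSS_PATTERN = re.compile(r"^[A-Z][0-9]{2}[A-Z0-9]{6}$")


def entity_for_geocode(geo_code):
    """Return the entity for a geocode, or None if it cannot be determined."""
    code = geo_code.strip().upper()
    if not GSS_PATTERN.match(code):
        return None
    return PREFIX_TO_ENTITY.get(code[:3])


def type_matches(geo_code, geo_code_type):
    """True if the declared type equals the abbreviation implied by the geocode."""
    entity = entity_for_geocode(geo_code)
    return entity is not None and entity[0] == geo_code_type


if __name__ == "__main__":
    # Hypothetical geocodes, used only to exercise the functions.
    print(entity_for_geocode("E05000001"))
    print(type_matches("E05000001", "CMLAD"))
```

Under these assumptions the prefix alone picks out a single entity, whereas the declared type "WD" would still leave five candidates.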

michaelwood added a commit to ThreeSixtyGiving/datastore that referenced this issue Jan 21, 2025
This codelist is currently not able to be supported.
See ThreeSixtyGiving/standard#391

Backout the work done for this with the aim of creating a draft PR that
could revert this commit.
@mariongalley
Contributor

@michaelwood Katherine agrees that the field is a candidate for deprecation - hence not worth investing effort in updating the list (also this would likely be backwards-incompatible)

@neelima-j
Contributor

@mariongalley I'll get a branch ready which deprecates the field and updates the description appropriately.

How should the Data Quality Tool report on deprecation if this field is used in the data - is it a yellow question mark?

@michaelwood @R2ZER0

@mariongalley
Contributor

@neelima-j I don't think we're ready to pull the trigger on deprecating the field yet, but yes, it's a good question: if we do this, how will the DQT represent it?
