Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update json-schema version to include date format #283

Open
drkane opened this issue Nov 4, 2020 · 4 comments
Open

Update json-schema version to include date format #283

drkane opened this issue Nov 4, 2020 · 4 comments

Comments

@drkane
Copy link
Contributor

drkane commented Nov 4, 2020

We've noticed that it's possible at the moment for a user to include an invalid date in their file, and for it to not get picked up by cove or other checks.

This is because the standard currently allows for date fields like awardDate to be either a date-time value or to match a regular expression "^[0-9]{4}-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])$". The regex does some validation, but invalid dates are still allowed - for example 2020-02-31 would match the regex even though its not a valid date.

We currently use JSON schema version draft-04. Later versions of the schema allow for a date format, which I think would allow for us to specify it could be either date-time or date.

This in itself would not strictly be backwards-compatible, as there could be data that validates against the current schema version but does not validate if the date format is included. But it could be argued that it would better match the current intention of the standard, as it clearly expects a valid date.

Upgrading json-schema itself would potentially introduce other breaking changes. The date format first appears in draft-07. There are guides for upgrading from draft-04 to draft-06 and from draft-06 to draft-07 - the former mentions backwards-incompatible changes.

@robredpath
Copy link
Contributor

One other way that we could approach this would be to add to the additional checks - we've already got some on the backlog about checking that dates are the right way round, so we could add "check that dates make sense".

We could include:

  • checking that dates actually exist
  • warning against times other than 00:00:00
  • checking that dates are sensible (eg after 1990 and before now()+5 years). Fun date fact - ISO 8601 doesn't technically support dates before 1583!

Technically, we could also write a regex that allowed fewer invalid dates, except that it wouldn't know about leap years. That way lies madness, however, so let's not do that.

I am very much up for us moving on from draft-04. I don't immediately know, but I suspect that the id/$id change would mean that we'd have to bump to 2.0 . That feels like it should be a big deal, but it probably doesn't have to be if we don't want it to be.

@KDuerden
Copy link
Contributor

KDuerden commented Nov 9, 2023

I am very much up for us moving on from draft-04. I don't immediately know, but I suspect that the id/$id change would mean that we'd have to bump to 2.0 . That feels like it should be a big deal, but it probably doesn't have to be if we don't want it to be.

Does this mean moving to draft 0.6 or 0.7 would mean the id field (just the grant id or all ids?) would become $id in the JSON schema?

As we provide a Title for spreadsheet users, would this only impact JSON users directly - eg would we be able to leave the Title as Identifier but it then maps to $id instead of id?

Coming back to the dates validation issue. If a change to a new JSON schema version does introduce a breaking change it would need a thorough MAJOR upgrade change management process (as per our governance rules). So incorporating additional checks validation sooner would help ensure that we catch the actual occurrences of invalid dates early so no data needs to be broken in reality (when the time comes).

I understand there are potentially two interpretations about what a JSON schema upgrade could mean, based on whether the schema is the Standard because it sets out the rules, or the schema is the way that the rules of the Standard are enforced and so a change to the date formatting could be interpreted as bringing the schema in line with the Standard.

My instinct is that the first interpretation is correct because changes to the schema can have the practical impact of breaking data (the definition of a backward incompatible change), even if no-ones data happens to get broken at the time of the change.

We'll be discussing this issue at our upcoming Stewardship committee meeting, so I welcome any further thoughts, and if there are perspectives from how other Standards might approach this question.

@mrshll1001
Copy link
Contributor

My initial take on this is that the id/$id change doesn't instigate a version change, but we'd need to be clear on our approach to the date features.

According to the list of changes between Draft 04 and Draft 06 (when this change occured), the keyword id was replaced with $id explicitly to avoid confusion with object properties called id:

no longer easily confused with instance properties named "id"

The id keyword in JSON Schema Draft 04 (reference) is designed to alter a "resolution scope" for a URI when dereferencing things against a base schema.

We don't use this in this way, and only create properties called id. The change to $id as put forth in the Draft 06 notes are therefore describing something else, and all of our objects and grants etc would not suddenly have to have an $id field. This is actually following the original motivations of the change to $id on the JSON Schema end of things.

Therefore if we upgraded to JSON Schema Draft 2020-12, we would not break backwards compatibility on this basis.

There are other factors, though.

The way $ref works is changed. We make heavy use of $ref, but I believe the way we use it is compatible with both the older and newer uses. We'd need to confirm this behaviour before proceeding.

We'd need to renamed definitions to $defs and update the references accordingly throughout the schema.

Technically these are all changes to the schema, which is versioned, so it should trigger some version upgrade. However, I'd argue that from the perspective of validation rules and semantics of fields defined – simply upgrading the JSON Schema version and making the required changes could be conceptualised as a PATCH level change.

As for the date validation; I think we can approach this in two ways. We can be very strict and say "well this invalidates data that was valid before, therefore is a MAJOR upgrade at minimum." Another approach would be to treat the problem of invalid dates as a bug and therefore say that this is being addressed as a PATCH level fix.

We could investigate this by spinning up a branch of the schema (ideally after 1.4 is merged) and experiment with upgrading the JSON schema version, to see what existing data breaks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants