add "unevaluatedProperties": false, (or "additionalProperties": false,) to jsonschema dump #75
Comments
we may be able to use |
we will also have to explicitly drop |
can't we add/define it to jsonschema export? |
the issue is on the ingestion side for validate/migrate/process |
@candleindark please look into how we could make our exported jsonschema more stringent and why pydantic by default does not define ATM we have only 1 object type defined with additional restriction -- that all additional properties of a
❯ grep -4 additionalProper ../schema/releases/0.6.8/asset.json
    "nskey": "schema",
    "title": "File encoding format"
},
"digest": {
    "additionalProperties": {
        "type": "string"
    },
    "nskey": "dandi",
    "title": "A map of dandi digests to their values",
And indeed we would likely need to explicitly define |
@yarikoptic @satra Below is a demo of including
import json
from pydantic import BaseModel, Field, AnyHttpUrl, ConfigDict, ValidationError
from jsonschema import validate, Draft202012Validator
import jsonschema
class M(BaseModel):
    context: AnyHttpUrl = Field(..., alias="@context")
    model_config = ConfigDict(extra="forbid")
# Validate a model expressed in JSON
m = M.model_validate_json('{"@context": "https://schema.org"}')
# Dump validated model
print(m.model_dump_json(by_alias=True))
'{"@context":"https://schema.org/"}'
# Provide extra field in validation input
try:
    M.model_validate_json('{"@context": "https://schema.org", "e": 42}')
except ValidationError as e:
    print("====================================")
    print(e)
"""
1 validation error for M
e
  Extra inputs are not permitted [type=extra_forbidden, input_value=42, input_type=int]
    For further information visit https://errors.pydantic.dev/2.9/v/extra_forbidden
"""
# JSON schema of the model
schema = M.model_json_schema()
with open("json_schema.json", "w") as f:
    json.dump(schema, f, indent=2)
instance = {"@context": "https://schema.org"}
instance_with_invalid_context = {"@context": "invalid context"}
instance_with_extra = {"@context": "https://schema.org", "e": 42}
validate(instance, schema, format_checker=Draft202012Validator.FORMAT_CHECKER)
try:
    validate(
        instance_with_invalid_context,
        schema,
        format_checker=Draft202012Validator.FORMAT_CHECKER,
    )
except jsonschema.exceptions.ValidationError as e:
    print("====================================")
    print(e)
"""
'invalid context' is not a 'uri'
Failed validating 'format' in schema['properties']['@context']:
    {'format': 'uri', 'minLength': 1, 'title': '@Context', 'type': 'string'}
On instance['@context']:
    'invalid context'
"""
try:
    validate(
        instance_with_extra, schema, format_checker=Draft202012Validator.FORMAT_CHECKER
    )
except jsonschema.exceptions.ValidationError as e:
    print("====================================")
    print(e)
"""
Additional properties are not allowed ('e' was unexpected)
Failed validating 'additionalProperties' in schema:
    {'additionalProperties': False,
     'properties': {'@context': {'format': 'uri',
                                 'minLength': 1,
                                 'title': '@Context',
                                 'type': 'string'}},
     'required': ['@context'],
     'title': 'M',
     'type': 'object'}
On instance:
    {'@context': 'https://schema.org', 'e': 42}
"""
The JSON schema of the model is the following.
The generated `json_schema.json`:
{
  "additionalProperties": false,
  "properties": {
    "@context": {
      "format": "uri",
      "minLength": 1,
      "title": "@Context",
      "type": "string"
    }
  },
  "required": [
    "@context"
  ],
  "title": "M",
  "type": "object"
}
Please let me know if you have any objection to adding the Questions:
Note: |
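To see which object definitions in an exported schema already carry such a restriction (the earlier grep found `additionalProperties` only on the `digest` map), a small stdlib walk over the schema may help. This is a minimal sketch and not part of any existing tool: `object_schemas`, `report_strictness`, and the `toy` schema are hypothetical names made up for illustration. The walk naively descends into every dict, so `default` or `examples` values that happen to look like schemas would be reported as well.

```python
def object_schemas(schema, path="#"):
    """Yield (path, subschema) for every dict that declares "type": "object"."""
    if isinstance(schema, dict):
        if schema.get("type") == "object":
            yield path, schema
        for key, value in schema.items():
            yield from object_schemas(value, f"{path}/{key}")
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            yield from object_schemas(item, f"{path}/{index}")


def report_strictness(schema):
    """Return (path, is_closed) pairs; closed means extra properties are forbidden."""
    rows = []
    for path, sub in object_schemas(schema):
        closed = (
            sub.get("additionalProperties") is False
            or sub.get("unevaluatedProperties") is False
        )
        rows.append((path, closed))
    return rows


# Hypothetical schema: an open root, a typed map like "digest", one closed model.
toy = {
    "type": "object",
    "properties": {
        "digest": {"type": "object", "additionalProperties": {"type": "string"}},
        "name": {"type": "string"},
    },
    "$defs": {"Closed": {"type": "object", "additionalProperties": False}},
}

rows = report_strictness(toy)
for path, closed in rows:
    print(path, "closed" if closed else "open")
```

Note that a typed map such as `digest` is reported as open: it constrains the values of extra properties but does not forbid them.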
Very nice. Thanks!
yes, AFAIK in all the models if we want to unify it fully with jsonld
it seems we have somewhat of a dichotomy now:
❯ pwd
/home/yoh/proj/dandi/schema/releases/0.6.8
❯ grep -l @context *
context.json
❯
❯ curl --silent -X 'GET' 'https://api.dandiarchive.org/api/dandisets/000003/versions/draft/' -H 'accept: application/json' | jq . | grep @cont
"@context": "https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.0/context.json",
❯ curl --silent -X GET 'https://api.dandiarchive.org/api/dandisets/000003/versions/draft/assets/d426ff9a-bab3-446d-8104-e373ae188bd3/' | jq -r '."@context"'
https://raw.githubusercontent.com/dandi/schema/master/releases/0.4.4/context.json
so would be good to unify. Please check if anything else differs between "pure" json model and "jsonld" we are exporting
ideally we would need to check all current models in the dandisets before jumping to such a change, while avoiding a "minor" (which is our "major") version boost and some metadata migrations. It is unfortunate that the json dumps of models we carry in https://github.com/dandisets/ are not an exact copy of the manifests we dump on S3... so for such a check we would need to "quickly" get them all from S3 |
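One way to quantify this dichotomy could be to group metadata records by the schema release named in their `@context` URL. The sketch below assumes the records are already loaded as dicts. `context_version`, `group_by_context_version`, and the record `id` values are made up for illustration; the two URLs are the ones returned by the API calls above.

```python
import re
from collections import defaultdict


def context_version(context_url):
    """Extract <version> from '.../releases/<version>/context.json', else None."""
    if not isinstance(context_url, str):
        return None
    match = re.search(r"/releases/([^/]+)/context\.json$", context_url)
    return match.group(1) if match else None


def group_by_context_version(records):
    """Map each context version (or None) to the ids of records using it."""
    groups = defaultdict(list)
    for record in records:
        groups[context_version(record.get("@context"))].append(record.get("id"))
    return dict(groups)


BASE = "https://raw.githubusercontent.com/dandi/schema/master/releases"
# Hypothetical records; only the @context URLs come from the thread.
records = [
    {"id": "example-dandiset", "@context": f"{BASE}/0.6.0/context.json"},
    {"id": "example-asset", "@context": f"{BASE}/0.4.4/context.json"},
    {"id": "example-no-context"},
]

groups = group_by_context_version(records)
print(groups)
```

Records with no `@context`, or with one that does not follow the release URL pattern, end up under the `None` key so they can be inspected separately.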
we can go from linkml to jsonld context (the jsonld context is only relevant to instances of data). for a pydantic model, we do want to autogenerate it from linkml, and thus we would rely on the linkml generator. i don't think we want to hand craft pydantic models, perhaps only hand patch them if needed. also we could consider the linkml generator creating an appropriate context file. in the current schema we generate that context file by hand in a function. |
@yarikoptic In this case, maybe we don't need to do anything regarding the |
so, with the goal to make exported jsonschema more stringent let's
BTW -- I started to import from S3, will give you access to all of those to see if anything gets broken by more stringent checks |
@candleindark dump of manifests from S3 could be found on |
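Before tightening the schema, it may be useful to count which undeclared keys actually occur in existing records. The following is a rough sketch under stated assumptions: records are passed in memory (loading them from the manifest dump is left out because its location is not given here), only top-level keys are compared against the schema's `properties`, and `undeclared_keys` and `survey` are hypothetical helper names.

```python
from collections import Counter


def undeclared_keys(record, schema):
    """Return top-level keys of `record` that the schema does not declare."""
    declared = set(schema.get("properties", {}))
    return sorted(key for key in record if key not in declared)


def survey(records, schema):
    """Count how often each undeclared key appears across all records."""
    counts = Counter()
    for record in records:
        counts.update(undeclared_keys(record, schema))
    return counts


# Same shape as the demo schema earlier in the thread.
schema = {
    "type": "object",
    "properties": {"@context": {"type": "string", "format": "uri"}},
    "required": ["@context"],
}

# Hypothetical records standing in for manifests.
records = [
    {"@context": "https://schema.org"},
    {"@context": "https://schema.org", "e": 42},
    {"@context": "https://schema.org", "e": 1, "extra": "x"},
]

counts = survey(records, schema)
for key, n in counts.most_common():
    print(f"{key}: {n} record(s)")
```

Any key with a non-zero count is one that a schema with `additionalProperties: false` would start rejecting, so the counts give a rough estimate of how much migration a stricter schema would need.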
This will work but only in the
from pydantic import BaseModel, Field, AnyHttpUrl, ConfigDict, ValidationError

class M0(BaseModel):
    pass
    # model_config["extra"] defaults to "ignore"

class M1(BaseModel):
    context: AnyHttpUrl = Field(..., alias="@context")
    model_config = ConfigDict(extra="forbid")

class M2(BaseModel):
    model_config = ConfigDict(extra="forbid")

data_instance = {"@context": "https://schema.org"}
# Validate without any problem
m0 = M0.model_validate(data_instance)
# Validate without any problem
m1 = M1.model_validate(data_instance)
try:
    m2 = M2.model_validate(data_instance)
except ValidationError as e:
    print(e)
"""
1 validation error for M2
@context
  Extra inputs are not permitted [type=extra_forbidden, input_value='https://schema.org', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/extra_forbidden
"""
In other words, if you want the proposed change, then we must make sure all data instances have the |
@yarikoptic Never mind about the last post. I think I have a way around the issue. |
Per my investigative dump in slack I think the fact that we do not restrict exported jsonschema to not allow extra attributes (fields) is what is behind the koumoul-dev/vuetify-jsonschema-form#284 (comment) and might lead to some data loss (whenever an arbitrary extra attribute is simply not accompanied with corresponding UI) or just cause inefficiencies or crashes.
I guess we could easily add
"unevaluatedProperties": false
to every record thus making validation more stringent etc.
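As an illustration of what adding the keyword to every record could look like as a post-processing step on an exported schema, here is a minimal sketch. It is not how the project implements it: `close_objects` and the `exported` schema are hypothetical, and the choice to skip objects that already set `additionalProperties` (such as a typed map) or that declare no `properties` is an assumption made to stay conservative.

```python
import copy


def _close(node, keyword):
    """Recursively add `keyword: False` to object schemas that list properties."""
    if isinstance(node, dict):
        is_record = node.get("type") == "object" and "properties" in node
        already_constrained = (
            "additionalProperties" in node or "unevaluatedProperties" in node
        )
        if is_record and not already_constrained:
            node[keyword] = False
        for value in list(node.values()):
            _close(value, keyword)
    elif isinstance(node, list):
        for item in node:
            _close(item, keyword)


def close_objects(schema, keyword="unevaluatedProperties"):
    """Return a modified deep copy; the input schema is left untouched."""
    closed = copy.deepcopy(schema)
    _close(closed, keyword)
    return closed


# Hypothetical exported schema with a typed map and a nested record.
exported = {
    "type": "object",
    "properties": {
        "digest": {"type": "object", "additionalProperties": {"type": "string"}},
        "about": {"type": "object", "properties": {"name": {"type": "string"}}},
    },
}

closed = close_objects(exported)
print(closed["unevaluatedProperties"])                    # root record is closed
print("unevaluatedProperties" in closed["properties"]["digest"])  # map untouched
```

`unevaluatedProperties` is only understood by validators implementing JSON Schema draft 2019-09 or later; passing `keyword="additionalProperties"` is the alternative for older drafts, with the caveat that `additionalProperties` does not see properties declared through `allOf` or `$ref` composition.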