-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Typed JSON API #87
base: master
Are you sure you want to change the base?
Typed JSON API #87
Conversation
srsly/_json_api.py
Outdated
return obj | ||
|
||
|
||
def json_loads_list(data: Union[str, bytes]) -> List[Dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and read_json_list
only check List
not List[Dict[str, Any]]
? I wouldn't expect a function that's called read_json_list
to have the additional dict behavior/type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right this part is a little weird. Again, what I want 99% of the time when calling read_json_list
is to have a list of dicts. But the naming does leave something to be desired. There is some validation in place for the read_json_list
function, just not for the json_loads_list
. I can copy the validation over there though.
Is read_json_dicts
a better name here? similar to the read_jsonl_dicts
.
srsly/_json_api.py
Outdated
return data | ||
|
||
|
||
def read_json_list(path: FilePath, validate_inner: bool = False, skip_invalid: bool = False) -> List[Dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still a bit of a typing problem because the return type depends on the value of validate_inner
.
I think this method with these options and types may be getting too specific for inclusion in the srsly
API.
If it is going to be in srsly
, then I think it would be better as read_json_list() -> List
and read_json_list_of_dicts() -> List[Dict[str, JSONOutput]]
or something along those lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that was weird. I fixed these types and separated this logic into 2 functions.
This parsing as a list idea is kinda extra honestly. I think the main functions I really think should live in srsly are:
read_json_dict
-> We parse a normal JSON file expecting a JSON object (e.g. for configs and stuff) all the timeread_jsonl_dicts
-> We use this to load data all the time and expect each line to be a valid JSON object
srsly/_json_api.py
Outdated
|
||
data = read_json(path) | ||
if not isinstance(data, list): | ||
raise ValueError("Invalid JSON, data could not be parsed to a list of dicts.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think describing any of the input as "invalid JSON" is going to be confusing for users. Something like "invalid value" instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good point, I changed this language so it's reads that it can't be parsed to the expected type
…t say invalid JSON, just invalid type
return obj | ||
|
||
|
||
def json_loads_list(data: Union[str, bytes]) -> List[Dict[str, JSONOutput]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This return type is too specific for the method?
Context
Currently most loader functions use the
JSONOutput
return type. This type is pretty non specific and a bit hard to reason about downstream. I find myself having to cast or validate the resulting type all the time.The current loaders return
JSONOutput
because this file e.g. is valid JSON and would be parsed properly byujson
test_file.json
hello
However, if we get this JSON file as an input in most of our code paths, we would want to raise an error as this is almost certainly invalid for what we want to do next.
This PR adds validation on top of the existing JSON API to ensure you get the expected type from a loader function.
If we like this approach, I can add to the rest of the API, just implementing the most commonly used functions from the JSON API for now.
Summary of Changes
Add
read_json_dict
- read JSON file and validate the resulting object is adict
Add
read_json_list
- read JSON file and validate the resulting object is alist
ofdict
sAdd
read_jsonl_dicts
- read JSONL file and validate each line is a validdict
in the resulting generator