Some web pages contain badly formatted JSON-LD data, e.g., an example
The JSON-LD in this page is:
In the JSON-LD above, the last } is extra, and neither extruct nor json.loads handles it properly.
json.loads in Python 3.5 and later reports detailed error information as JSONDecodeError: Extra data: line 19 column 1 (char 624)
In [7]: try:
   ...:     data = json.loads(json_ld_string)
   ...: except json.JSONDecodeError as err:
   ...:     print(err)
   ...:     print(err.msg)
   ...:     print(err.pos)
   ...:
Extra data: line 19 column 1 (char 624)
Extra data
624
err.msg and err.pos give hints for fixing the JSON-LD data. In this case, we can remove the character at position 624 and parse the string again to get the correct data.
There are many possible format errors; some can be fixed easily, while others might be harder or even impossible to fix.
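The repair described above (drop the character at err.pos, then parse again) could be sketched roughly as follows. This is only an illustration, not part of extruct: the function name loads_dropping_extra_char and the max_fixes limit are made up for this example, and it deliberately handles only the "Extra data" case.

```python
import json


def loads_dropping_extra_char(text, max_fixes=5):
    """Parse JSON, deleting the character at err.pos after each "Extra data" error.

    Hypothetical helper for illustration; any other kind of decode error is re-raised.
    """
    for _ in range(max_fixes + 1):
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            # Only the "Extra data" case is repaired here.
            if not err.msg.startswith("Extra data"):
                raise
            # Remove the single offending character and try again.
            text = text[:err.pos] + text[err.pos + 1:]
    raise ValueError("gave up after too many repair attempts")


# A small stand-in for the page's JSON-LD, with one stray closing brace.
broken = '{"@type": "Product", "name": "x"}\n}'
data = loads_dropping_extra_char(broken)
```

A fix like this is tied to one specific error type, which is why a case-by-case approach inside the library could grow quickly.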
I propose 3 ways to improve the situation:
1. extruct tries various ways to fix the JSON-LD data case by case; this requires Python >= 3.5 to get the detailed error information.
2. extruct allows the user to pass in a function to parse the JSON data, so that users can handle their own error types.
3. extruct can output the extracted JSON-LD string instead of the parsed data, so that users can parse it and handle their own error types.
I personally recommend the latter two.
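To make the last two proposals concrete, here is a minimal sketch of what such an interface might look like. It is not extruct's actual API: extract_json_ld, its parse parameter, and the _JsonLdCollector class are all hypothetical names, and the extraction uses only the standard library's html.parser for the sake of a self-contained example.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # None while outside a JSON-LD script element

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def extract_json_ld(html, parse=json.loads):
    """Hypothetical extractor: parse=None returns raw strings (proposal 3),
    otherwise each string is handed to the caller's parse function (proposal 2)."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    if parse is None:
        return collector.blocks
    return [parse(block) for block in collector.blocks]


page = '<script type="application/ld+json">{"name": "x"}\n}</script>'
raw_strings = extract_json_ld(page, parse=None)
```

With either option, the error-handling policy stays in user code, for example by passing in a function that catches json.JSONDecodeError and repairs the string before parsing again.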
Thanks.