Add min- and max-created filters to lddb_json_shape.py #1312
LGTM!
@@ -109,6 +123,14 @@ def count_value(k, v, shape):

        try:
            data = json.loads(l)

            if '@graph' in data:
                created = datetime.fromisoformat(data['@graph'][0]['created'])
I get a traceback (at different lines on repeated runs). On quick inspection, the data at those lines doesn't seem to be missing a created property, but I haven't had time to investigate properly.
File "/libris/librisxl/librisxl-tools/scripts/lddb_json_shape.py", line 128, in <module>
created = datetime.fromisoformat(data['@graph'][0]['created'])
~~~~~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 'created'
Processing definitions/build/*.lines, I see a few dozen records with no created property, e.g.:
{'@graph': [{'@type': 'SystemRecord', 'mainEntity': {'@id': 'https://libris.kb.se/'}, '@id': 'p76szt07r0kw1bjb', 'inDataset': [{'@id': 'https://libris.kb.se/dataset/syscore'}, {'@id': 'https://libris.kb.se/dataset/sys/apps'}]}, {'@id': 'https://libris.kb.se/', '@type': 'DataCatalog', 'title': 'libris.kb.se', 'article': {'@type': 'Article', 'articleBody': "<p xml:lang='sv'>Data på <b>LIBRIS.KB.SE</b>.</p>"}}]}
{'@graph': [{'@type': 'SystemRecord', 'mainEntity': {'@id': 'https://libris.kb.se/data'}, '@id': 'p76szt07r4pwm3dk', 'inDataset': [{'@id': 'https://libris.kb.se/dataset/syscore'}, {'@id': 'https://libris.kb.se/dataset/sys/apps'}]}, {'@id': 'https://libris.kb.se/data', '@type': 'DataService', 'titleByLang': {'en': 'LIBRIS-XL Linked Data Platform API'}, 'statistics': {'sliceList': [{'dimensionChain': ['rdf:type'], 'itemLimit': 400}]}}]}
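One way to keep records like these from raising KeyError is to look up created defensively and decide explicitly what the filters do when it is absent. The following is only a minimal sketch, not the code in this PR: the helper names parse_created and in_created_range are hypothetical, and skipping records without created whenever a filter is active is an assumption about the desired behaviour.

```python
from datetime import datetime
from typing import Optional


def parse_created(data: dict) -> Optional[datetime]:
    """Return the created timestamp of the first @graph item, or None if absent."""
    graph = data.get('@graph')
    if not isinstance(graph, list) or not graph:
        return None
    record = graph[0]
    if not isinstance(record, dict):
        return None
    created = record.get('created')
    if created is None:
        return None
    return datetime.fromisoformat(created)


def in_created_range(data: dict,
                     min_created: Optional[datetime] = None,
                     max_created: Optional[datetime] = None) -> bool:
    """Decide whether a record passes the min/max created filters."""
    # With no filter active, every record passes, including those without created.
    if min_created is None and max_created is None:
        return True
    created = parse_created(data)
    if created is None:
        # Assumption: a record without created cannot match an active filter.
        return False
    # Note: bounds must be timezone-aware if the data is, or the comparison raises TypeError.
    if min_created is not None and created < min_created:
        return False
    if max_created is not None and created > max_created:
        return False
    return True
```

With this shape, the SystemRecord examples above are counted when no filter is given and silently skipped when one is, instead of aborting the run.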
Hmm, the test sets I used only contain bib, auth or hold data (using the method described here). I tried to fetch the line with sed when I got the "At: 228,661" output followed by "Traceback (most recent call)", but didn't immediately see anything suspicious, unless sed yields the wrong line.
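To rule out an off-by-one between the reported position and the line sed returns, it may help to print the neighbouring lines as well. This is a rough sketch under assumptions: lines_around is a hypothetical helper, the line number is treated as 1-based (as sed does), and whether the "At:" counter is 0- or 1-based has not been checked here.

```python
from itertools import islice


def lines_around(path, n, context=1):
    """Return (line_number, text) pairs for 1-based line n and its neighbours."""
    start = max(1, n - context)
    stop = n + context
    with open(path, encoding='utf-8') as f:
        # islice uses 0-based offsets, so shift the 1-based start down by one.
        window = islice(f, start - 1, stop)
        return [(i, line.rstrip('\n')) for i, line in enumerate(window, start=start)]


# Example use (the path is a placeholder):
# for number, text in lines_around('some-file.lines', 228661):
#     print(number, '"created"' in text)
```

Checking each neighbour for a "created" key shows quickly whether the failing record sits one line before or after the one that was inspected.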
Use like: