-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert SSSOM TSV => JSON => TSV and confirm the MappingSetDataFrame
object remains consistent.
#429
Conversation
…remains consistent.
At the moment this fails. Here's the reason why: Here's the test code: sssom-py/tests/test_parsers.py Lines 250 to 265 in 831a58e
{
'owl': 'http://www.w3.org/2002/07/owl#',
'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
'semapv': 'https://w3id.org/semapv/vocab/',
'skos': 'http://www.w3.org/2004/02/skos/core#',
'sssom': 'https://w3id.org/sssom/',
'FBbt': 'http://purl.obolibrary.org/obo/FBbt_',
'ORCID': 'https://orcid.org/',
'UBERON': 'http://purl.obolibrary.org/obo/UBERON_'
} whereas {
'owl': 'http://www.w3.org/2002/07/owl#',
'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
'semapv': 'https://w3id.org/semapv/vocab/',
'skos': 'http://www.w3.org/2004/02/skos/core#',
'sssom': 'https://w3id.org/sssom/',
'FBbt': 'http://purl.obolibrary.org/obo/FBbt_',
'UBERON': 'http://purl.obolibrary.org/obo/UBERON_'
}
In the self.prefix_map = dict(subconverter.bimap) This line is where the subconverter = self.converter.get_subconverter(prefixes_in_table) @cthoyt , would you have any idea why this would be happening? |
MappingSetDataFrame
object remains consistent.
src/sssom/parsers.py
Outdated
@@ -212,21 +212,28 @@ def parse_sssom_table( | |||
logging.info(f"Externally provided metadata {k}:{v} is added to metadata set.") | |||
sssom_metadata[k] = v | |||
meta = sssom_metadata | |||
|
|||
if "curie_map" in sssom_metadata: | |||
if CURIE_MAP in sssom_metadata: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split the code for updating this into #431
src/sssom/parsers.py
Outdated
# This takes priority over default prefix_map in case of a tie. | ||
jsondoc_prefix_map = jsondoc["@context"] | ||
|
||
# Convert keys in both maps to lower case for comparison |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this wise? I think we should not be doing all of this lowercasing. The prefix maps should be correct as they are. Doing all of these ad-hoc operations makes sssom-py insanely hard to work with (I can tell you this from first hand experience since I have tried to understand the whole history of the package)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please wait on making progress with this until the PRs that replace a lot of the same code with curies
are merged - hopefully this encourages much more consistent and elegant handling of prefix maps and reduces the number of changes to address the issue here
tests/test_parsers.py
Outdated
"""Test converting SSSOM TSV => JSON => SSSOM TSV such that it is reproducible.""" | ||
sample_tsv = f"{test_data_dir}/sample1.sssom.tsv" | ||
json_outfile = f"{test_out_dir}/sample1.json" | ||
msdf1 = parse_sssom_table(sample_tsv) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hrshdhgd other hints: when writing tests, make the things you're testing more explicit. The way this test is written, it doesn't really test anythign specific. Why not explicitly check that ORCID
is in the places you expect to to be? Otherwise, someone reading this test learns nothing about what's actually supposed to be happening here.
tests/test_parsers.py
Outdated
write_json(msdf1, file) | ||
|
||
msdf2 = parse_sssom_json(json_outfile) | ||
self.assertIn("ORCID", msdf2.prefix_map) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some more explicit tests so you can see where things are actually going wrong, which appers to be in the SSSOM JSON parsing, and has nothing to do with cleaning the prefix map like I suggested in a previous comment
This provides an alternative to #429 that makes more explicit the chaining operations done on the metadata and prefix maps This is also a good change to carefully document the way that this is handled, since I might not have captured it accurately
@cthoyt , thanks for the insight! This branch was just to investigate the issue Damien pointed out (hence the draft). As for the lowercasing, I don't like that solution at all. I was just trying to understand the issue better so I could explain it to someone else via code (you and Nico). Also, I assume these files will change once your PRs are merged so I have no intention to continue this branch. May start afresh or merge main into this branch first IF the problem persists. |
Closes #363 (final nail in the coffin) This provides an alternative to #429 that makes more explicit the chaining operations done on the metadata and prefix maps. This is also a good change to carefully document the way that this is handled, since I might not have captured it accurately. As it is, The priority order for combining prefix maps are: 1. Internal prefix map inside the document 2. Prefix map passed through this function inside the ``meta`` 3. Prefix map passed through this function to ``prefix_map`` 4. Default prefix map (handled with ensure_converter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am now 100% against adding any ad hoc code that works on prefixes or prefix maps into sssom-py.
If people add incorrect prefixes that aren't defined, why don't we give them appropriate warnings/errors?
Besides, this code shouldn't be necessary to address the original issue, which was something to do with incorrectly writing SSSOM files that was fixed in another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a failing test here that roundtrips through TSV->JSON->TSV before we do anything else? I think the problem is somewhere deeper, but perhaps the solution is somewhere along the lines of https://github.com/mapping-commons/sssom-py/pull/431/files#diff-aed80c160ecfb8db6a235671a6d4ed0ae74470301705533ff4ec2a7a36dd989fR444
Charlie's test validates the issue: sssom-py/tests/test_parsers.py Lines 256 to 325 in e06a4c6
|
This test does not seem to have a JSON serialise and parse step in it though - what we need is a test that takes Damiens test case, serialises it to JSON, then parses the JSON, then ensures that pre JSON and post JSON msdf objects are the same.. |
Fixes mapping-commons/sssom#321
Given a simple sssom tsv
parse_sssom_table()
msdf1.clean_prefix_map()
to_json()
and write to fileparse_sssom_json()
msdf2.clean_prefix_map()
The prefix maps of both should be the same.