How do we ensure subject/object source is standardized? #218

cmungall · 2022-08-12T17:25:00Z

Currently the value of subject_source and object_source is a string like UBERON.

In the SSSOM 2022: Update and SEMAPV paper:

Require sources to be recorded as entity references. We now require entity references instead of strings to denote source information for the subjects and objects in the mapping (subject_source, object_source). For example, we previously permitted the string “UBERON” to be used to declare that the subject is part of the Uberon ontology (cite), but now require, ideally resolvable, entity references instead, e.g. obo:uberon to denote the Uberon ontology or wikidata:Q465 to denote dbpedia. While this practice all by itself does not guarantee that sources are documented uniformly across mapping sets (both obo:uberon and wikidata:Q7876491 refer to the Uberon ontology), it does ensure that users can at least figure out which was the intended source without having to resort to manual searches on the web.

I couldn't find an issue about this already

@cthoyt says:

I missed discussion about this completely. Maybe we can mention that the bioregistry is an ideal resource for making sure that the namespaces in these are valid references and/or normalizing them

I also missed the discussion.

The challenge with the proposed obo:uberon is that:

- this would require declaring a semi-redundant obo namespace that could confuse some serializers (e.g. obo:UBERON_nnnn)
- most OBO PURLs of this form don't resolve (although this is fixable)
- it leaves open the problem of how to do this for non-OBO resources
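
To make the first point concrete, here is a minimal sketch of how a serializer could end up with two valid CURIEs for the same term once an obo namespace sits next to the usual UBERON one. The prefix map, the helper name `candidate_curies`, and the example local ID are illustrative assumptions, not taken from any SSSOM tooling.

```python
# Illustrative prefix map (an assumption for this sketch). The "obo" entry is
# the semi-redundant namespace that a source like obo:uberon would require.
PREFIX_MAP = {
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
    "obo": "http://purl.obolibrary.org/obo/",
}


def candidate_curies(iri, prefix_map):
    """Return every CURIE the prefix map permits for this IRI (hypothetical helper)."""
    curies = []
    for prefix, base in prefix_map.items():
        if iri.startswith(base):
            # Whatever follows the base becomes the local part of the CURIE.
            curies.append(f"{prefix}:{iri[len(base):]}")
    return sorted(curies)


# The local ID is only an example value.
term_iri = "http://purl.obolibrary.org/obo/UBERON_0000955"
ambiguous = candidate_curies(term_iri, PREFIX_MAP)
print(ambiguous)
```

Both prefixes match, so a serializer that does not prefer the longest matching base could emit obo:UBERON_0000955 instead of UBERON:0000955.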

I think we have to be strict here. There are probably dozens of different ways to CURIE-ify any given resource, so we need a standard.

Wikidata is one possibility, but this would be a big change in terms of readability. It would also require mapping producers to do more work on lookups, rather than relying on a standard syntactic translation.

Bioregistry has many obvious advantages here. Would we use CURIEs of the form bioregistry:uberon?

One thing we would have to do here is ensure that these resolve to the correct URIs for linked data purposes, the same way we do for the OBO class PURLs. This should be doable.
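
A rough sketch of what a "standard syntactic translation" to bioregistry: CURIEs might look like, as opposed to a manual lookup. The function names are hypothetical, and the base URL used for expansion is an assumption about how the bioregistry prefix would be declared in a mapping set's prefix map. The sketch does not check that the prefix is actually registered.

```python
# Assumption: the "bioregistry" prefix expands to this base URL.
BIOREGISTRY_BASE = "https://bioregistry.io/registry/"


def source_string_to_curie(source):
    """Syntactic translation only, e.g. 'UBERON' -> 'bioregistry:uberon' (hypothetical helper)."""
    prefix = source.strip().lower()
    if not prefix or ":" in prefix or any(ch.isspace() for ch in prefix):
        raise ValueError(f"not a bare source string: {source!r}")
    return f"bioregistry:{prefix}"


def expand(curie):
    """Expand a bioregistry: CURIE to a URI using the assumed base above."""
    prefix, sep, local = curie.partition(":")
    if prefix != "bioregistry" or not sep or not local:
        raise ValueError(f"not a bioregistry CURIE: {curie!r}")
    return BIOREGISTRY_BASE + local


curie = source_string_to_curie("UBERON")
uri = expand(curie)
print(curie, uri)
```

Because the translation is just lowercasing plus a fixed prefix, producers would not need per-resource lookups. Whether the resulting URIs resolve correctly for linked data purposes is the open question raised above.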

@sierra-moxon may wish to comment on infores here


cthoyt · 2022-08-12T17:30:53Z

While I'd love to advocate for using the Bioregistry for all of its advantages, it has the slight semantic hiccup that its entries refer to the identifier spaces and not necessarily the resources. That being said, we're all against shadow namespaces now (right?), but for 99.9% of the use cases, stuff like bioregistry:uberon will mean what you expect it to mean.

FAIRSharing and SciCrunch Registry also have semantic spaces describing the artifacts themselves, but neither of them is very open or very FAIR, so I wouldn't be super happy committing to using them for anything in practice. Wikidata is also a great alternative that's fully open, but with the slight disadvantage of having opaque identifiers.

matentzn · 2022-08-12T17:49:54Z

Just FYI: here is the debate we had: #126

Here is the PR that you signed off on: #177 :)

I mean, the mere fact that we use entity references is not under debate here, I hope? Just that we should somehow come to a unified proposal on what to use?

Also, what does this mean for this paper? Is it OK to submit it anyway and say that there are still debates about how to standardise which exact CURIE scheme to use?

matentzn · 2022-08-12T17:50:33Z

(BTW @cmungall, for issues like this I recommend GitHub Discussions, since it lets you reply directly to individual comments.)

cmungall added the "question" label (further information is requested) on Aug 12, 2022
