How do we ensure subject/object source is standardized? #218

Open
cmungall opened this issue Aug 12, 2022 · 3 comments
Labels
question Further information is requested

Comments

@cmungall
Contributor

Currently the value of subject_source and object_source is a string like UBERON.

In the SSSOM 2022: Update and SEMAPV paper:

Require sources to be recorded as entity references. We now require entity references instead of strings to denote source information for the subjects and objects in the mapping (subject_source, object_source). For example, we previously permitted the string “UBERON” to be used to declare that the subject is part of the Uberon ontology (cite), but now require, ideally resolvable, entity references instead, e.g. obo:uberon to denote the Uberon ontology or wikidata:Q465 to denote dbpedia. While this practice all by itself does not guarantee that sources are documented uniformly across mapping sets (both obo:uberon and wikidata:Q7876491 refer to the Uberon ontology), it does ensure that users can at least figure out which was the intended source without having to resort to manual searches on the web.

I couldn't find an existing issue about this.

@cthoyt says:

I missed discussion about this completely. Maybe we can mention that the bioregistry is an ideal resource for making sure that the namespaces in these are valid references and/or normalizing them

I also missed the discussion.

The challenges with the proposed obo:uberon are:

  1. this would require declaring a semi-redundant obo namespace that could confuse some serializers (e.g. obo:UBERON_nnnn)
  2. most OBO PURLs of this form don't resolve (although this is fixable)
  3. it leaves open the problem of how to do this for non-OBO resources
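
To illustrate the first point, here is a minimal Python sketch of how a generic obo prefix can compete with a resource-specific prefix when a URI is contracted to a CURIE. The prefix map, the `contract` helper, and the local identifier `0000001` are hypothetical and exist only for this illustration; real serializers may resolve the overlap differently.

```python
# Hypothetical prefix map: the `obo` namespace is a strict prefix of the
# `UBERON` namespace, so both can match the same class URI.
PREFIX_MAP = {
    "obo": "http://purl.obolibrary.org/obo/",
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
}


def contract(uri: str, prefix_map: dict, longest_first: bool) -> str:
    """Contract a URI to a CURIE using the first matching namespace."""
    ordered = sorted(
        prefix_map.items(),
        key=lambda item: len(item[1]),
        reverse=longest_first,
    )
    for prefix, namespace in ordered:
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no namespace matched; leave the URI unchanged


# Placeholder local identifier, used only to show the two outcomes.
uri = "http://purl.obolibrary.org/obo/UBERON_0000001"
print(contract(uri, PREFIX_MAP, longest_first=True))   # UBERON:0000001
print(contract(uri, PREFIX_MAP, longest_first=False))  # obo:UBERON_0000001
```

Which of the two CURIEs comes out depends entirely on the matching order, which is the kind of ambiguity a semi-redundant obo namespace introduces.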

I think we have to be strict here: there are probably dozens of different ways to CURIE-ify any given resource, so we need a standard.

Wikidata is one possibility, but this would be a big change in terms of readability. It would also require mapping producers to do more work looking up identifiers, rather than applying a standard syntactic translation.

Bioregistry has many obvious advantages here. Would we use CURIEs of the form bioregistry:uberon?
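
As a rough sketch of what a standard syntactic translation could look like, the snippet below turns a legacy string such as UBERON into a bioregistry: CURIE by lowercasing it and checking it against a list of known prefixes. The `KNOWN_PREFIXES` set and the `to_source_curie` function are hypothetical. The assumption that the registry prefix is simply the lowercased legacy string will not hold for every resource, and a real implementation would consult the Bioregistry itself.

```python
# Hypothetical stand-in for the registry's list of prefixes.
KNOWN_PREFIXES = {"uberon", "mondo", "hp"}


def to_source_curie(legacy_source: str) -> str:
    """Translate a legacy source string into a bioregistry: CURIE.

    Assumes the registry prefix is the lowercased legacy string.
    """
    prefix = legacy_source.strip().lower()
    if prefix not in KNOWN_PREFIXES:
        raise ValueError(f"unknown source prefix: {legacy_source!r}")
    return f"bioregistry:{prefix}"


print(to_source_curie("UBERON"))  # bioregistry:uberon
```

Rejecting unknown prefixes rather than passing them through is one way to be strict, at the cost of requiring the prefix list to be kept current.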

One thing we would have to do here is ensure that these resolve to the correct URIs for linked data purposes, the same way we do for the OBO class PURLs. This should be doable.
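
A minimal sketch of that resolution step, expanding a source CURIE to a URI through an explicit prefix map. The URI prefix given for bioregistry is an assumption made for illustration and has not been agreed on in this thread; `SOURCE_PREFIX_MAP` and `expand` are hypothetical names.

```python
# Assumed URI prefix for the bioregistry namespace (illustrative only).
SOURCE_PREFIX_MAP = {"bioregistry": "https://bioregistry.io/registry/"}


def expand(curie: str, prefix_map: dict) -> str:
    """Expand a CURIE to a URI, failing loudly on undeclared prefixes."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise ValueError(f"cannot expand {curie!r}: undeclared prefix")
    return prefix_map[prefix] + local_id


print(expand("bioregistry:uberon", SOURCE_PREFIX_MAP))
# https://bioregistry.io/registry/uberon
```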

@sierra-moxon may wish to comment on infores here

@cthoyt
Member

cthoyt commented Aug 12, 2022

While I'd love to advocate for using the Bioregistry for all of its advantages, it has the slight semantic hiccup that its entries refer to the identifier spaces and not necessarily the resources. That being said, we're all against shadow namespaces now (right?), and for 99.9% of use cases, something like bioregistry:uberon will mean what you expect it to mean.

FAIRsharing and SciCrunch Registry also have semantic spaces describing the artifacts themselves, but neither of them is very open or very FAIR, so I wouldn't be super happy committing to using them for anything in practice. Wikidata is also a great alternative that's fully open, but with the slight disadvantage of having opaque identifiers.

@cmungall added the question label on Aug 12, 2022
@matentzn
Collaborator

Just FYI: here is the debate we had: #126

Here is the PR that you signed off on: #177 :)

I hope the mere fact that we use entity references is not under debate here? Just that we should somehow come to a unified proposal on what to use?

Also, what does this mean for the paper? Is it OK to submit it anyway and say that there are still debates on the merits of standardising which exact CURIE scheme to use?

@matentzn
Collaborator

(BTW @cmungall, for issues like this I recommend GitHub Discussions, since they allow replying directly to individual comments.)


3 participants