Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic build failed - Should we remove NCIT? #105

Closed
edeutsch opened this issue Feb 25, 2022 · 8 comments
Closed

Automatic build failed - Should we remove NCIT? #105

edeutsch opened this issue Feb 25, 2022 · 8 comments

Comments

@edeutsch
Copy link
Contributor

Describe the question or discussion

It looks like the automatic generation of OWL failed with an out of memory error in the latest build attempt after today's merge.

I wonder if we should remove the import of NCIT. This is such a huge beast and I am doubtful that it is gaining us anything.

@mobiusklein
Copy link
Contributor

Will that break mzQC? Is there a restricted subset of NCIT that mzQC depends upon that has its own name?

I can try giving the JVM more memory, but the action worker only has 7 GB of RAM and I was at 4 GB previously.

@bittremieux
Copy link
Contributor

We use very few terms from NCIT: currently Duration and Density, in other mzQC terms that are not merged yet here also Outlier and Cover(age). These are value concepts that are not super essential, so if it causes issues the NCIT import could relatively safely be avoided I think.

I tried to find alternative terms in other ontologies we import (UO, STATO, PATO):

@mwalzer
Copy link
Contributor

mwalzer commented Mar 15, 2022

Are there alternative converters that are less memory hungry?
But yeah, NCIT is a big beast. But as a result of this size, there are a lot of terms for rich semantics in mzQC (don't know about others).
In the meantime, we could try and get appropriate terms into STATO and PATO. As Wout mentioned, sometimes the definitions are not ideal.

@mwalzer
Copy link
Contributor

mwalzer commented Mar 15, 2022

Pronto can nowadays dump OWL (according to this althonos/pronto#149 (comment)). But I'm having difficulties loading any OBO after I updated...

@mobiusklein
Copy link
Contributor

I think we can work around the issue by re-declaring the NCIT terms within our CV unchanged (e.g. using NCIT accessions, but in our file) and that should allow us to use the terms without losing the ability to connect them to other CVs. Right now, the import directive is essentially copying everything from NCIT into the MS namespace, so reproducing just the four terms used will suffice.

Source:
https://stackoverflow.com/questions/47695745/referring-to-a-concept-in-a-not-imported-ontology

@mwalzer
Copy link
Contributor

mwalzer commented Mar 15, 2022

Oh wow, didn't know that. Ontologies are a particularly deep well. 🕳️

@edeutsch
Copy link
Contributor Author

I vote for removing the NCIT import and explicitly redeclaring at the top of psi-ms.obo the 4 NCIT terms that we want to refer to, and see how it goes.

bittremieux added a commit that referenced this issue Mar 16, 2022
As per #105, remove the full NCIT import and just re-declare the 4 NCIT terms that are used in mzQC.

I've used abridged terms from NCIT now that contain the relevant information, without some of the unnecessary `property_value` specifications. I'm not sure though whether the `is_a` specifications might be problematic now, because they refer to NCIT parent terms that are not included in our CV? It shouldn't be our goal to redefine the NCIT tree up to those terms, so if this is indeed problematic, what would be the solution? Have the terms consist of just `id`-`name`-`def`?
@bittremieux
Copy link
Contributor

Great suggestion @mobiusklein. I've started a PR to do this, but I still have a small question: #109

mobiusklein pushed a commit that referenced this issue Mar 24, 2022
* Add NCIT terms for mzQC

As per #105, remove the full NCIT import and just re-declare the 4 NCIT terms that are used in mzQC.

I've used abridged terms from NCIT now that contain the relevant information, without some of the unnecessary `property_value` specifications. I'm not sure though whether the `is_a` specifications might be problematic now, because they refer to NCIT parent terms that are not included in our CV? It shouldn't be our goal to redefine the NCIT tree up to those terms, so if this is indeed problematic, what would be the solution? Have the terms consist of just `id`-`name`-`def`?

* Update OBO version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants