Gene association representation in omim.ttl #156

Open
twhetzel opened this issue Oct 24, 2024 · 7 comments · May be fixed by #158
Labels
bug Something isn't working

Comments

@twhetzel
Contributor

twhetzel commented Oct 24, 2024

I think this is a bug; if not, please explain the design decision. For some OMIM disease entries, e.g. https://omim.org/entry/613659, the omim.ttl file contains only 1 disease-to-gene association ('has material basis in germline mutation in' IL1B), whereas the OMIM entry page lists 11. I do see all of the associations in the various data files that are downloaded when creating omim.ttl with python3 -m omim2obo.

This issue is not limited to OMIM disease/phenotype entries that contain an INCLUDED entry: it also happens with https://omim.org/entry/605074, for which omim.ttl contains only 1 disease-to-gene association ('has material basis in germline mutation in' PRCC).

I believe this is related to the issues reported in the PR for the OMIM g2d pipeline in Mondo, specifically point (3): "Genes should not be added if the OMIM record is associated with multiple genes."

UPDATE:
For https://omim.org/entry/613659, I looked at the ttl file directly (rather than in Protege) and do see 11 entries in the following format, where blank nodes like _:Nbcf6a815046747ee9fe5bc8f3891b1c5 appear to point back to a gene:

_:Nbcf6a815046747ee9fe5bc8f3891b1c5 a owl:Restriction ;
    owl:onProperty RO:0004013 ; 
    owl:someValuesFrom OMIM:613659 .

In 10 of the 11 entries, the restriction uses owl:onProperty RO:0003302 instead. None of them have RO:0004003, which is the property shown in Protege.

UPDATE 2: I do see that the OMIM gene entry with RO:0004013 is IL1B, and that there is some code that flips RO:0004013 to RO:0004003, so those later transformations are now clearer.

LATEST QUESTION --> However, it's not clear why only this one gene has RO:0004013 to begin with, while the other genes listed for 'gastric cancer' have a different RO property.
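The flip mentioned in UPDATE 2 amounts to swapping subject and object and replacing RO:0004013 ('is causal germline mutation in', gene to disease) with RO:0004003 ('has material basis in germline mutation in', disease to gene). Below is a minimal sketch of that idea, not the actual omim2obo code: it assumes associations are held as plain (subject, property, object) tuples, treats the two RO properties as inverses, and the function name flip_causal_associations and the GENE: identifiers are hypothetical placeholders.

```python
from typing import List, Tuple

# (subject, property, object)
Triple = Tuple[str, str, str]

# RO:0004013 'is causal germline mutation in' (gene -> disease)
IS_CAUSAL_GERMLINE_MUTATION_IN = "RO:0004013"
# RO:0004003 'has material basis in germline mutation in' (disease -> gene)
HAS_MATERIAL_BASIS_IN_GERMLINE_MUTATION_IN = "RO:0004003"


def flip_causal_associations(triples: List[Triple]) -> List[Triple]:
    """Hypothetical helper: rewrite gene->disease RO:0004013 triples as
    disease->gene RO:0004003 triples; pass every other triple through."""
    result: List[Triple] = []
    for subj, prop, obj in triples:
        if prop == IS_CAUSAL_GERMLINE_MUTATION_IN:
            # Swap subject/object and switch to the disease->gene property.
            result.append((obj, HAS_MATERIAL_BASIS_IN_GERMLINE_MUTATION_IN, subj))
        else:
            # Non-causal associations (e.g. RO:0003302) are left unchanged.
            result.append((subj, prop, obj))
    return result


if __name__ == "__main__":
    # Placeholder gene IDs; only OMIM:613659 (gastric cancer) comes from this issue.
    example = [
        ("GENE:IL1B", IS_CAUSAL_GERMLINE_MUTATION_IN, "OMIM:613659"),
        ("GENE:OTHER", "RO:0003302", "OMIM:613659"),
    ]
    for triple in flip_causal_associations(example):
        print(triple)
```

Because only RO:0004013 triples are flipped, whichever gene carries RO:0004013 in omim.ttl is the only one that ends up as 'has material basis in germline mutation in', which is why the initial property assignment matters.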


Also, did anyone have a chance to document the earlier design decisions? See #75 (comment)


@twhetzel twhetzel added the bug Something isn't working label Oct 24, 2024
@matentzn
Member

@joeflack4 Let me know if I can help with anything. Since you updated that code recently, it's probably best that you take care of this.

@joeflack4
Contributor

I know that if there’s more than 1 association, we don’t call it causal. But I don’t know why we would not list all associations otherwise. Will look into it.
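The rule described above (a causal property only when a phenotype has exactly one gene association) can be sketched as follows. This is a hypothetical illustration, not the omim2obo implementation: the function assign_properties, its input shape (a mapping from phenotype ID to a list of gene IDs), and the use of RO:0003302 as the non-causal property (the one seen on 10 of the 11 gastric cancer genes) are assumptions.

```python
from typing import Dict, List, Tuple

# Gene -> disease properties, as observed in omim.ttl in this issue.
CAUSAL = "RO:0004013"       # is causal germline mutation in
CONTRIBUTES = "RO:0003302"  # causes or contributes to condition


def assign_properties(gene_map: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: return (gene, property, phenotype) triples.

    A phenotype with exactly one distinct gene gets the causal property;
    a phenotype with several genes gets the non-causal property on all of them.
    """
    triples: List[Tuple[str, str, str]] = []
    for phenotype, genes in gene_map.items():
        # Deduplicate so repeated rows for the same gene count only once.
        distinct_genes = sorted(set(genes))
        prop = CAUSAL if len(distinct_genes) == 1 else CONTRIBUTES
        for gene in distinct_genes:
            triples.append((gene, prop, phenotype))
    return triples


if __name__ == "__main__":
    # 11 placeholder genes for gastric cancer (OMIM:613659).
    gastric_cancer = {"OMIM:613659": [f"GENE:{i}" for i in range(11)]}
    props = {p for _, p, _ in assign_properties(gastric_cancer)}
    print(props)  # {'RO:0003302'}
```

Under this rule, all 11 gastric cancer genes would receive the same non-causal property, so a single RO:0004013 among them does not match what the rule predicts. Deduplicating gene IDs before counting is a design choice of this sketch, so that the count reflects genes rather than input rows.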

@joeflack4
Contributor

joeflack4 commented Oct 24, 2024

@sabrinatoro @matentzn Just want to confirm how this is supposed to work (Trish edit: i.e., for modeling OMIM in omim.ttl, the first file created to represent the OMIM content).

Gastric Cancer (OMIM:613659) has 11 Phenotype-Gene Relationships.

In this case, we should declare the following property on all 11:

But neither of the following properties should be used at all:

@sabrinatoro

@joeflack4 We are talking about Mondo, right? (i.e., NOT the Monarch KG; I need to mention this in case I am confusing myself.)
In the case of Mondo: because Mondo is an ontology and all axioms have to be correct 100% of the time, the only gene annotations that we bring in are the ones where the genes are part of defining the disease.
The only gene-related property from OMIM that we allow in Mondo is "has material basis in germline mutation in".

Therefore, we allow only 1 gene per disease (because we know that in OMIM, the disease is defined based on variation in that gene). If a disease is associated with more than one gene, then the genes are not defining the disease, and therefore we do not bring those gene annotations into Mondo.

We documented this in multiple places; I don't have time to look for the links, sorry.

Note: The 11 Phenotype-Gene Relationships for Gastric Cancer (OMIM:613659) would go into the Monarch KG, but NOT into Mondo.

@twhetzel
Contributor Author

twhetzel commented Oct 24, 2024

@sabrinatoro Joe's question is about how OMIM should initially be modeled as an ontology (i.e., omim.ttl), as the content exists in OMIM itself. What we do with it from there, i.e., processing omim.ttl to bring it into Mondo, involves further steps that are currently out of scope for this question.

In this initial modeling of OMIM in omim.ttl, even entries like https://omim.org/entry/613659 ('gastric cancer') show only 1 gene association in Protege (the other 10 are visible when opening the ttl file in a text editor), while OMIM itself lists 11 associations. Here is a screenshot of 'gastric cancer' in the omim.ttl file. The difference between what Protege shows and what the ttl file contains is not that important; what is unclear is why only 1 of the 11 genes listed in OMIM for the 'gastric cancer' entry has the association RO:0004013, which is later converted to RO:0004003. Is there a flag in the OMIM entry/files used to create this association that marks IL1B as the causal gene out of the other 10 genes listed, or is this representation in omim.ttl incorrect?

My concern is that if the initial modeling of OMIM content in omim.ttl is not correct, the further transformations that get this content into Mondo will also be incorrect, since the starting content is incorrect. This is related to your comments (Sabrina) about issues with the omim pipeline/gene2disease pipeline for Mondo.

[Screenshot 2024-10-24 at 3:23:34 PM: 'gastric cancer' in omim.ttl]

@twhetzel
Contributor Author

FYI: there is now a thread about this in the mondo-ingest channel on Slack too.

@twhetzel twhetzel changed the title Missing gene associations in omim.ttl file Gene association representation in omim.ttl file Oct 25, 2024
@twhetzel
Contributor Author

Joe and I reviewed this further, and I suspect there is an issue in how associations are counted, which leads to the RO property being applied incorrectly in omim.ttl. More to come soon.

@joeflack4 joeflack4 linked a pull request Oct 27, 2024 that will close this issue
@joeflack4 joeflack4 changed the title Gene association representation in omim.ttl file Gene association representation in omim.ttl Oct 30, 2024