Missing Reactome pathway xrefs #27081
A problem a priori is that our pathway boundaries do not reliably align with GO process boundaries, so making exact mappings isn't always possible and I expect it would always require expert human review - not safe to do automatically. (Sorting out glycolysis and Wnt signaling took substantial work, for example.) But making those mappings is valuable so figuring out how to salvage as many as possible is a useful topic for an internal pathways2GO "weeds" discussion.
|
Understood. |
That sounds really useful as a way to flag any discrepancies automatically and fairly reliably, so curators can home in on those. |
FYI, I only looked at the ~5 pathways Rossana has done GO-CAMs for so far.
|
Also, I just noticed that the Reactome pathway pages already assert the 3 relationships I suggested in my first comment (see the 'Represents GO Biological Process' section on the respective Reactome pages). So a Reactome curator has already reviewed and made these 3 connections (at least) - they are just not included in the GO file. |
Haven't checked this case, but this kind of loss is a known limitation of the GAF-based export of Reactome GO annotations, that should be corrected by exporting from Reactome-derived GO-CAMs rather than from the original Reactome annotations directly. |
For completeness (wrt carbohydrate metabolism-related pathways): I see the expected xrefs between Reactome pathway IDs and these two GO terms in the ontology: But I don't see these expected xrefs:
*R-HSA-446205 xref is currently on "GO:0061729 GDP-mannose biosynthetic process from fructose-6-phosphate", though the Reactome website associates R-HSA-446205 with the parent term "GDP-mannose biosynthetic process" (GO:0009298). I could manually add all those Reactome xrefs to the GO terms. |
Please don't, these are not managed by GO, as @deustp01 pointed out. The problem needs to be fixed at the Reactome end (or else we'll lose the 'source of truth' for these mappings). Thanks, Pascale |
Thanks for clarifying that - I suspected as much, but wasn't sure. It's odd that GO has some of these mappings (like for 'Glycolysis' and 'Pentose phosphate pathway') but not for others... |
It looks like manual editing of Reactome pathways to add the missing GO terms (or to come up with a rationale for adding a different term or no term, as Reactome pathway and GO process boundaries are currently defined) is needed. One important limitation here, hard-wired into our data model, is that a Reactome pathway instance can have only zero or one associated BP terms. Of course, if a pathway has multiple child pathways, each child can have a different BP, so this limitation may not be fatal. The timing on this is good - we have a couple of weeks before the data freeze for our next (end-of-March 2024) release, so it should be possible to knock off the straightforward ones by then. (I turned the list into a task list for tracking.) |
Results of previous bottom-up cleanup attempts, e.g., the work to distinguish kinds of glycolysis consistently in GO and Reactome. |
Just to be clear - all of the Reactome-GO relationships shown above mirror those already shown on Reactome pages, so these associations already exist at the Reactome end - seems they are just not being exported to the GO. |
I'll check to confirm this. In that case, we're stuck on limitations of our current GAF export process, which include generating no exportable line when an MF or BP is attributed to a heteromeric complex and no single gene product in the complex is identified as the active / enabling unit. This too should be fixable in the future. |
Checking is done. Found one case where SJM found a more specific BP than we had used and changed Reactome accordingly, one case where we had omitted a BP term and added it, one collision with our only-one-BP-per-pathway constraint, and one case where I'm confused, as noted on the task list above. |
So R-HSA-446205 is one of the 'rare' cases that currently appear as an xref in the GO. The GO obo file currently has this:
But the Reactome page (https://reactome.org/PathwayBrowser/#/R-HSA-446205) says: So there's a slight mismatch between the xref given in the current GO file and at Reactome. Which GO term is (more) correct for this Reactome pathway? |
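A minimal sketch of the kind of automatic discrepancy check discussed in this thread, assuming both sides have already been reduced to plain dictionaries from Reactome pathway ID to GO ID. The function name and the dictionary layout are hypothetical; only the R-HSA-446205 IDs come from the comments above.

```python
def find_xref_mismatches(go_xrefs, reactome_bp):
    """Return pathways whose GO term differs between the two sources.

    Both arguments map a Reactome pathway ID to a single GO ID
    (hypothetical layout, chosen for illustration only).
    """
    mismatches = {}
    for pathway_id, go_id in go_xrefs.items():
        reactome_go_id = reactome_bp.get(pathway_id)
        # Only report pathways present on both sides with different terms.
        if reactome_go_id is not None and reactome_go_id != go_id:
            mismatches[pathway_id] = (go_id, reactome_go_id)
    return mismatches


# The one case discussed above: the GO file and the Reactome page disagree.
go_xrefs = {"R-HSA-446205": "GO:0061729"}
reactome_bp = {"R-HSA-446205": "GO:0009298"}
mismatches = find_xref_mismatches(go_xrefs, reactome_bp)
print(mismatches)
```

A check like this only flags candidates; deciding which term is correct would still need curator review.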
Got it. The narrow description, to start the pathway specifically with fructose-6-phosphate, matches what we have actually annotated. We generally (but definitely not completely consistently) leave these kinds of qualifiers off Reactome pathway names unless they are needed, e.g., to differentiate two closely similar pathways that we want to keep distinct for the human user. This happens a lot in signaling, not so much in metabolism. If there are reasons to conform pathway / process names better, that's a fair issue for discussion. |
The main issue here is that the Reactome-provided mappings are MF only (https://reactome.org/download/current/Reactions2GoTerms_human.txt): these are all MF. What is confusing is that this gets merged with a tiny handful of mappings that GO curates. GO should not curate Reactome mappings; this is already done by Reactome, e.g., https://reactome.org/content/detail/R-HSA-70263 |
We have decided on the ontology call that
|
Joel and I at Reactome will look into the best way to provide this information and will get back to you. |
@jweiser and I have found the Cypher query used to generate the Reactions2GoTerms_human file you are using. We have modified the query so we get the BiologicalProcess associations as well. Here is the query: MATCH (rle:ReactionLikeEvent)-[:catalystActivity]->(:CatalystActivity)-[:activity]->(go:GO_MolecularFunction) I am attaching the file with the results. Let us know if this is what you need. If it is, we will look into incorporating it into our release process. |
Thanks @adamjohnwright. Just for checking purposes, could you add a file here that only contains the BP mappings (i.e., MF mappings removed)? |
Thanks @adamjohnwright ! I did a first-pass analysis of the BP mappings here: https://docs.google.com/spreadsheets/d/1HSeCrshAhXeyVz1djJ8SL5KP9WXD6eNxc1azYRPPfV0/edit#gid=1698582253
Observations:
I don't think Reactome should make mappings between individual reactions and GO-BP terms - the BP mapping should only be made to the 'parent' pathway ID. If all those reaction-level mappings could be reviewed and removed at the Reactome side, I think this would remove many/most of the many:1 mappings and produce a more accurate reactome-GOBP mapping file. Thoughts? |
I agree with @sjm41 and would go further, e.g., for sumoylation
if we look at the one I highlighted: https://reactome.org/content/detail/R-HSA-4615910 We can see this rolls up to a parent https://reactome.org/content/detail/R-HSA-2990846: |
Sorry, but no way! We attach BP terms to individual reactions to enable us to simultaneously associate a reaction with a pathway / process that makes biological sense to our experts, e.g., to make regulation of glycolysis mediated by fructose-1,6-bisphosphate metabolism a part of the process of glycolysis, and also to conform our annotations to the current GO rules on pathway boundaries, which do not necessarily make biological sense. I really do not want to discard large numbers of good annotations that accurately reflect mammalian biology because of GO pathway boundary restrictions. This is essentially an issue of how to represent multi-parentage (a reaction / molecular function can be part_of more than one pathway / process, and indeed in those different contexts may accomplish different things). This is our starting position - there is more to discuss! @ukemi |
This is the issue I was trying to get at the other day (unclearly). I think to do what @sjm41 wants to do, we need to think about two separate pieces of information.
|
@sjm41 hopefully this is the final time. If it is not right please don't hesitate to mention that you have found another exception. Here is the latest query: What I am doing with this query is returning all pathways that do not have a parent pathway with the same go_biological_process term. In this way when I look at the example you gave me all the pathways are related to the same biological process and all of them are descendants of "Signaling by FGFR". When I look in the file I now only see Signaling by FGFR. Hopefully, this is what you are looking for. go-bp-to-reactome-pathways-hierarchy-aware-reactome-v89-try-two.csv |
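The filtering rule described above (keep a pathway only if none of its parent pathways carries the same BP term) can be sketched in Python. This is not the Cypher query used at Reactome, only an illustration under stated assumptions: the function name, the two input dictionaries, and all IDs in the example are hypothetical placeholders, and only direct parents are checked.

```python
def hierarchy_aware_mappings(bp_by_pathway, parents_by_pathway):
    """Drop pathways whose direct parent already has the same BP term.

    bp_by_pathway: pathway ID -> GO BP ID (at most one BP per pathway).
    parents_by_pathway: pathway ID -> list of direct parent pathway IDs.
    Both layouts are assumptions made for this sketch.
    """
    kept = {}
    for pathway, go_id in bp_by_pathway.items():
        parents = parents_by_pathway.get(pathway, [])
        # A parent with the same term makes this mapping redundant.
        if any(bp_by_pathway.get(parent) == go_id for parent in parents):
            continue
        kept[pathway] = go_id
    return kept


# Placeholder IDs: a three-level chain sharing one term, plus an unrelated pathway.
bp_by_pathway = {
    "FGFR": "GO:X",
    "FGFR1": "GO:X",
    "FGFR1-sub": "GO:X",
    "OTHER": "GO:Y",
}
parents_by_pathway = {"FGFR1": ["FGFR"], "FGFR1-sub": ["FGFR1"]}
kept = hierarchy_aware_mappings(bp_by_pathway, parents_by_pathway)
print(kept)
```

In this toy input only the top-level pathway of the chain survives, which mirrors the "Signaling by FGFR" behaviour described in the comment.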
Thanks @adamjohnwright ! I think we've done it! An analysis of the latest file is here: Summary:
From a few spot-checks, it looks like these remaining many:1 mappings are 'correct' within the criteria we set up - that is, the same BP term is annotated to independent Reactome pathways where the parent pathway has a different BP annotation. So, if we integrated the current mapping file into the GO, there would only be ~100 GO-BP terms mapped to multiple Reactome pathway IDs, with only a handful of extreme cases having 7-10 Reactome mappings at most. I suggest you now start including these BP mappings in the reactome_xrefs file you regularly submit to the GO. |
Sound OK to you @cmungall?
Sounds good!
|
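The many:1 check mentioned in the analysis above (GO-BP terms mapped to more than one Reactome pathway) can be sketched as follows. The function name and the input shape, an iterable of (GO ID, pathway ID) pairs, are assumptions for illustration, and the IDs in the example are placeholders rather than rows from the real mapping file.

```python
from collections import defaultdict


def many_to_one_terms(pairs):
    """Return GO terms that are mapped to more than one pathway.

    pairs: iterable of (go_id, pathway_id) tuples (hypothetical layout).
    The result maps each such GO ID to its sorted list of pathway IDs.
    """
    pathways_by_term = defaultdict(set)
    for go_id, pathway_id in pairs:
        pathways_by_term[go_id].add(pathway_id)
    return {
        go_id: sorted(pathways)
        for go_id, pathways in pathways_by_term.items()
        if len(pathways) > 1
    }


# Placeholder rows: one term shared by two pathways, one term used once.
pairs = [("GO:X", "P1"), ("GO:X", "P2"), ("GO:Y", "P3"), ("GO:X", "P1")]
shared = many_to_one_terms(pairs)
print(shared)
```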
Hi @adamjohnwright , @deustp01 Are you now including the new BP mappings in the reactome_xrefs file you regularly submit to the GO? |
@sjm41 I want to clarify. I can generate this mapping file and put it in our download directory for each release. This would make it available for download via URL. Does this meet your requirements? Is it what you have in mind? |
I'm not familiar with how the GO developers currently pull in xref files. So if the additional GO-BP to Reactome (pathway) xrefs could either (1) be included in that same file, or (2) be put in the same place as that other file, then I'd guess that would work for us. I think @cmungall is on vacation this week, but @balhoff may be around and be able to advise us? |
@adamjohnwright @sjm41 yes I think it would be simplest if the additional xrefs were added to Reactions2GoTerms_human.txt, but if it doesn't make sense to combine them, a file in the same directory with the same format would work well also. |
@guanmingwu I am wondering what approach you prefer. My preference is to have it as a separate file called Pathways2GoTerms_human.txt. That's because the terms I am generating are mapped to pathways. I could see us listing this file on the downloads page. If you agree, I think we should bring this up to the broader group on tomorrow's call. |
@adamjohnwright Agree: a separate file, Pathways2GoTerms_human.txt, will make it easier for us to manage at the Reactome side. |
@balhoff and @sjm41 I am sure I will be able to make it for this release and going forward. I will work with others at Reactome to list it on the downloads page. @sjm41 I will work with you closer to the release to make sure the file is what you are expecting before the release goes out. In order to not forget I have added it to the Reactome Release SOP. |
Here is an e-mail message from @pgaudet that is at least closely related to this ticket, so I am adding it here as a related item to check the next time we do a release. (If this deserves a new ticket, I can open one.)
|
Just checking on the status of this - just want to make sure it doesn't fall through the cracks. It looks like we agreed that name overloading would be odd, so we'll have a separate file for the pathways/processes |
@cmungall yes, the plan is to have a file called Pathways2GoTerms_human.txt available for download. We have already added it to our release pipelines. We have just started running the release for v90, and the file will be available to the public with that release in about a month. We are also reorganising the Reactome Downloads page, and one requirement I requested was for these files to be listed there. |
Hi @adamjohnwright and @balhoff |
@sjm41 good question and thanks for reaching out. It was generated. Here is a link: HTTP://download.reactome.org/90/Pathways2GoTerms_human.txt. This link pulls it through CloudFront to our S3 bucket. If you don't want to specify the version (e.g. 90) in the URL then you could use the same format as the other downloads on the download page: https://reactome.org/download/current/Pathways2GoTerms_human.txt. Every time we release you should be able to access it in this manner. |
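The two URL forms given above (versioned and "current") can be captured in a small helper. This is only a sketch: the function name and its `version` parameter are hypothetical, and the code builds the URL without downloading anything.

```python
def pathways2go_url(version=None):
    """Build the download URL for Pathways2GoTerms_human.txt.

    version: a Reactome release number such as 90, or None for the
    "current" release path. Both URL patterns are copied from the
    comment above; this helper itself is an illustrative assumption.
    """
    filename = "Pathways2GoTerms_human.txt"
    if version is None:
        return f"https://reactome.org/download/current/{filename}"
    return f"http://download.reactome.org/{version}/{filename}"


print(pathways2go_url())
print(pathways2go_url(90))
```

Pinning the version is useful for reproducible builds, while the "current" form tracks each new release automatically.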
Great, thanks @adamjohnwright ! |
@pgaudet This may help with the propagation of Reactome BP annotations to a place where they are visible for queries about term usage in connection with obsoletion proposals, as we were discussing yesterday. |
Hi @balhoff Can you say when you might be able to action this? Thanks! |
@adamjohnwright I'm sorry about the delayed feedback here. I'm wondering if you can align the formats of these two files. The previous reactions file looks like this:
The new pathways file looks like this:
The headers are different, and there is an additional column (GO term name) in the new file. Also the GO terms are now missing the |
Hi @balhoff, Thanks for pointing out these improvements. I just worked on this with @jweiser, and we have a pull request created with the necessary change: reactome/data-export#10. @jweiser will pull it in before we run this step in the release we are just starting. The file will be formatted with your recommended changes for Reactome v91. He will also send you a v90 version of the file he will create when testing. |
Thanks @adamjohnwright! |
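Hi @balhoff - Here is the Pathways2GoTerms file for v90 of Reactome: Pathways2GoTerms_human.txt (https://github.com/user-attachments/files/17652333/Pathways2GoTerms_human.txt). Please let us know if you find any issues that need to be addressed! |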
@jweiser @adamjohnwright that file works great, thank you. |
These look great!
|
@sjm41 here is a diff of the rendered xrefs with the new pathways integrated: e966c09#diff-60618cccd4e0920e6894cc1de1ab93b75acd08f6ac5bab190b479ab225816ee3 RDF/XML is not a great format to read, so let me know if there is something you would like to see in a different way. |
Sorry @balhoff, I'd completely missed that you'd added a link to this diff for me to check! But you're right that I can't really work with the RDF/XML output. Can you send me the diff output as a simple text/TSV file listing the GO ID and the newly associated Reactome ID? If those IDs could be accompanied by their term/pathway names, that would be even better! |
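A simple TSV of newly associated IDs, as requested above, could be produced along these lines. This is a sketch under assumptions: the function name, the column headers, and the input shape (sets of (GO ID, Reactome ID) pairs plus an optional ID-to-name dictionary) are all hypothetical, and the example pair is taken from the original issue text purely as illustrative input, not from the real diff.

```python
import csv
import io


def new_xrefs_as_tsv(before, after, names=None):
    """Render xref pairs present in `after` but not in `before` as TSV.

    before, after: sets of (go_id, reactome_id) tuples (assumed layout).
    names: optional dict from any ID to its term or pathway name.
    """
    names = names or {}
    new_pairs = sorted(set(after) - set(before))
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["go_id", "go_name", "reactome_id", "reactome_name"])
    for go_id, reactome_id in new_pairs:
        # Missing names are left blank rather than guessed.
        writer.writerow(
            [go_id, names.get(go_id, ""), reactome_id, names.get(reactome_id, "")]
        )
    return buffer.getvalue()


tsv = new_xrefs_as_tsv(
    before=set(),
    after={("GO:0006094", "R-HSA-70263")},
    names={"GO:0006094": "gluconeogenesis"},
)
print(tsv)
```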
Hi Peter
Looks like these Reactome pathway xrefs are missing:
gluconeogenesis (GO:0006094) = R-HSA-70263
fructose biosynthetic process (GO:0046370) = R-HSA-5652227
galactose catabolic process via UDP-galactose (GO:0033499) = R-HSA-70370
If you agree, how do we add them? Is there a file you provide, or do GO editors add them manually?