Missing Reactome pathway xrefs #27081

Open
sjm41 opened this issue Feb 16, 2024 · 73 comments · May be fixed by #29174

Comments

@sjm41
Contributor

sjm41 commented Feb 16, 2024

Hi Peter

Looks like these Reactome pathway xrefs are missing:
gluconeogenesis (GO:0006094) = R-HSA-70263
fructose biosynthetic process (GO:0046370) = R-HSA-5652227
galactose catabolic process via UDP-galactose (GO:0033499) = R-HSA-70370

If you agree, how do we add them? Is there a file you provide, or do GO editors add them manually?

id: GO:0006094
name: gluconeogenesis
namespace: biological_process
def: "The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol." [MetaCyc:GLUCONEO-PWY]
synonym: "glucose biosynthesis" EXACT []
synonym: "glucose biosynthetic process" EXACT []
xref: MetaCyc:GLUCONEO-PWY
xref: Wikipedia:Gluconeogenesis
intersection_of: GO:0009058 ! biosynthetic process
intersection_of: has_primary_output CHEBI:17234 ! glucose

id: GO:0046370
name: fructose biosynthetic process
namespace: biological_process
def: "The chemical reactions and pathways resulting in the formation of fructose, the ketohexose arabino-2-hexulose." [GOC:ai]
synonym: "fructose anabolism" EXACT []
synonym: "fructose biosynthesis" EXACT []
synonym: "fructose formation" EXACT []
synonym: "fructose synthesis" EXACT []
intersection_of: GO:0009058 ! biosynthetic process
intersection_of: has_primary_output CHEBI:28757 ! fructose

id: GO:0033499
name: galactose catabolic process via UDP-galactose
namespace: biological_process
def: "The chemical reactions and pathways resulting in the breakdown of galactose, via the intermediate UDP-galactose." [GOC:mah, MetaCyc:PWY-3821]
synonym: "galactose breakdown via UDP-galactose" EXACT []
synonym: "galactose catabolism via UDP-galactose" EXACT []
synonym: "galactose degradation via UDP-galactose" EXACT []
synonym: "Leloir Pathway" RELATED [PMID:14741191]
xref: MetaCyc:PWY-3821
intersection_of: GO:0009056 ! catabolic process
intersection_of: has_intermediate CHEBI:58439 ! UDP-D-galactose(2-)
intersection_of: has_primary_input CHEBI:28260 ! galactose
@deustp01

deustp01 commented Feb 16, 2024

How do we add them? Is there a file you provide, or do GO editors add them manually?

A problem a priori is that our pathway boundaries do not reliably align with GO process boundaries, so making exact mappings isn't always possible and I expect it would always require expert human review - not safe to do automatically. (Sorting out glycolysis and Wnt signaling took substantial work, for example.)

But making those mappings is valuable so figuring out how to salvage as many as possible is a useful topic for an internal pathways2GO "weeds" discussion.

  • How do we identify clean pathway : process mappings?
  • When there isn't one, can we use assignment of BP terms at the reaction level to pick out the subset of Reactome pathway parts that map to the most nearly equivalent GO process?

@ukemi @vanaukenk @huaiyumi

@sjm41
Contributor Author

sjm41 commented Feb 17, 2024

Understood.
The three mappings suggested above look fairly 'safe' to me, but yes, we should discuss general policy/criteria.
(My use case here was using the mappings in the GO file to find and equate corresponding pathways in Reactome and MetaCyc)

@deustp01

(My use case here was using the mappings in the GO file to find and equate corresponding pathways in Reactome and MetaCyc)

That sounds really useful as a way to flag any discrepancies automatically and fairly reliably, so curators can home in on those.

@sjm41
Contributor Author

sjm41 commented Feb 17, 2024

FYI, I only looked at the ~5 pathways Rossana has done GO-CAMs for so far.
And canonical glycolysis (GO:0061621) already has the expected Reactome (and MetaCyc) xref:

id: GO:0061621
name: canonical glycolysis
namespace: biological_process
def: "The glycolytic process that begins with the conversion of glucose to glucose-6-phosphate by glucokinase activity. Glycolytic processes are the chemical reactions and pathways resulting in the breakdown of a carbohydrate into pyruvate, with the concomitant production of a small amount of ATP." [GOC:dph, ISBN:0201090910, ISBN:0879010479]
xref: MetaCyc:ANAGLYCOLYSIS-PWY
xref: Reactome:R-HSA-70171 "Glycolysis, Homo sapiens"
xref: Wikipedia:Glycolysis
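
The use case mentioned earlier, using the xrefs in the GO file to equate Reactome and MetaCyc pathways, could be sketched roughly as follows. This is only an illustration, assuming OBO-style stanza text like the excerpt above; the helper name `pathway_xrefs` is hypothetical and not part of any GO or Reactome tooling.

```python
import re
from collections import defaultdict

# Stanza text copied from the GO:0061621 excerpt quoted in this thread.
STANZA = """\
id: GO:0061621
name: canonical glycolysis
xref: MetaCyc:ANAGLYCOLYSIS-PWY
xref: Reactome:R-HSA-70171 "Glycolysis, Homo sapiens"
xref: Wikipedia:Glycolysis
"""


def pathway_xrefs(stanza: str) -> dict[str, list[str]]:
    """Group the xref lines of one OBO stanza by database prefix (hypothetical helper)."""
    grouped = defaultdict(list)
    for line in stanza.splitlines():
        # Matches e.g. 'xref: Reactome:R-HSA-70171 "..."' -> ("Reactome", "R-HSA-70171")
        match = re.match(r"xref:\s+([^:\s]+):(\S+)", line)
        if match:
            prefix, local_id = match.groups()
            grouped[prefix].append(local_id)
    return dict(grouped)


xrefs = pathway_xrefs(STANZA)
# A GO term carrying both kinds of xref suggests the two pathways correspond.
pairs = [(r, m) for r in xrefs.get("Reactome", []) for m in xrefs.get("MetaCyc", [])]
print(pairs)
```

Any pairing found this way would still need curator review, since pathway and process boundaries do not always line up.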

@sjm41
Contributor Author

sjm41 commented Feb 19, 2024

Also, I just noticed that the Reactome pathway pages already assert the 3 relationships I suggested in my first comment (see the 'Represents GO Biological Process' section on the respective Reactome pages). So a Reactome curator has already reviewed and made these 3 connections (at least) - they are just not included in the GO file.

@deustp01

they are just not included in the GO file.

Haven't checked this case, but this kind of loss is a known limitation of the GAF-based export of Reactome GO annotations, which should be corrected by exporting from Reactome-derived GO-CAMs rather than from the original Reactome annotations directly.

@sjm41 sjm41 changed the title Missing Reactome pathway xrefs Missing Reactome pathway xrefs (carbohydrate metabolism) Feb 21, 2024
@sjm41
Contributor Author

sjm41 commented Feb 21, 2024

For completeness (wrt carbohydrate metabolism-related pathways):

I see the expected xrefs between Reactome pathway IDs and these two GO terms in the ontology:
Glycolysis (R-HSA-70171) | canonical glycolysis (GO:0061621)
Pentose phosphate pathway (R-HSA-71336) | pentose-phosphate shunt (GO:0006098)

But I don't see these expected xrefs:

  • Gluconeogenesis (R-HSA-70263) | gluconeogenesis (GO:0006094)
  • Fructose biosynthesis (R-HSA-5652227) | fructose biosynthetic process (GO:0046370)
  • Galactose catabolism (R-HSA-70370) | galactose catabolic process via UDP-galactose (GO:0033499) - replaced broader BP with this more granular one
  • Glycogen synthesis (R-HSA-3322077) | glycogen biosynthetic process (GO:0005978)
  • Glycogen breakdown (glycogenolysis) (R-HSA-70221) | glycogen catabolic process (GO:0005980)
  • Fructose catabolism (R-HSA-70350) | fructose catabolic process to hydroxyacetone phosphate and glyceraldehyde-3-phosphate (GO:0061624)
  • Lactose synthesis (R-HSA-5653890) | lactose biosynthetic process (GO:0005989)
  • Hyaluronan uptake and degradation (R-HSA-2160916) | hyaluronan catabolic process (GO:0030214)
  • Keratan sulfate biosynthesis (R-HSA-2022854) | keratan sulfate biosynthetic process (GO:0018146)
  • Keratan sulfate degradation (R-HSA-2022857) | keratan sulfate catabolic process (GO:0042340)
  • HS-GAG biosynthesis (R-HSA-2022928) | heparan sulfate proteoglycan biosynthetic process (GO:0015012)
  • HS-GAG degradation (R-HSA-2024096) | heparan sulfate proteoglycan catabolic process (GO:0030200)
  • Chondroitin sulfate biosynthesis (R-HSA-2022870) | chondroitin sulfate biosynthetic process (GO:0030206)
  • Dermatan sulfate biosynthesis (R-HSA-2022923) | dermatan sulfate biosynthetic process (GO:0030208)
  • CS/DS degradation (R-HSA-2024101) | chondroitin sulfate catabolic process (GO:0030207) - This instance and the next one show our only-one-BP-per-pathway rule. Given that limitation, we used a parent BP term that covers CS and DS
  • CS/DS degradation (R-HSA-2024101) | dermatan sulfate catabolic process (GO:0030209)
  • Formation of xylulose-5-phosphate (R-HSA-5661270) | glucuronate catabolic process to xylulose 5-phosphate (GO:0019640)
  • O-linked glycosylation of mucins (R-HSA-913709) | O-glycan processing (GO:0016266)
  • GDP-fucose biosynthesis (R-HSA-6787639) | GDP-L-fucose biosynthetic process (GO:0042350) - Reactome pathway had no BP; added the suggested term.
  • Synthesis of GDP-mannose (R-HSA-446205) | GDP-mannose biosynthetic process (GO:0009298)*
  • Synthesis of UDP-N-acetyl-glucosamine (R-HSA-446210) | UDP-N-acetylglucosamine biosynthetic process (GO:0006048)

*R-HSA-446205 xref is currently on "GO:0061729 GDP-mannose biosynthetic process from fructose-6-phosphate", though the Reactome website associates R-HSA-446205 with the parent term "GDP-mannose biosynthetic process" (GO:0009298).
@sjm41 I'm confused. Are you asking for a different BP term for R-HSA-446205?

I could manually add all those Reactome xrefs to the GO terms.
But I think @deustp01 is saying these Reactome xrefs should come in via their GAF-based export (though this isn't working?), or in future via Reactome-derived GO-CAMs.
What do you think @pgaudet ?

@pgaudet
Contributor

pgaudet commented Feb 21, 2024

I could manually add all those Reactome xrefs to the GO terms.

Please don't; these are not managed by GO, as @deustp01 pointed out. The problem needs to be fixed at the Reactome end (or else we'll lose the 'source of truth' for these mappings).
@deustp01 how do we suggest we proceed?

Thanks, Pascale

@sjm41
Contributor Author

sjm41 commented Feb 21, 2024

these are not managed by GO, as @deustp01 pointed out. The problem needs to be fixed at the Reactome end (or else we'll lose the 'source of truth' for these mappings)

Thanks for clarifying that - I suspected as much, but wasn't sure.

It's odd that GO has some of these mappings (like for 'Glycolysis' and 'Pentose phosphate pathway') but not for others...

@deustp01

@deustp01 how do we suggest we proceed?

It looks like manual editing of Reactome pathways to add the missing GO terms (or to come up with a rationale for adding a different term or no term, as Reactome pathway and GO process boundaries are currently defined) is needed. One important limitation here, hard-wired into our data model, is that a Reactome pathway instance can have only zero or one associated BP terms. Of course, if a pathway has multiple child pathways, each child can have a different BP, so this limitation may not be fatal.

The timing on this is good - we have a couple of weeks before the data freeze for our next (end-of-March 2024) release, so it should be possible to knock off the straightforward ones by then. (I turned the list into a task list for tracking.)

@deustp01

It's odd that GO has some of these mappings

Results of previous bottom-up cleanup attempts, e.g., the work to distinguish kinds of glycolysis consistently in GO and Reactome.

@sjm41
Contributor Author

sjm41 commented Feb 21, 2024

Just to be clear - all of the Reactome-GO relationships shown above mirror those already shown on Reactome pages, so these associations already exist at the Reactome end - seems they are just not being exported to the GO.

@deustp01

seems they are just not being exported to the GO

I'll check to confirm this. In that case, we're stuck on limitations of our current GAF export process, which include generating no exportable line when an MF or BP is attributed to a heteromeric complex and no single gene product in the complex is identified as the active / enabling unit. This too should be fixable in the future.

@deustp01

I'll check to confirm this.

Checking is done. Found one case where SJM found a more specific BP than we had used, and changed Reactome accordingly; one case where we had omitted a BP term, and added it; one collision with our only-one-BP-per-pathway constraint; and one case where I'm confused, as noted on the task list above.

@sjm41
Contributor Author

sjm41 commented Feb 21, 2024

*R-HSA-446205 xref is currently on "GO:0061729 GDP-mannose biosynthetic process from fructose-6-phosphate", though the Reactome website associates R-HSA-446205 with the parent term "GDP-mannose biosynthetic process" (GO:0009298).
@sjm41 I'm confused. Are you asking for a different BP term for R-HSA-446205?

So R-HSA-446205 is one of the 'rare' cases that currently appear as an xref in the GO. The GO obo file currently has this:

name: GDP-mannose biosynthetic process from fructose-6-phosphate
namespace: biological_process
def: "The chemical reactions and pathways resulting in the formation of GDP-mannose from fructose-6-phosphate." [GOC:dph, PMID:16339137]
xref: Reactome:R-HSA-446205 "Synthesis of GDP-mannose, Homo sapiens"

But the Reactome page (https://reactome.org/PathwayBrowser/#/R-HSA-446205) says:
Represents GO Biological Process: "GDP-mannose biosynthetic process"

So there's a slight mismatch between the xref given in the current GO file and at Reactome. Which GO term is (more) correct for this Reactome pathway?

@deustp01

So there's a slight mismatch

Got it. The narrow description, to start the pathway specifically with fructose-6-phosphate, matches what we have actually annotated. We generally (but definitely not completely consistently) leave these kinds of qualifiers off Reactome pathway names unless they are needed, e.g., to differentiate two closely similar pathways that we want to keep distinct for the human user. This happens a lot in signaling, not so much in metabolism.

If there are reasons to conform pathway / process names better, that's a fair issue for discussion.

@cmungall
Member

The main issue here is that the reactome-provided mappings are MF only:

https://reactome.org/download/current/Reactions2GoTerms_human.txt

These are all MF

What is confusing is this gets merged with a tiny handful of mappings that GO curates.

GO should not curate Reactome mappings. This is already done by Reactome.

E.g

https://reactome.org/content/detail/R-HSA-70263

image

@pgaudet
Contributor

pgaudet commented May 13, 2024

We have decided on the ontology call that

  1. Mappings should all be coming from Reactome. Chris will contact Adam Wright to get the Reactome > GO BP mappings (right now the file provided by Reactome only contains MFs)
  2. We will delete cross references to Reactome in GO

@adamjohnwright

Joel and I at Reactome will look into the best way to provide this information and will get back to you.

@adamjohnwright

@pgaudet

@jweiser and I have found the Cypher query used to generate the Reaction2Go_human file you are using. We have modified the query so we get the BiologicalProcess associations as well. Here is the query:

MATCH (rle:ReactionLikeEvent)-[:catalystActivity]->(:CatalystActivity)-[:activity]->(go:GO_MolecularFunction)
WHERE rle.stId CONTAINS 'R-HSA-'
WITH rle.stId AS Identifier, rle.displayName AS Name, go.databaseName + ':' + go.accession AS GO_Term
RETURN Identifier, Name, GO_Term
UNION
MATCH (e:Event)-[:goBiologicalProcess]-(go:GO_BiologicalProcess)
WHERE e.stId CONTAINS 'R-HSA-'
WITH e.stId AS Identifier, e.displayName AS Name, go.databaseName + ':' + go.accession AS GO_Term
RETURN Identifier, Name, GO_Term

I am attaching the file with the results. Let us know if this is what you need. If it is we will look into incorporating it into our release process.
Reaction2GO_human_with_BP.csv

@sjm41
Contributor Author

sjm41 commented May 16, 2024

Thanks @adamjohnwright

Just for checking purposes, could you add a file here that only contains the BP mappings (i.e. with the MF mappings removed)?
And, if it's easy for you, it would also be a great help for checking if you could add a fourth column with the GO term name. (Not to worry if that's a hassle.)

@adamjohnwright

Reactome2GO_human_BP_only.csv

@sjm41
Contributor Author

sjm41 commented May 17, 2024

Thanks @adamjohnwright !

I did a first-pass analysis of the BP mappings here: https://docs.google.com/spreadsheets/d/1HSeCrshAhXeyVz1djJ8SL5KP9WXD6eNxc1azYRPPfV0/edit#gid=1698582253

  • First tab is a 4-column table (Reactome ID / Reactome Name / GO ID / GO name)
  • Second tab is simpler 2-column version of the same table (Reactome / GO) - I've added a few comments to this one.
  • Third tab is a Pivot Table analysis of the second tab.

Observations:

  1. 1,855 Reactome IDs are mapped to 936 different GO-BP terms.
  2. 35 Reactome IDs are mapped to 17 obsolete GO terms (this may just be a timing issue...)
  3. There are 655 1:1 mappings between Reactome IDs and GO-BP terms.
  4. The remaining 1,200 Reactome IDs are many:1 mappings to GO-BP terms, ranging from 2:1 up to 36:1.
  5. So I spot-checked 3 cases of many:1 mappings:
  • 36 mappings to GO:0006400 = tRNA modification -> 2 mappings to Reactome pathways, 34 mappings to reactions
  • 17 mappings to GO:0016925 = protein sumoylation -> 3 mappings to Reactome pathways, 14 mappings to reactions
  • 15 mappings to GO:0006635 = fatty acid beta-oxidation -> 10 mappings to Reactome pathways, 5 mappings to reactions
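
The pivot-style count behind these observations can be approximated in a few lines. The sketch below is an assumption about how such a tally might be done, not the actual spreadsheet analysis; the rows are a handful of Reactome/GO pairs mentioned in this thread, and `mapping_ratios` is a hypothetical helper name.

```python
from collections import Counter

# Sample (Reactome ID, GO ID) pairs mentioned in this thread; a real run would
# read every row of the BP-only mapping file instead.
ROWS = [
    ("R-HSA-2990846", "GO:0016925"),
    ("R-HSA-2997723", "GO:0016925"),
    ("R-HSA-3108232", "GO:0016925"),
    ("R-HSA-70263", "GO:0006094"),
    ("R-HSA-70171", "GO:0061621"),
]


def mapping_ratios(rows):
    """Count how many Reactome IDs point at each GO-BP term (hypothetical helper)."""
    return Counter(go_id for _reactome_id, go_id in rows)


counts = mapping_ratios(ROWS)
one_to_one = sorted(go for go, n in counts.items() if n == 1)
many_to_one = {go: n for go, n in counts.items() if n > 1}
print(one_to_one)
print(many_to_one)
```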

I don't think Reactome should make mappings between individual reactions and GO-BP terms - the BP mapping should only be made to the 'parent' pathway ID. If all those reaction-level mappings could be reviewed and removed at the Reactome side, I think this would remove many/most of the many:1 mappings and produce a more accurate reactome-GOBP mapping file.

Thoughts?

@cmungall
Member

cmungall commented May 18, 2024

I agree with @sjm41 and would go further

e.g. for sumoylation

reactomeID reactomeLabel GO_ID
R-HSA-2990846 SUMOylation GO:0016925
R-HSA-2997723 PIAS4 SUMOylates TP53BP1 with SUMO1 GO:0016925
R-HSA-3108232 SUMO E3 ligases SUMOylate target proteins GO:0016925
R-HSA-3215018 Processing and activation of SUMO GO:0016925
R-HSA-3232162 PIAS3 SUMOylates MITF with SUMO1 GO:0016925
R-HSA-3247493 PIAS1 SUMOylates SP3 with SUMO1 GO:0016925
R-HSA-3465545 PIAS1,3,4 SUMOylate MTA1 with SUMO2,3 GO:0016925
R-HSA-4615910 SUMOylation of PCNA with SUMO1 GO:0016925
R-HSA-4641342 SUMOylation of TOP2A with SUMO1 GO:0016925
R-HSA-4641345 SUMOylation of TOP2B with SUMO1 GO:0016925
R-HSA-4641350 PIAS4 SUMOylates TOP2A with SUMO2,3 GO:0016925
R-HSA-4641362 SUMOylation of TOP1 with SUMO1 GO:0016925
R-HSA-4655355 RANBP2 SUMOylates CDCA8 (Borealin) and PIAS3 SUMOylates AURKB (Aurora-B) GO:0016925
R-HSA-4655404 SUMOylation of AURKA with SUMO1 GO:0016925
R-HSA-5228525 RANBP2 SUMOylates TOP2A with SUMO1 GO:0016925
R-HSA-6804468 PIAS1 SUMOylates SP3 with SUMO2 GO:0016925
R-HSA-6804485 PIAS1 SUMOylates L3MBTL2 with SUMO2 GO:0016925

If we look at the one I highlighted:

https://reactome.org/content/detail/R-HSA-4615910

We can see this rolls up to a parent https://reactome.org/content/detail/R-HSA-2990846:

image

@deustp01

I don't think Reactome should make mappings between individual reactions and GO-BP terms - the BP mapping should only be made to the 'parent' pathway ID. If all those reaction-level mappings could be reviewed and removed at the Reactome side, I think this would remove many/most of the many:1 mappings and produce a more accurate reactome-GOBP mapping file.

Sorry, but no way! We attach BP terms to individual reactions to enable us to simultaneously associate a reaction with a pathway / process that makes biological sense to our experts, e.g., to make regulation of glycolysis mediated by fructose-1,6-bisphosphate metabolism a part of the process of glycolysis, and also to conform our annotations to the current GO rules on pathway boundaries, which do not necessarily make biological sense. I really do not want to discard large numbers of good annotations that accurately reflect mammalian biology because of GO pathway boundary restrictions.

This is essentially an issue of how to represent multi-parentage (a reaction / molecular function can be part_of more than one pathway / process, and indeed in those different contexts may accomplish different things).

This is our starting position - there is more to discuss! @ukemi

@ukemi
Contributor

ukemi commented May 20, 2024

This is the issue I was trying to get at the other day (unclearly). I think to do what @sjm41 wants to do, we need to think about two separate pieces of information.

  1. Information that is used to generate annotations. This would traverse the Reactome hierarchy as is done now, for the reasons @deustp01 states above.
  2. Information used to map individual Reactome reactions and pathways to their corresponding individual GO classes. This is what we need to make @sjm41 's dream come true.

@adamjohnwright

@sjm41 hopefully this is the final time. If it is not right please don't hesitate to mention that you have found another exception.

Here is the latest query:
MATCH (n:Pathway {speciesName: "Homo sapiens"})-[:goBiologicalProcess]->(go_term:GO_BiologicalProcess)
OPTIONAL MATCH (n)<-[:hasEvent]-(ancestor:Pathway)-[:goBiologicalProcess]->(ancestor_go_term:GO_BiologicalProcess)
WITH n, go_term, COUNT(ancestor) AS ancestor_count, COLLECT(ancestor_go_term.accession) AS ancestor_accessions
WHERE ancestor_count = 0 OR NOT go_term.accession IN ancestor_accessions
RETURN
n.stId AS pathway_id,
n.displayName AS pathway_name,
go_term.accession AS go_term,
go_term.displayName AS go_term_name
ORDER BY pathway_name

What I am doing with this query is returning all pathways that do not have a parent pathway with the same go_biological_process term. In this way when I look at the example you gave me all the pathways are related to the same biological process and all of them are descendants of "Signaling by FGFR". When I look in the file I now only see Signaling by FGFR. Hopefully, this is what you are looking for.
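
The filtering rule described here can be illustrated outside the graph database. The sketch below uses made-up pathway IDs (`P1` to `P4`) and GO accessions, and a hypothetical function `top_level_mappings`; it assumes, as the query appears to do, that only direct parents (one `hasEvent` hop) are compared.

```python
# Toy data with made-up IDs and GO accessions; only the filtering rule mirrors
# the query: drop a pathway when a direct parent carries the same GO term.
PATHWAY_GO = {"P1": "0000001", "P2": "0000001", "P3": "0000002", "P4": "0000001"}
# child -> direct parents (the hasEvent relationship, read from the child's side)
PARENTS = {"P2": ["P1"], "P3": ["P1"], "P4": ["P3"]}


def top_level_mappings(pathway_go, parents):
    """Keep (pathway, GO term) pairs whose GO term is not shared with a direct parent."""
    kept = []
    for pathway, go_term in sorted(pathway_go.items()):
        parent_terms = {
            pathway_go[p] for p in parents.get(pathway, []) if p in pathway_go
        }
        if go_term not in parent_terms:
            kept.append((pathway, go_term))
    return kept


result = top_level_mappings(PATHWAY_GO, PARENTS)
print(result)
```

In this toy hierarchy `P2` is dropped because its parent `P1` has the same term, while `P4` is kept because its direct parent `P3` has a different one. That is one way a GO term can still end up mapped to several Reactome pathways.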

go-bp-to-reactome-pathways-hierarchy-aware-reactome-v89-try-two.csv

@sjm41
Contributor Author

sjm41 commented Jun 13, 2024

Thanks @adamjohnwright ! I think we've done it!

An analysis of the latest file is here:
https://docs.google.com/spreadsheets/d/1HSeCrshAhXeyVz1djJ8SL5KP9WXD6eNxc1azYRPPfV0/edit?gid=1550810237#gid=1550810237

Summary:

  • 969 Reactome IDs are mapped to 782 different GO-BP terms
  • 678 1:1 mappings between Reactome IDs and GO-BP terms
  • 104 GO-BPs mapped to multiple Reactome IDs
  • Conversely, there are 291 many:1 mappings between Reactome IDs and GO-BP terms, distributed as follows:
Mapping Count
10:1	1 (GO:0019082 = viral protein processing)
9:1	1 (GO:0007268 = chemical synaptic transmission)
7:1	1 (GO:0033271 = myo-inositol phosphate transport)
6:1	2
5:1	5
4:1	7
3:1	26
2:1	61
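
As a quick consistency check, the distribution in the table can be summed to confirm the totals quoted in the summary. This is plain arithmetic on the numbers above; the variable names are arbitrary.

```python
# Distribution copied from the table above:
# {Reactome IDs per GO-BP term: number of GO-BP terms with that ratio}
DISTRIBUTION = {10: 1, 9: 1, 7: 1, 6: 2, 5: 5, 4: 7, 3: 26, 2: 61}
ONE_TO_ONE = 678  # 1:1 mappings, from the summary

multi_mapped_go_terms = sum(DISTRIBUTION.values())
many_to_one_ids = sum(ratio * n for ratio, n in DISTRIBUTION.items())
total_reactome_ids = ONE_TO_ONE + many_to_one_ids
total_go_terms = ONE_TO_ONE + multi_mapped_go_terms

print(multi_mapped_go_terms, many_to_one_ids, total_reactome_ids, total_go_terms)
```

The four values agree with the summary: 104 multiply-mapped GO-BP terms, 291 many:1 mappings, 969 Reactome IDs and 782 GO-BP terms.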

From a few spot-checks, it looks like these remaining many:1 mappings are 'correct' within the criteria we set up - that is, the same BP term is annotated to independent Reactome pathways where the parent pathway has a different BP annotation.

So, if we integrated the current mapping file into the GO, there would only be ~100 GO-BP terms mapped to multiple Reactome pathway IDs, with only a handful of extreme cases having 7-10 Reactome mappings at most.

I suggest you now start including these BP mappings in the reactome_xrefs file you regularly submit to the GO.
Sound OK to you @cmungall ?

@cmungall
Member

cmungall commented Jun 13, 2024 via email

@sjm41
Contributor Author

sjm41 commented Jul 8, 2024

Hi @adamjohnwright , @deustp01

Are you now including the new BP mappings in the reactome_xrefs file you regularly submit to the GO?
Just wondering if we can close this ticket, and/or whether another ticket needs to be opened on a Reactome tracker?
Please advise, thanks!

@adamjohnwright

@sjm41 I want to clarify. I can generate this mapping file and put it in our download directory for each release. This would make it available for download via URL. Does this meet your requirements? Is it what you have in mind?

@sjm41
Contributor Author

sjm41 commented Jul 8, 2024

I'm not familiar with how the GO developers currently pull in xref files.
I do know that GO is already pulling in the file of GO-MF to Reactome (reaction) xrefs, and in a comment above, @cmungall pointed to this file/location:
https://reactome.org/download/current/Reactions2GoTerms_human.txt

So if the additional GO-BP to Reactome (pathway) xrefs could either (1) be included in that same file, or (2) be put in the same place as that other file, then I'd guess that would work for us.

I think @cmungall is on vacation this week, but @balhoff may be around and be able to advise us?

@balhoff
Member

balhoff commented Jul 8, 2024

@adamjohnwright @sjm41 yes I think it would be simplest if the additional xrefs were added to Reactions2GoTerms_human.txt, but if it doesn't make sense to combine them, a file in the same directory with the same format would work well also.

@adamjohnwright

@guanmingwu I am wondering what approach you prefer. My preference is to have it as a separate file called Pathways2GoTerms_human.txt. That's because the terms I am generating are mapped to pathways. I could see us listing this file on the downloads page. If you agree, I think we should bring this up to the broader group on tomorrow's call.

@guanmingwu

@adamjohnwright Agree: a separate file, Pathway2GoTerms_human.txt, will make it easier for us to manage at the Reactome side.

@adamjohnwright

@balhoff and @sjm41 I am sure I will be able to make it for this release and going forward. I will work with others at Reactome to list it on the downloads page. @sjm41 I will work with you closer to the release to make sure the file is what you are expecting before the release goes out. In order to not forget I have added it to the Reactome Release SOP.

@deustp01

deustp01 commented Jul 9, 2024

Here is an e-mail message from @pgaudet that is at least closely related to this ticket, so I am adding it here as a related item to check the next time we do a release. (If this deserves a new ticket, I can open one.)

Hi Peter,

Finally our annotation reports include TAS annotations; we should now be reporting all Reactome annotations. Let me know if that’s not the case.

Thanks, Pascale

@cmungall
Member

Just checking on the status of this - just want to make sure it doesn't fall through the cracks. It looks like we agreed that name overloading would be odd, so we'll have a separate file for the pathways/processes

@adamjohnwright

@cmungall yes, the plan is to have a file called Pathway2GoTerms_human.txt available for download. We have already added it to our release pipelines. We have just started running the v90 release, so the file should be available to the public in about a month. We are also reorganising the Reactome Downloads page, and one requirement I requested was that these files be listed there.

@sjm41
Contributor Author

sjm41 commented Oct 10, 2024

Hi @adamjohnwright and @balhoff
I see that Reactome v90 is out, so I'm wondering if the new Pathway2GoTerms_human.txt file is now included in the Downloads page, and if the contents can now be injected into the ontology as BP xrefs for the next GO release?

@adamjohnwright

@sjm41 good question and thanks for reaching out. It was generated. Here is a link: HTTP://download.reactome.org/90/Pathways2GoTerms_human.txt. This link pulls it through CloudFront to our S3 bucket. If you don't want to specify the version (e.g. 90) in the URL then you could use the same format as the other downloads on the download page: https://reactome.org/download/current/Pathways2GoTerms_human.txt. Every time we release you should be able to access it in this manner.

@sjm41
Contributor Author

sjm41 commented Oct 10, 2024

Great, thanks @adamjohnwright !
@balhoff - let me know if you have questions about implementation. It would be good to check a sample of the integrated xrefs before they go into the wild. Thanks.

@sjm41 sjm41 assigned balhoff and unassigned deustp01 Oct 10, 2024
@deustp01

@pgaudet This may help with the propagation of Reactome BP annotations to a place where they are visible for queries about term usage in connection with obsoletion proposals, as we were discussing yesterday.

@sjm41
Contributor Author

sjm41 commented Nov 6, 2024

Hi @balhoff Can you say when you might be able to action this? Thanks!

@balhoff
Member

balhoff commented Nov 6, 2024

@adamjohnwright I'm sorry about the delayed feedback here. I'm wondering if you can align the formats of these two files. The previous reactions file looks like this:

Identifier	Name	GO_Term
R-HSA-1008248	Adenylate Kinase 3 is a GTP-AMP phosphotransferase	GO:0046899
R-HSA-1013012	Binding of Gbeta/gamma to GIRK/Kir3 channels	GO:0004965

The new pathways file looks like this:

Pathway_Id	Pathway_Name	GO_Term	GO_Term_Name
R-HSA-73843	5-Phosphoribose 1-diphosphate biosynthesis	0006015	5-phosphoribose 1-diphosphate biosynthetic process
R-HSA-1369062	ABC transporters in lipid homeostasis	0006869	lipid transport

The headers are different, and there is an additional column (GO term name) in the new file. Also the GO terms are now missing the GO: prefix. The first two are not hard to work around, but it would simplify things if at least the GO terms looked like GO:0006015 instead of 0006015.
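
For illustration, the three differences could be worked around on the consuming side with something like the sketch below. It is not the GO pipeline's actual code; `to_reactions_format` is a hypothetical helper, and the two input rows are the sample rows quoted above.

```python
import csv
import io

# Two rows in the new pathways-file layout, copied from the sample above.
PATHWAYS_TSV = (
    "Pathway_Id\tPathway_Name\tGO_Term\tGO_Term_Name\n"
    "R-HSA-73843\t5-Phosphoribose 1-diphosphate biosynthesis\t0006015\t"
    "5-phosphoribose 1-diphosphate biosynthetic process\n"
    "R-HSA-1369062\tABC transporters in lipid homeostasis\t0006869\tlipid transport\n"
)


def to_reactions_format(text):
    """Rewrite pathway rows into the 3-column Identifier/Name/GO_Term layout."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [("Identifier", "Name", "GO_Term")]
    for record in reader:
        go_id = record["GO_Term"]
        if not go_id.startswith("GO:"):
            go_id = "GO:" + go_id  # restore the missing prefix
        # The GO_Term_Name column is simply dropped.
        rows.append((record["Pathway_Id"], record["Pathway_Name"], go_id))
    return rows


normalized = to_reactions_format(PATHWAYS_TSV)
for row in normalized:
    print("\t".join(row))
```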

@adamjohnwright

Hi @balhoff, Thanks for pointing out these improvements. I just worked on this with @jweiser, and we have a pull request created with the necessary change: reactome/data-export#10. @jweiser will pull it in before we run this step in the release we are just starting. The file will be formatted with your recommended changes for Reactome v91. He will also send you a v90 version of the file he will create when testing.

@balhoff
Member

balhoff commented Nov 6, 2024

Thanks @adamjohnwright!

@jweiser

jweiser commented Nov 6, 2024

Hi @balhoff - Here is the Pathways2GoTerms file for v90 of Reactome:

Pathways2GoTerms_human.txt

Please let us know if you find any issues that need to be addressed!

@balhoff
Member

balhoff commented Nov 6, 2024

@jweiser @adamjohnwright that file works great, thank you.

@cmungall
Member

cmungall commented Nov 7, 2024 via email

@balhoff balhoff linked a pull request Nov 7, 2024 that will close this issue
@balhoff
Member

balhoff commented Nov 7, 2024

@sjm41 here is a diff of the rendered xrefs with the new pathways integrated: e966c09#diff-60618cccd4e0920e6894cc1de1ab93b75acd08f6ac5bab190b479ab225816ee3

RDF/XML is not a great format to read, so let me know if there is something you would like to see in a different way.

@sjm41
Contributor Author

sjm41 commented Nov 28, 2024

Sorry @balhoff, I'd completely missed that you'd added a link to this diff for me to check!

But you're right that I can't really work with the RDF/XML output. Can you send me the diff output as a simple text/TSV file listing the GO ID and the newly associated Reactome ID? If those IDs could be accompanied by their term/pathway names, that would be even better!
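
A flat listing of this kind might be produced along the following lines. This is a sketch under assumptions: the "before" and "after" xref sets and the function `new_xrefs_tsv` are hypothetical, and only the IDs and names themselves come from this thread.

```python
# Hypothetical before/after xref sets keyed by GO ID. The IDs and names are ones
# mentioned in this thread, but which xrefs are "new" is assumed for illustration.
BEFORE = {"GO:0061621": {"R-HSA-70171"}}
AFTER = {
    "GO:0061621": {"R-HSA-70171"},
    "GO:0006094": {"R-HSA-70263"},
}
GO_NAMES = {"GO:0006094": "gluconeogenesis", "GO:0061621": "canonical glycolysis"}
REACTOME_NAMES = {"R-HSA-70263": "Gluconeogenesis", "R-HSA-70171": "Glycolysis"}


def new_xrefs_tsv(before, after, go_names, reactome_names):
    """List xrefs present in `after` but not in `before`, one TSV row each."""
    lines = ["GO_ID\tGO_Name\tReactome_ID\tReactome_Name"]
    for go_id in sorted(after):
        added = after[go_id] - before.get(go_id, set())
        for reactome_id in sorted(added):
            lines.append("\t".join([
                go_id,
                go_names.get(go_id, ""),
                reactome_id,
                reactome_names.get(reactome_id, ""),
            ]))
    return "\n".join(lines)


report = new_xrefs_tsv(BEFORE, AFTER, GO_NAMES, REACTOME_NAMES)
print(report)
```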


9 participants