Question: How to establish which entities the Source and Target in a GO-CAMS relation point to? (Newbie question) #466
Comments
Hi @zwurgl, for some reason the Turtle you pasted looks a little garbled. Here is that section from the file:

_:b40 <http://purl.org/pav/providedBy> "http://www.wormbase.org" ;
<http://geneontology.org/lego/evidence> <http://model.geneontology.org/568b0f9600000284/5ce58dde00000278> ;
<http://www.w3.org/2002/07/owl#annotatedProperty> <http://purl.obolibrary.org/obo/RO_0002629> ;
<http://www.w3.org/2002/07/owl#annotatedSource> <http://model.geneontology.org/568b0f9600000284/57ec3a7e00000079> ;
<http://www.w3.org/2002/07/owl#annotatedTarget> <http://model.geneontology.org/568b0f9600000284/57ec3a7e00000109> ;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Axiom> ;
<http://www.w3.org/2000/01/rdf-schema#comment> "ES (PMID:15625192): A weak interaction between Flag-tagged TIR-1 and T7-tagged NSY-1 could be detected, but this interaction appeared inefficient compared to the TIR-1/UNC-43 interaction (data not shown)." ;
<http://purl.org/dc/elements/1.1/contributor> "http://orcid.org/0000-0002-3013-9906" ;
<http://purl.org/dc/elements/1.1/date> "2019-05-31" .

For those two nodes, you can find other triples in the data. For example:

<http://model.geneontology.org/568b0f9600000284/57ec3a7e00000079> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/GO_0035591>

and

<http://model.geneontology.org/568b0f9600000284/57ec3a7e00000109> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/GO_0004709>

So it is an instance of signaling adaptor activity (GO_0035591) which directly positively regulates an instance of MAP kinase kinase kinase activity (GO_0004709). You would need to follow further graph connections to reach the gene products enabling these activities. It's all in there, but maybe not that convenient for someone not using OWL or RDF directly. We have been working on a report format which simplifies some of this, but it's not part of the public outputs yet (it needs some refinement).

The text in the rdf-schema#comment comes from the publication which the curator used to construct this part of the graph. It doesn't necessarily describe the exact edge we're looking at.
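In case it helps, here is a minimal Python sketch of the lookup described above: start at the axiom node, read its annotatedSource and annotatedTarget, then look up the rdf:type of each of those nodes. It uses only the triples quoted in this thread, held as plain tuples. The helper names `objects` and `describe_axiom` are made up for illustration; for the real .ttl files you would presumably load the data with an RDF library instead, and real nodes may carry additional rdf:type values.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL = "http://www.w3.org/2002/07/owl#"
OBO = "http://purl.obolibrary.org/obo/"
MODEL = "http://model.geneontology.org/568b0f9600000284/"

# Only the triples quoted in this thread, as (subject, predicate, object).
TRIPLES = [
    ("_:b40", RDF_TYPE, OWL + "Axiom"),
    ("_:b40", OWL + "annotatedProperty", OBO + "RO_0002629"),
    ("_:b40", OWL + "annotatedSource", MODEL + "57ec3a7e00000079"),
    ("_:b40", OWL + "annotatedTarget", MODEL + "57ec3a7e00000109"),
    (MODEL + "57ec3a7e00000079", RDF_TYPE, OBO + "GO_0035591"),
    (MODEL + "57ec3a7e00000109", RDF_TYPE, OBO + "GO_0004709"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe_axiom(triples, axiom):
    """Resolve an owl:Axiom node to its relation and the types of both ends."""
    source = objects(triples, axiom, OWL + "annotatedSource")[0]
    target = objects(triples, axiom, OWL + "annotatedTarget")[0]
    relation = objects(triples, axiom, OWL + "annotatedProperty")[0]
    return {
        "relation": relation,
        "source": source,
        "source_types": objects(triples, source, RDF_TYPE),
        "target": target,
        "target_types": objects(triples, target, RDF_TYPE),
    }


result = describe_axiom(TRIPLES, "_:b40")
for key, value in result.items():
    print(key, "->", value)
```

Note that this only gets you as far as the GO classes of the two activity instances. The gene products are one or more edges further away in the same graph, and those triples are not quoted here, so the sketch does not try to guess them.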
Thanks @balhoff for the explanation. That makes sense and allows me to explore the next steps. When you say "the text ... doesn't necessarily describe the exact edge we're looking at", does that mean that in general it cannot be assumed that the text passage (the sentence etc.) that led to this edge is contained in this dataset at all? That of course limits the usefulness of the dataset for the purpose I had in mind (as useful as it certainly is for other purposes). Thanks a lot again, and I will watch out for your report with the simplified data format :-)
@zwurgl I will ask @vanaukenk to comment about how the text in that comment ("ES (PMID:15625192): A weak interaction between Flag-tagged TIR-1 and T7-tagged NSY-1 could be detected, but this interaction appeared inefficient compared to the TIR-1/UNC-43 interaction (data not shown).") is added to the model. I don't think it is standard content that you will find connected to axioms in GO-CAMs. @vanaukenk, this is a worm model so I thought you might know where the text extract comes from.
@zwurgl I wonder if you could collect all the PMIDs from a particular model and associate text from those papers with the overall OWL representation in the GO-CAM. Would that be too indirect for your purposes?
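As a rough sketch of the "collect all the PMIDs from a particular model" idea: one simple option is to scan the raw Turtle text of a model for `PMID:<digits>` strings. This assumes PMIDs appear in that textual form, as in the comment quoted in this thread; the function name `collect_pmids` is made up, and a proper solution would query the specific properties that hold the references rather than pattern-match the whole file.

```python
import re

# Matches strings such as "PMID:15625192"; the digits are captured.
PMID_PATTERN = re.compile(r"PMID:\s*(\d+)")


def collect_pmids(turtle_text):
    """Return the distinct PMIDs mentioned anywhere in the text, sorted numerically."""
    return sorted(set(PMID_PATTERN.findall(turtle_text)), key=int)


# Fragment taken from the comment literal quoted in this thread.
sample = (
    '<http://www.w3.org/2000/01/rdf-schema#comment> '
    '"ES (PMID:15625192): A weak interaction between Flag-tagged TIR-1 '
    'and T7-tagged NSY-1 could be detected" ;'
)

pmids = collect_pmids(sample)
print(pmids)
```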
@balhoff thanks. I will check. Taking your hint from above into account, what we have in this GO-CAMs dataset is: the source entity, the target entity, the type of relation, and the PMID. So the next step for me is to identify source and target (or any of their known variants or synonyms) in the text of the PMID (hopefully in the abstract) and get their positions (start / end). Then I would also stipulate a NO-RELATION between all other pairs of entities in that PMID abstract, and we have a training corpus ... A bit of fiddling, but feasible.
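The plan above could be sketched roughly like this in Python: locate each entity (via any of its synonyms) in the abstract, record start/end offsets, label the known pair with its relation, and label every other ordered pair of found entities NO-RELATION. The sentence, the entity names and the helper names (`find_mentions`, `build_examples`) are invented for illustration, and plain string matching is only a stand-in for whatever entity recognition is actually used.

```python
import re
from itertools import permutations


def find_mentions(text, synonyms):
    """Return sorted (start, end) offsets of every synonym occurrence in text."""
    spans = set()
    for synonym in synonyms:
        pattern = r"\b" + re.escape(synonym) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((match.start(), match.end()))
    return sorted(spans)


def build_examples(text, entities, positives):
    """Build one labelled example per ordered pair of entities found in text.

    entities:  dict mapping an entity name to its list of synonyms
    positives: dict mapping (source, target) to a relation label
    """
    mentions = {name: find_mentions(text, syns) for name, syns in entities.items()}
    found = [name for name, spans in mentions.items() if spans]
    examples = []
    for source, target in permutations(found, 2):
        examples.append({
            "source": source,
            "source_spans": mentions[source],
            "target": target,
            "target_spans": mentions[target],
            "label": positives.get((source, target), "NO-RELATION"),
        })
    return examples


# Invented example data, not taken from any GO-CAM model or abstract.
text = "GeneA upregulates GeneB but not GeneC."
entities = {"GeneA": ["GeneA"], "GeneB": ["GeneB"], "GeneC": ["GeneC"]}
positives = {("GeneA", "GeneB"): "upregulates"}

examples = build_examples(text, entities, positives)
for example in examples:
    print(example)
```

Using ordered pairs means the reverse direction (GeneB, GeneA) also becomes a NO-RELATION example, which may or may not be what you want for a directed relation. Also keep in mind the caveat from earlier in the thread: the absence of a pair from the model does not prove the paper asserts no relation, so these negatives are noisy.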
I hope I can quickly ask a question on some available GO-CAMS data (https://geneontology.org/docs/download-go-cams/). I am new to GO so maybe this is all pretty evident but anyway:
I'm busy right now generating formal representations of relations between genes, proteins, molecules and diseases found in text documents (publications, etc) in a public research project. So we apply machine learning methods to read text and generate relations ("A upregulates B", "C inhibits D", ...). We have recently done that successfully on corpora from other formats (BEL, ...)
Now, a collaborator pointed out the resources in https://geneontology.org/docs/download-go-cams/ to me and suggested checking whether they can also be used as a training data set. In the ttl part of the page above, one finds, among other things, a few thousand relations like the one below:
It seems we have a "directly positively regulates" relation here (http://purl.obolibrary.org/obo/RO_0002629) between two entities (Source and Target) in the respective sentence (in rdf-schema#comment).
My question is purely technical: How can I determine what these two entities, Source and Target, are (since the URLs do not really point to a page that tells the browser what entities are behind them)? Of course, in RDF a URL does not necessarily need to point to a resource that can be accessed as is; it can also be a DB identifier etc.
Being new to GO, I'd appreciate a hint for this one detail: How do I get from the URIs of Source and Target (for source http://model.geneontology.org/568b0f9600000284/57ec3a7e00000079 and for target http://model.geneontology.org/568b0f9600000284/57ec3a7e00000109) to the specific entities that these URIs refer to? Looking at the sentence above, maybe the Source is "TIR-1" and the Target "NSY-1"? Or the other way round? Or "UNC-43"? There is certainly a formal link between the URLs and these entities in the text. But which link is that?
Ideally after identifying these entities we can use the GO-CAMS ttl data as further training data, meaning for the example above:
Any hint regarding these questions would be highly appreciated.