Replies: 5 comments 27 replies
-
Thanks @cbizon, agree with the above; I think the scope of the question makes sense. Note that the question of what the deductive rules are (e.g. which predicates are inverses of one another) is in the scope of DM; see for example biolink/biolink-model#624. These are largely straightforward entailments using common OWL constructs (symmetry, transitivity, reflexivity, inverses). It would be good to survey KPs and see which ones are implementing which rules. I know @balhoff's CAM provider performs inferences.
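To make those entailments concrete, here is an illustrative sketch (not any KP's actual code) of applying the three OWL-style rule types named above to a tiny triple set. The rule tables are hypothetical stand-ins for what the Biolink Model actually declares:

```python
# Hypothetical rule tables; in practice these would come from Biolink Model
# annotations (symmetric, transitive, inverse_of).
SYMMETRIC = {"biolink:related_to"}
TRANSITIVE = {"biolink:part_of"}
INVERSES = {"biolink:has_part": "biolink:part_of",
            "biolink:part_of": "biolink:has_part"}

def entail(triples):
    """Compute the deductive closure of (subject, predicate, object)
    triples under symmetry, inverse, and transitivity rules."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in closed:
            if p in SYMMETRIC:
                new.add((o, p, s))                      # symmetry
            if p in INVERSES:
                new.add((o, INVERSES[p], s))            # inverse
            if p in TRANSITIVE:
                for s2, p2, o2 in closed:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))             # transitivity
        if not new <= closed:
            closed |= new
            changed = True
    return closed

kg = {("A", "biolink:part_of", "B"), ("B", "biolink:part_of", "C")}
inferred = entail(kg)
# Adds ("A", "biolink:part_of", "C") by transitivity and
# ("B", "biolink:has_part", "A"), ("C", "biolink:has_part", "B") by inversion.
```

A survey of KPs would then amount to asking which of these rule types each one applies at ingest time versus query time.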
-
So given this scope, the question becomes where in the architecture stack this kind of deduction occurs. Options I see:

1. The ARA performs the deduction.
2. The registry/metaKG handles it by expanding queries.
3. The KP performs the deduction when answering queries.

Other variations combine these.
Of the three, I would say we've been leaning toward having the registry handle a lot of this, but our experience in the prototype didn't fill me with confidence in that approach. Furthermore, the registry is not going to be able to help with deduction over specific entities (one of our use cases above). When you think through how a named_thing or related_to query will actually work, doing anything other than pushing this deduction down to the KP is going to generate a huge number of messages for little gain. So my strong preference is that we require KPs to do this. (I expect that to be a controversial statement :) )
-
Chris, may I respectfully ask a meta-question about the broader context of why this issue of DEDUCTION arises now?

(i) We at imPROVING agent believe that, following the old discussion in #2 (mostly promoted by Andrew Su), ARAs should accept the highest-granularity information (the most specific term) from KPs and deal with it internally. This made sense to us because the space of ARAs is likely finite (or very slowly growing), while the space of KPs and their content (notably when dealing with "raw data sources" such as EHRs) is growing ad infinitum and can get messy. For instance, there could soon be several subtypes of Type 2 diabetes, based on molecular profiling, that show up in EHRs under different names. The relative granularity of a term is thus dynamic, and for mapping to Biolink (for backwards compatibility) it would indeed make more sense if this were done by the ARA "centrally" and under human supervision (aka manual curation), also considering how a given ARA performs its relevance analytics.

(ii) There is thus the broader issue of the two types of ARAs that we rarely explicitly acknowledge: (a) ARAs with well-curated internal big KGs that ingest KPs and do their own data modeling on a per-KP basis, i.e. mostly manually (creating some import filter), vs. (b) ARAs that build KGs on the fly using information from KPs, identified e.g. via the registry. The former makes moot a lot of the issues we discuss, which arise solely because we seek to automate something (as is the case for type (b) ARAs).
-
Could we consider a simpler approach that (at least for now) will not require the ARA or KP to perform reasoning?
-
Based on yesterday's architecture call, let's break this down a little into chunks. Imagine a KP that stores relationships as (ChemicalSubstance)-[related_to]->(Gene). That is what is in its /predicates (or /knowledge_map) endpoint, and what gets into the metaKG. Now, an ARA has a query (ChemicalSubstance CHEBI:50730)-[related_to]->(NamedThing). My bias is that the ARA should be able to send that as a single TRAPI query (which I can write out here if it makes things clearer) to that KP and get back any related_to edges between CHEBI:50730 and Genes. Again, let's ignore any edges that are not exact predicate matches for the moment. So KP reps: Is this something you already do? Is it something you would be able to do, or do you think this is the wrong direction to go? Are there tools that (if they existed) would make this easier and that you would be willing to incorporate?
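For concreteness, the single TRAPI query described above could look roughly like the following. This is a sketch, not a normative example: exact field names ("ids" vs "id", "predicates" vs "predicate") vary across TRAPI versions.

```python
import json

# Hedged sketch of the TRAPI query graph described above: find anything
# related_to the ChemicalSubstance CHEBI:50730.
query = {
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {"ids": ["CHEBI:50730"],
                       "categories": ["biolink:ChemicalSubstance"]},
                "n1": {"categories": ["biolink:NamedThing"]},
            },
            "edges": {
                "e0": {"subject": "n0", "object": "n1",
                       "predicates": ["biolink:related_to"]},
            },
        }
    }
}
print(json.dumps(query, indent=2))
```

The KP answering this would bind n1 to its Gene nodes, since Gene is a subclass of NamedThing.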
-
Users will formulate questions, and KPs will contain a representation of knowledge. We don't want a query to fail (miss results) just because the KP's representation is logically equivalent to the user's query but expressed differently.
For example, if a user asks, "What chemicals are related to diabetes?", and a KP knows that Pioglitazone treats diabetes, then Translator should recognize that this is an instance of being related to (treats is a subproperty of related_to) and return this knowledge; returning only edges directly annotated with 'related to' is probably not the user's intention.
There are two kinds of these mismatches that we might worry about: mismatches that can be resolved logically using deductive reasoning, and mismatches that cannot. For example, consider a query like (biolink:ChemicalSubstance)-[biolink:affects_activity_of]->(biolink:Gene). If I have a result that a chemical [increases_activity_of] a gene, that is logically guaranteed to satisfy the query, because increases_activity_of is a subproperty of affects_activity_of. Note that the reverse is not true. If the query is (biolink:ChemicalSubstance)-[biolink:increases_activity_of]->(biolink:Gene), and all I know is that a chemical affects the activity of a gene, it may or may not be a correct edge to return: not every "affects" is an increase, it may also be a decrease.
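That asymmetry can be made concrete: expanding a query predicate to its subproperties (descendants) is sound, while expanding to superproperties is not. Here is a minimal sketch over a hand-written, hypothetical slice of the predicate hierarchy; a real implementation would read the hierarchy from the Biolink Model rather than hard-coding it.

```python
# Hypothetical slice of the Biolink predicate hierarchy: child -> parent.
PARENT = {
    "biolink:increases_activity_of": "biolink:affects_activity_of",
    "biolink:decreases_activity_of": "biolink:affects_activity_of",
    "biolink:affects_activity_of": "biolink:related_to",
    "biolink:treats": "biolink:related_to",
}

def descendants(predicate):
    """Predicates whose edges logically satisfy a query for `predicate`:
    the predicate itself plus everything below it in the hierarchy."""
    result = {predicate}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result

# Sound direction: a query for affects_activity_of may be answered
# with increases_activity_of edges.
assert "biolink:increases_activity_of" in descendants("biolink:affects_activity_of")
# Unsound direction: affects_activity_of is NOT in the expansion of
# increases_activity_of, since an "affects" edge might be a decrease.
assert "biolink:affects_activity_of" not in descendants("biolink:increases_activity_of")
```

Wherever this expansion lives (ARA, registry, or KP), only the descendant direction can be applied automatically without risking wrong answers.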
So what is the set of inferences that we are talking about?
The set of inferences that we might want to worry about but that cannot be logically deduced consists of these same moves in the other direction: generalizing rather than specializing. The more general result may or may not match the more specific query. For example, if I ask about type 2 diabetes and a KP has information at the superclass level (diabetes), then we're not concerned with that at the moment.
The question that we have been discussing, and which has come into even clearer focus this week, is: where in Translator does this inference occur?
Does the scope of this question make sense?