-
Notifications
You must be signed in to change notification settings - Fork 2
Unsecret Agent
Based on the miniKanren logic programming language for reasoning over Knowledge Graphs. miniKanren is a family of Domain Specific Languages for logic programming. The name kanren comes from a Japanese word (関連) meaning "relation". The core miniKanren language is very simple, with only three logical operators and one interface operator. The core language, using Scheme as the host language, is described in this short, interactive tutorial.
The mediKanren Reasoner goes beyond the standard miniKanren language with Datalog-like declarative database features, implemented in Greg Rosenblatt's dbKanren system.
The entire implementation is written in Racket, an offshoot of Scheme, which is a dialect of LISP.
Will Byrd (https://github.com/webyrd) ([email protected])
Based on the miniKanren logic programming language for reasoning over Knowledge Graphs.
Ingests KGX-formatted static knowledge graphs, originating from Knowledge Providers (KPs), into a locally-managed mediKanren knowledge graph repository.
- KG2
- [ROBOKOP]
- TextMiningProvider
- Node Normalization
The terminology:
-
edge: represents relationship between two entities (like chemical and gene) within a knowledge graph. An edge also includes information about the source, supporting publications, and so on about the claim of the relationship.
-
MVP: represents the template of a query. The MVP queries are "what" questions, including "what drugs may treat a disease?" (mvp1), "what chemicals may up-regulate/down-regulate a gene?" (mvp2-gene), and "what genes may be up-regulated/down-regulated by a chemical?" (mvp2-chem).
The MVP Logic: All MVP queries include two query modes - look-up (one-hop) and inferred (two-hop)
- What drugs may treat a disease? (mvp1) [a disease is required to input]
- The look-up mode: unknown chemical X treats the known/given disease D. mediKanren does a direct search among the knowledge graphs (KGs) and returns all the edges that contain information about X treats disease D.
- The inferred mode: unknown chemical entity X affects/regulates a gene or protein Y which is associated with/contribute to a known disease D. mediKanren finds the genes or proteins (Y) that have association or causal relationship with D. Then mediKanren looks for chemicals (X) that affect or regulate Y.
- What chemicals may up-regulate/down-regulate a gene? (mvp2-gene) [a gene and direction (up-regulation/down-regulation) are required to input]
- The look-up mode: unknown chemical X regulates (either up-regulates or down-regulates) the known Gene and Protein G. mediKanren does a search among the KGs and returns edges that indicates X up- or down- regulates G.
- The inferred mode: unknown chemical X affects/interacts with genes or proteins Y which affects known gene or protein G. mediKanren looks for chemicals that regulate or interact with Y which up- or down- regulates G, and the corresponding edges are returned.
- What genes may be up-regulated/down-regulated by a chemical? (mvp2-chem) [a chemical and direction (up-regulation/down-regulation) are required to input]
- The look-up mode: the given chemical C regulates unknown gene or protein X. mediKanren looks for edges that indicate X being up- or down- regulated by C.
- The inferred mode: the given chemical C affects/interacts with gene or protein Y which regulates unknown gene or protein X. mediKanren looks for X which are up- or down- regulated by Y that are regulated by or interact with C, and the corresponding edges are returned.
The Ranking Algorithm:
-
Summary of the Algorithm The algorithm is designed to rank paths by combining multiple factors that represent different dimensions of relevance:
- relatedness (Class Hierarchy)
- Evidence Strength (Evidence Score)
- Path Length (Hop Count)
- Type of Relationship (Predicate Type) aggregating and normalizing these scores to identify the most supported answers in the knowledge graphs.
-
Scoring Algorithm of A Single Path ** Weights for Each Factor The total score for a path is computed as a weighted sum of the following four factors:
- Class Hierarchy Value: Weight = 0.3
- Edge Score: Weight = 0.3
- Hop Count Value: Weight = 0.2
- Predicate Type Value: Weight = 0.2
** Factors and Their Computation
- Class Hierarchy Value Description: This factor assesses the relatedness of the path and the query. Calculation: Class Hierarchy Value = 10 × (1 / Hierarchy Level) Examples:
- Level 1 (Self): 10 × 1/1 = 10
- Level 2 (Child): 10 × 1/2 = 5
- Level 3 (Grandchild): = 10 × 1/3 = 0.33
- Evidence Score Description: This factor measures the strength of the evidence supporting the edge, based on the number of publications. Calculation:
- 1-Hop path: Edge Score = Number of Publications Supporting the Edge
- If the edge is from infores:semmeddb Edge Score = Edge Score × 0.1
- If the edge is from infores:text-mining-provider-targeted Edge Score = Edge Score × 10
- 2-Hop path: square-root(Number of Publications of Edge1 × Number of Publications of Edge2)
- Hop Count Value This factor accounts for the number of hops (edges) between nodes in a path. Calculation:
- MVP1
- 1-Hop Lookup: Hop Count Value = 3
- 1-Hop Inferred: Hop Count Value = 2
- 2-Hop Inferred: Hop Count Value = 1
- MVP2
- 1-Hop Lookup: Hop Count Value = 50
- 1-Hop Inferred: Hop Count Value = 40
- 2-Hop Inferred: Hop Count Value = 1
- Predicate Type Value This factor distinguishes between different types of relationships represented by the predicates. Predicate Categories:
- Causation Predicates: Predicates like "biolink:contributes_to", "biolink:affects", "biolink:target_for".
- Value = 3
- Association Predicates: Predicates like "biolink:gene_associated_with_condition", "biolink:interacts_with".
- Value = 2 Calculation:
- 1-Hop Lookup: Predicate Type Value = Predicate Value = Causation Predicates Value = 3
- 1-Hop Inferred: Predicate Type Value = Predicate Value
- MVP1 2-Hop Inferred: Predicate Type Value = Predicate Value of Edge2
- MVP2 2-Hop Inferred: Predicate Type Value = Predicate Value of Edge1
** Calculate the Total Score: Apply the weights to each factor and sum them up: Total Score=(0.3 × Class Hierarchy Value)+(0.3 × Evidence Score)+(0.2 × Hop Count Value)+(0.2 × Predicate Type Value)
- Scoring Algorithm of Answers Input: a list of one-hop paths (lp1_0) and a list of two-hop paths (lp2_0) Output: a list of scored answers
[a short explanation of the naming of lp1_0: l represents list; p represents path; 1 means the paths are one-hop edges; and 0 represents step 0.]
- Apply path-score algorithm (see
scoring algorithm of a single path
) to each path in lp1_0. Return the scored list lp1_1. - Apply path-score algorithm (see
scoring algorithm of a single path
) to each path in lp2_0. Return the scored list lp2_2. - Sort lp1_1 by score, producing the sorted list lp1_3. Return the first 125 edges in lp1_3 as lp1_3_1 or return the entire lp1_3 as lp1_3_1 if it contains less than 125 paths.
- Sort lp2_2 by score, producing the sorted list lp2_4. Return the first 125 edges in lp2_4 as lp2_4_1 or return the entire lp2_4 as lp2_4_1 if it contains less than 125 paths.
- Merge le1_3_1 and lp2_4_1, and return it as lp_5.
- lp_5 is processed to score the unique answers depending on the MVP it is dealing with. The scores of the edges are added up when they have the same answer, and the sum of these scores will be the score of an answer. A new list with scored unique answers la_6 is returned, and the contributing edges are stored as supporting edges.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
In the cases of mvp1 or mvp2-gene with the input D, for example, there are edges
Chemical A ...-->... D has score 5
Chemical A ...-->... D has score 2
Chemical A ...-->... D has score 3
Chemical B ...-->... D has score 1
Chemical B ...-->... D has score 12
The unique answer Chemical A would have a score of 10 = 5+2+3 and the unique answer Chemical B would have a score of 13 = 1+12.
In the case of mvp2-chem with the input C, for example, there are edges
C ...-->... Gene A has score 5
C ...-->... Gene A has score 2
C ...-->... Gene A has score 3
C ...-->... Gene B has score 1
C ...-->... Gene B has score 12
The unique answer Gene A would have a score of 10 = 5+2+3 and the unique answer Gene B would have a score of 13 = 1+12.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
- Sort la_6 by score, producing the list la_7. Assign a normalized rank from 0 to 1 to la_7, producing la_7_1 and return la_7_1.