
Different scores? #123

Open
cbizon opened this issue Sep 15, 2023 · 3 comments

@cbizon
Contributor

cbizon commented Sep 15, 2023

Two queries are run in ROBOKOP. One is (Ozone)-(gene)-(asthma) and the other is (asthma)-(gene)-(Ozone). The same answers are returned, but the scores differ slightly between the two. Attached are the two messages. The first result in each (NCBIGene:7412) shows the different scores.

ROBOKOP_message_asthma-gene-ozone_trapi1.4dev.json.txt
ROBOKOP_message_ozone-gene-asthma_trapi1.4dev.json.txt

As far as I can tell, the two results are the same in terms of the number of edges bound and the parameters of the Omnicorp support edges.

This suggests to me a bug somewhere in the ranker, but the differences are small enough that perhaps it is just something numerical?

I also notice that every weight I saw has a value of 1. Is this accurate, or are these weights no longer used in ranking?

@kennethmorton
Contributor

Interesting case!

Looking at the first result in both sets, they are basically the same, but not exactly. I wrote some code to take a quick-and-dirty look at the content of the edges between the different curies in the result. I confirmed that if you remove directionality and only consider a symmetric weight matrix, the same edges are all present. The disagreements are between the subjects and objects on otherwise directionless edges. I believe this is due to Omnicorp, which must make some arbitrary choice of subject and object.
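The kind of quick-and-dirty check described above could look roughly like this (a sketch, not the actual comparison code; the edge dicts are a simplified stand-in for TRAPI messages, and the curies are illustrative):

```python
def undirected_edge_keys(edges):
    """Collapse each edge to a direction-free key: {subject, object} plus predicate."""
    return {(frozenset((e["subject"], e["object"])), e["predicate"]) for e in edges}

# The same support edge, recorded with subject/object flipped between the two queries.
edges_a = [{"subject": "MONDO:0004979", "object": "NCBIGene:7412", "predicate": "related_to"}]
edges_b = [{"subject": "NCBIGene:7412", "object": "MONDO:0004979", "predicate": "related_to"}]

# Once directionality is removed, the two edge sets match exactly.
assert undirected_edge_keys(edges_a) == undirected_edge_keys(edges_b)
```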

This is fine, except in how it impacts the ranker and weight calculation. Roughly, for each subject/object pair, each source can contribute only a single weight for each property type. If there are multiple edges from the same source with the same property (e.g. CTD publications), the maximum property value is taken. Once the edges are collapsed for each unique subject/object/source/property, there can be subtle differences if the subjects and objects flip around.
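A small sketch of why the collapse is direction-sensitive (hypothetical edge records, not the ranker's actual data structures): with a directional (subject, object, source, property) key, flipped edges land in separate buckets, so the max is never taken across them.

```python
from collections import defaultdict

def collapse(edges):
    """Keep the max value per directional (subject, object, source, property) key."""
    best = defaultdict(float)
    for e in edges:
        key = (e["subject"], e["object"], e["source"], e["property"])
        best[key] = max(best[key], e["value"])
    return dict(best)

# Two edges between the same node pair from the same source, subject/object flipped.
edges = [
    {"subject": "A", "object": "B", "source": "omnicorp", "property": "publications", "value": 3.0},
    {"subject": "B", "object": "A", "source": "omnicorp", "property": "publications", "value": 5.0},
]

# Directional keys keep the flipped edges separate: two buckets, two weights.
assert len(collapse(edges)) == 2

def collapse_symmetric(edges):
    """Same collapse, but with a direction-free {subject, object} key."""
    best = defaultdict(float)
    for e in edges:
        key = (frozenset((e["subject"], e["object"])), e["source"], e["property"])
        best[key] = max(best[key], e["value"])
    return dict(best)

# A direction-free key merges them into one bucket and takes the max.
assert collapse_symmetric(edges) == {(frozenset(("A", "B")), "omnicorp", "publications"): 5.0}
```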

Once we have a weight matrix, we make it symmetric when we calculate the graph Laplacian. If we instead make the matrix symmetric while checking for subject/object/source/property collisions, it should clear up the discrepancy.
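For intuition, here is a toy illustration (assumed shapes and weights, not the ranker's actual matrices) of the symmetrization at the Laplacian step: once the same per-pair weights reach the matrix, symmetrizing erases the direction choice, so identical Laplacians fall out of both query orderings. The discrepancy can therefore only arise upstream, in the collapse.

```python
import numpy as np

# The same two weighted edges, with the 0.7 edge recorded A->B in one query
# and B->A in the other (nodes ordered A, B, C).
W_query1 = np.array([[0.0, 0.7, 0.0],
                     [0.0, 0.0, 0.4],
                     [0.0, 0.0, 0.0]])
W_query2 = np.array([[0.0, 0.0, 0.0],
                     [0.7, 0.0, 0.4],
                     [0.0, 0.0, 0.0]])

def laplacian(W):
    """Symmetrize (here via W + W.T), then form the graph Laplacian L = D - S."""
    S = W + W.T
    return np.diag(S.sum(axis=1)) - S

# After symmetrization, both orderings yield the same Laplacian.
assert np.allclose(laplacian(W_query1), laplacian(W_query2))
```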

I believe this is fixed in #124 but I'd like a few more eyes on it. @uhbrar @maximusunc

@maximusunc
Contributor

Please disregard if this is too much of an edge case, but I'd like to toss in a small wrench. In ICEES-KG, we have many edges with the same subject/object/source/property that come from different datasets and different years. It sounds like the current ranker would not handle this case.

@kennethmorton
Contributor

I think that's an interesting point. We should consider what other aspects of TRAPI we should be using to identify unique edges.
