Different scores? #123
Comments
Interesting case! Looking at the first result in both sets, they are basically the same, but not exactly. I wrote some quick and dirty code to look at the content of the edges between the different CURIEs in the result. I confirmed that if you remove directionality and only consider a symmetric weight matrix, the same edges are all present. The disagreements are between the subjects and objects on otherwise directionless edges. I believe this is due to Omnicorp, which has to make some arbitrary selection of subject and object. This is fine, except for how it impacts the ranker and the weight calculation. Roughly, for each subject/object pair, each source can only contribute a single weight for each property type. If there are multiple edges from the same source that have the same property type (e.g., CTD publications), the maximum property value is taken. Once the edges are collapsed for each unique subject/object/source/property, subtle differences can appear if the subjects and objects are flipped. Once we have a weight matrix, we make it symmetric when we calculate the graph Laplacian. If we instead make the matrix symmetric while checking for subject/object/source/property collisions, it should clear up the discrepancy. I believe this is fixed in #124, but I'd like a few more eyes on it. @uhbrar @maximusunc
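To make the collision issue concrete, here is a minimal sketch of the effect described above. It is not the ranker's actual code: the tuple layout, the function names, and the "sum both directions" step standing in for symmetrizing the weight matrix are all assumptions for illustration.

```python
# Hypothetical edge records: (subject, object, source, property, value).
# Nothing here is taken from the ranker; it only mimics the described behaviour.

def collapse_directed(edges):
    """Keep the max value per (subject, object, source, property)."""
    best = {}
    for subj, obj, source, prop, value in edges:
        key = (subj, obj, source, prop)
        if key not in best or value > best[key]:
            best[key] = value
    return best


def collapse_symmetric(edges):
    """Keep the max value per ({subject, object}, source, property)."""
    best = {}
    for subj, obj, source, prop, value in edges:
        key = (frozenset((subj, obj)), source, prop)
        if key not in best or value > best[key]:
            best[key] = value
    return best


def pair_total_directed(collapsed, a, b):
    """Symmetrize afterwards: add up both directions for the pair (a, b)."""
    return sum(v for (s, o, _, _), v in collapsed.items() if {s, o} == {a, b})


def pair_total_symmetric(collapsed, a, b):
    """Direction was already ignored during the collapse."""
    return sum(v for (pair, _, _), v in collapsed.items() if pair == {a, b})


# Same two support edges; only the (arbitrary) direction of the second differs.
run_a = [("X", "Y", "omnicorp", "publications", 3.0),
         ("X", "Y", "omnicorp", "publications", 5.0)]
run_b = [("X", "Y", "omnicorp", "publications", 3.0),
         ("Y", "X", "omnicorp", "publications", 5.0)]

directed_a = pair_total_directed(collapse_directed(run_a), "X", "Y")
directed_b = pair_total_directed(collapse_directed(run_b), "X", "Y")
symmetric_a = pair_total_symmetric(collapse_symmetric(run_a), "X", "Y")
symmetric_b = pair_total_symmetric(collapse_symmetric(run_b), "X", "Y")
```

In this toy case the direction-aware collapse gives different totals for the two runs, because the flipped edge escapes the max. The direction-agnostic collapse gives the same total for both.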
Please disregard if this is too much of an edge case, but I'd like to toss in a small wrench. In ICEES-KG, we have many edges with the same subject/object/source/property that come from different datasets and different years. It sounds like the current ranker would not handle this case.
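A rough illustration of that concern, assuming the collapse key is exactly subject/object/source/property as described earlier in the thread. The `dataset` and `year` field names, the node identifiers, and the values are made up for the example and are not ICEES-KG or TRAPI attribute names.

```python
def collapse(edges, key_fields):
    """Keep the max value among edges that share the same key fields."""
    best = {}
    for edge in edges:
        key = tuple(edge[field] for field in key_fields)
        if key not in best or edge["value"] > best[key]:
            best[key] = edge["value"]
    return best


# Three hypothetical edges that differ only in dataset and year.
edges = [
    {"subject": "NODE:1", "object": "NODE:2", "source": "icees",
     "property": "example_property", "dataset": "cohort_a", "year": 2010, "value": 0.2},
    {"subject": "NODE:1", "object": "NODE:2", "source": "icees",
     "property": "example_property", "dataset": "cohort_a", "year": 2011, "value": 0.7},
    {"subject": "NODE:1", "object": "NODE:2", "source": "icees",
     "property": "example_property", "dataset": "cohort_b", "year": 2010, "value": 0.4},
]

base_key = ("subject", "object", "source", "property")
extended_key = base_key + ("dataset", "year")  # hypothetical extra discriminators

collapsed_base = collapse(edges, base_key)
collapsed_extended = collapse(edges, extended_key)
```

With the four-part key, the three edges collapse into a single entry holding the maximum value. With the extended key, all three survive as distinct entries.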
I think that's an interesting point. We should consider what other aspects of TRAPI we should be using to identify unique edges.
Two queries are run in ROBOKOP. One is (Ozone)-(gene)-(asthma) and the other is (asthma)-(gene)-(Ozone). The same answers are returned, but the scores are slightly different between the two. Attached are two messages. The first result in each (NCBIGene:7412) shows the different scores.
ROBOKOP_message_asthma-gene-ozone_trapi1.4dev.json.txt
ROBOKOP_message_ozone-gene-asthma_trapi1.4dev.json.txt
As far as I can tell, the two results are the same in terms of number of edges bound, and the parameters of the omnicorp support edges.
This suggests to me a bug somewhere in the ranker, but the differences are small enough that perhaps it is something numerical?
I also notice that every weight I saw has a value of 1. Is this accurate? Or are these weights no longer used in ranking?