
Ranker Slowness and Omnicorp Oddities #119

Open
kennethmorton opened this issue Sep 11, 2023 · 2 comments
@kennethmorton
Contributor

I am trying to track down some slowness in the Aragorn ranker. It was originally believed that the slowness was somehow related to query_id/qnode_id issues coming from Automat, but that appears to be correlation rather than causation. It may still ultimately be related, but this issue focuses on performance within ranker_obj.py.

Looking at this query:

```json
{
  "nodes": {
    "on": { "ids": ["MONDO:0004979"] },
    "sn": { "categories": ["biolink:ChemicalEntity"] }
  },
  "edges": {
    "t_edge": {
      "subject": "sn",
      "object": "on",
      "knowledge_type": "inferred",
      "predicates": ["biolink:treats"]
    }
  }
}
```

We see a strongly bimodal distribution of per-result execution times: the normal results score very fast, and the abnormal ones very slow.
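To separate the fast results from the slow tail, a small timing harness can score each result individually. This is a sketch; `score_fn` stands in for whatever per-result scoring entry point ranker_obj.py exposes, which is not named here.

```python
import time

def time_per_result(score_fn, results):
    """Score each result individually and record wall-clock time.

    score_fn is a hypothetical stand-in for the ranker's per-result
    scoring call; the returned list can be histogrammed to expose a
    bimodal timing distribution.
    """
    timings = []
    for result in results:
        start = time.perf_counter()
        score_fn(result)
        timings.append(time.perf_counter() - start)
    return timings
```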

Digging into one of the slow answers, it looks fine on the surface:

```json
{
  "node_bindings": {
    "on": [
      { "id": "MONDO:0004766", "qnode_id": "MONDO:0004979" },
      { "id": "MONDO:0004979", "qnode_id": "MONDO:0004979" }
    ],
    "sn": [{ "id": "PUBCHEM.COMPOUND:301590" }]
  },
  "analyses": [
    {
      "resource_id": "infores:aragorn",
      "edge_bindings": {
        "t_edge": [{ "id": "4f8c7da6-8771-4bd1-ba94-1fb820b9e910" }]
      },
      "support_graphs": ["OMNICORP_support_graph_9"]
    }
  ]
}
```

The treats edge looks normal, as do the two attribute edges and the web of edges that spawn from them (not shown):

```json
{
  "4f8c7da6-8771-4bd1-ba94-1fb820b9e910": {
    "subject": "PUBCHEM.COMPOUND:301590",
    "object": "MONDO:0004979",
    "predicate": "biolink:treats",
    "sources": [
      {
        "resource_id": "infores:aragorn",
        "resource_role": "primary_knowledge_source"
      }
    ],
    "attributes": [
      {
        "attribute_type_id": "biolink:support_graphs",
        "value": [
          "7a2ecf78-c399-45cf-95f1-3e93185eae4a",
          "d9677631-66a2-4708-9b86-9df19c8353d3"
        ]
      }
    ]
  }
}
```

The problem comes from "OMNICORP_support_graph_9". Here is an excerpt:

```json
{
  "edges": [
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    ....
  ]
}
```

This support graph contains 11,701 edges, many of them duplicates; 3,385 are unique. These edges in turn pull in lots and lots of nodes, ultimately more than 700.
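At minimum, the duplicate edge ids could be dropped before any per-edge work is done. A minimal sketch, using a made-up support graph shaped like the excerpt above (`dict.fromkeys` deduplicates while preserving first-seen order, unlike `set`):

```python
# Hypothetical support graph mirroring the duplication seen above:
# the same edge id repeated thousands of times.
support_graph = {
    "edges": ["a60dffc9-71ae-4c5c-ad71-d1864f61e0b5"] * 11701,
}

# Deduplicate before traversal/weighting; dict.fromkeys keeps the
# first occurrence of each id and preserves order.
unique_edges = list(dict.fromkeys(support_graph["edges"]))
```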

In sorting through all of this, I noticed that the ranker wasn't properly finding all of the edges and supporting evidence contained in the set. Bug fixes are in the ranker_speed_investigation branch.

After the bug fixes, the score for this answer went from 0 to 0.93. It's not even clear that this is an improvement, given the underlying evidence.

I explored ranker_obj to investigate the impact of the duplicate edges; it's not much. The real issue is having 3,000+ edges to parse through and then 700+ nodes. Even with 700+ nodes, the numpy calculations are still reasonably fast, about 30 ms on my laptop. The real killer is traversing the web of edges spawning from all 3,385 edges, collecting the evidence, and calculating weights. After the bug fixes, the run time is largely unchanged.
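The traversal itself can at least be bounded so each edge is visited once, no matter how often it is duplicated across support graphs. A sketch of that idea, with illustrative names (`collect_support_edges`, the two-level edge/auxiliary-graph shape) that are not the actual ranker_obj.py API:

```python
from collections import deque

def collect_support_edges(kg_edges, aux_graphs, start_edge_ids):
    """Breadth-first walk of the support web, visiting each edge once.

    Each edge may reference auxiliary graphs via a
    biolink:support_graphs attribute, and each auxiliary graph lists
    further edges. The `seen` set guarantees every edge is processed
    at most once, regardless of duplication in the graphs.
    """
    seen = set()
    queue = deque(start_edge_ids)
    while queue:
        edge_id = queue.popleft()
        if edge_id in seen:
            continue
        seen.add(edge_id)
        for attr in kg_edges.get(edge_id, {}).get("attributes", []):
            if attr.get("attribute_type_id") == "biolink:support_graphs":
                for graph_id in attr.get("value", []):
                    queue.extend(
                        aux_graphs.get(graph_id, {}).get("edges", [])
                    )
    return seen
```

With a visited set in place, the cost scales with the 3,385 unique edges rather than the 11,701 listed ones, though as noted above the unique-edge count alone is already large.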

This happens with different OMNICORP support graphs and appears to be the dominant symptom among results that take 1 second or more to score.

I believe this is some sort of OMNICORP issue. We should explore further performance optimizations within the ranker, but nothing obvious stands out, so hopefully the OMNICORP bug fix will be enough.

@cbizon
Contributor

cbizon commented Sep 13, 2023

One aspect of omnicorp's behavior here is that the same graph, e.g. "OMNICORP_support_graph_1", appears in many results. So I think it's accumulating a bunch of support edges that have nothing to do with each other and attaching them to many results. Suboptimal.

@cbizon
Contributor

cbizon commented Sep 13, 2023

Actually, I see I made a PR for this a while ago... it needs a review, though.
