Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unambiguous (un)batching #293

Closed
patrickkwang opened this issue Sep 9, 2021 · 15 comments
Closed

Unambiguous (un)batching #293

patrickkwang opened this issue Sep 9, 2021 · 15 comments
Milestone

Comments

@patrickkwang
Copy link
Contributor

NCATSTranslator/TranslatorArchitecture#62

We would like to make sure that results can be unambiguously matched to specific query-graph CURIEs/predicates.

@patrickkwang
Copy link
Contributor Author

Idea: do more explicit batching. Do not allow lists of CURIEs/predicates, but allow sending multiple messages per payload.

Pros:

  • unambiguous linking of results to CURIEs/predicates
  • allows batching of differently-structured messages, e.g. one-hops with two-hops

Cons:

  • potentially lots of duplication, big payloads

@edeutsch
Copy link
Collaborator

edeutsch commented Sep 16, 2021

I think an NodeBinding.original_id could solve the problem reasonably well.
So if the client's original query had qnode n00 with ids ['MONDO:123', 'MONDO:456']
but the server didn't have information on those exactly, but it had information on descendants, it could put descendants MONDO:9876 and MONDO:8765 in the KG and when it is binding them to n00, it would set NodeBinding.original_id='MONDO:123' to let the client know that although the CURIE is different, it is binding MONDO:9876 to what the client originally called MONDO:123.

@edeutsch
Copy link
Collaborator

edeutsch commented Sep 16, 2021

It seems like there are 3 options:

  1. Server always uses the client's CURIEs
  2. Use the NodeBinding.original_id as described above
  3. Servers should return node normalized CURIEs and client should use Node Normalize to associate with CURIE they asked about

1 and 3 does not solve the problem of subclass inference
Maybe 2 seems very heavy handed to be required in all cases

@patrickkwang
Copy link
Contributor Author

patrickkwang commented Sep 16, 2021

A slight variation on idea 2: rather than an additional field, use a sort of extended qnode id in bindings. For example,

"node_bindings": {
    "n00.MONDO:123": [{"id": "MONDO:9876"}],
...

@marcdubybroad
Copy link

What about

"node_bindings": { "n00": {"id": {"MONDO:123": "MONDO:9876"}}, ...

@vdancik
Copy link
Collaborator

vdancik commented Sep 16, 2021

Should we use query_id and knowledge_graph_id to make it explicit which id we refer to?

@brettasmi
Copy link
Contributor

brettasmi commented Sep 16, 2021

FYI from the Architecture README:

KPs that implement the Translator Reasoner API must perform the following kinds of reasoning in answering queries:

Making identifiers more specific, e.g. responding to a query involving an entity with information related to a subclass of that entity. In the knowledge_graph portion of the response, the more-specific identifier must be present and linked to the less-specific identifier. In the results portion of the response, the more-specific response node will be bound to the less-specific query node.

@edeutsch
Copy link
Collaborator

We acknowledge that the above strategy from Architecture can be followed and interpreted successfully for unbatched queries (with a single id). BUT, for batched queries (which seem to be the most popular these days), following the Architecture guideline leads to a situation where the client would have a very hard time (unless they also do some complex ancestor matching) to interpret the results. Seems like NodeBinding.query_id would be the good way to solve that. But that won't appear until TRAPI 1.3. It remains to be seen when we would be able to release TRAPI 1.3 with that fix. We will decide after the relay.

@edeutsch
Copy link
Collaborator

edeutsch commented Nov 4, 2021

Current design:

        "node_bindings": {
          "n00": [
            {
              "id": "CHEMBL.COMPOUND:CHEMBL112"
            }
          ],
          "n01": [
            {
              "id": "UniProtKB:P05181"
            }
          ]
        },

First proposal:

        "node_bindings": {
          "n00": [
            {
              "id": "CHEMBL.COMPOUND:CHEMBL112",
              "original_id": "CHEMBL.COMPOUND:CHEMBL100"
            }
          ],
          "n01": [
            {
              "id": "UniProtKB:P05181"
            }
          ]
        },

(in the case where there are two input curies and one is a descendant of another, just have two entries in the list)

Second proposal:

        "node_bindings": {
          "n00": [
            {
              "id": { "CHEMBL.COMPOUND:CHEMBL112": "CHEMBL.COMPOUND:CHEMBL100" }
            }
          ],
          "n01": [
            {
              "id": "UniProtKB:P05181"
            }
          ]
        },

Note that one node in the KG could be a descendant of two input ids.

@edeutsch
Copy link
Collaborator

edeutsch commented Nov 4, 2021

Suppose there are two input query ids 100 and 200. The server returns 212, which is a descendant of both 100 and 200)
First proposal:

    "node_bindings": {
      "n00": [
        {
          "id": "CHEMBL.COMPOUND:CHEMBL212",
          "query_id": "CHEMBL.COMPOUND:CHEMBL100"
        },
        {
          "id": "CHEMBL.COMPOUND:CHEMBL212",
          "query_id": "CHEMBL.COMPOUND:CHEMBL200"
        }
      ],
      "n01": [
        {
          "id": "UniProtKB:P05181",
          "query_id": null
        }
      ]
    },

(in the case where there are two input curies and one is a descendant of another, just have two entries in the list)

@edeutsch
Copy link
Collaborator

edeutsch commented Nov 4, 2021

Suppose there are two input query ids 100 and 200. The server returns 212, which is a descendant of both 100 and 200)
Second proposal:

    "node_bindings": {
      "n00": [
        {
          "id": { "CHEMBL.COMPOUND:CHEMBL212": "CHEMBL.COMPOUND:CHEMBL100" }
        },
        {
          "id": { "CHEMBL.COMPOUND:CHEMBL212": "CHEMBL.COMPOUND:CHEMBL200" }
        }
      ],
      "n01": [
        {
          "id": { "UniProtKB:P05181": null }
        }
      ]
    },

Note that one node in the KG could be a descendant of two input ids.

@edeutsch
Copy link
Collaborator

edeutsch commented Nov 4, 2021

Third proposal:

"node_bindings": {
  "n00": {
      "CHEMBL.COMPOUND:CHEMBL212": {
         "query_id": [ "CHEMBL.COMPOUND:CHEMBL100", "CHEMBL.COMPOUND:CHEMBL200" ]
       }
   },
  "n01": {
      "UniProtKB:P05181": null
  }
},

@cbizon
Copy link
Contributor

cbizon commented Nov 4, 2021

Proposal 4:

"node_bindings": {
  "n00": {
      "CHEMBL.COMPOUND:CHEMBL212":  [
                 {"query_id": "CHEMBL.COMPOUND:CHEMBL100"}, 
                 {"query_id": "CHEMBL.COMPOUND:CHEMBL200"}
          ]
       }
   },
  "n01": {
      "UniProtKB:P05181": null
  }
},

@vgardner-renci
Copy link

vgardner-renci commented Nov 4, 2021

As of 10:30 11/18
Proposal 1: 6
Proposal 2: -2
Proposal 3: 0
Proposal 4: 3 pts

Heart = 2 pts
Thumbs up = 1 pt
Thumbs down = -1 pt

@edeutsch
Copy link
Collaborator

edeutsch commented Sep 8, 2022

addressed in PR #304

@edeutsch edeutsch closed this as completed Sep 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants