Export all federated queries to create a real-world benchmark for federated queries #40

vemonet · 2024-10-10T09:19:36Z

This repository contains a lot of complex federated queries to large endpoints.

It would be interesting to provide some instructions to easily export all federated queries to constitute a benchmark that could be used by federated query systems.

Another comparable benchmark would be: https://github.com/dice-group/LargeRDFBench

But this benchmark would provide queries that are actually used in the real world.

constraintAutomaton · 2024-10-28T12:52:06Z

@vemonet, I made this script to extract the queries:

https://github.com/constraintAutomaton/sib-swiss-federated-query-extractor

I changed the queries provided in the repo because they do not seem to work with the data model. I used this one instead:

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX spex: <https://purl.expasy.org/sparql-examples/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?queryID ?federatedEndpoint ?comment ?query ?target  WHERE {
  ?queryID sh:select ?query .
  ?queryID spex:federatesWith ?federatedEndpoint .
  ?queryID rdfs:comment ?comment .
  ?queryID <https://schema.org/target> ?target
}

At least on my side, no query had more than one <https://schema.org/target>, and the number of spex:federatesWith values seems to match the number of endpoints in the federation.

Edit: I changed the query because the previous version only returned queries whose federation had at least 3 endpoints instead of 2.
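The observation above can be checked mechanically. Below is a minimal sketch, assuming the SELECT results have already been loaded as one dict per solution, keyed by variable name. The `rows` data, the example URLs, and the `summarize` helper are hypothetical and are not part of the extractor repo. Because the query returns one row per `spex:federatesWith` value, the rows are grouped by `?queryID` first.

```python
from collections import defaultdict

# Hypothetical result rows: one dict per solution of the SELECT query,
# keyed by variable name. The URLs are placeholders, not real endpoints.
rows = [
    {"queryID": "https://example.org/q1",
     "federatedEndpoint": "https://a.example.org/sparql",
     "target": "https://a.example.org/sparql"},
    {"queryID": "https://example.org/q1",
     "federatedEndpoint": "https://b.example.org/sparql",
     "target": "https://a.example.org/sparql"},
]


def summarize(rows):
    """Collect the distinct targets and federated endpoints of each query."""
    targets = defaultdict(set)
    federated = defaultdict(set)
    for row in rows:
        targets[row["queryID"]].add(row["target"])
        federated[row["queryID"]].add(row["federatedEndpoint"])
    return {
        query_id: {
            "targets": sorted(targets[query_id]),
            "federatesWith": sorted(federated[query_id]),
        }
        for query_id in targets
    }


summary = summarize(rows)
# Queries that break the "exactly one target" observation.
multi_target = [q for q, s in summary.items() if len(s["targets"]) != 1]
for query_id, info in summary.items():
    print(query_id, len(info["federatesWith"]), "federated endpoint(s)")
```

An empty `multi_target` list would confirm that every query has a single target in the loaded data.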

constraintAutomaton · 2024-10-28T12:57:11Z

Maybe I can document how I've done it and provide my repo as an example, after some cleanup. Unless I made a mistake somewhere.

vemonet · 2024-10-29T14:13:43Z

Thanks @constraintAutomaton, that's nice! A few remarks:

- You forgot to add the URL of the main endpoint on which the query is expected to run.
- It would be better to put all queries under a specific key, so we can iterate over them directly without having to filter out the metadata key.
- It seems you are using the old convertToOneTurtle.sh bash script to compile all queries (https://github.com/constraintAutomaton/sib-swiss-federated-query-extractor/blob/main/init.sh). I would recommend using sparql-examples-utils.jar instead, as documented in the README.md.
- This one is more of a detail, but maybe use federatesWith instead of federatedEndpoint, to be consistent with the predicate currently in use.

Something a bit like:

{
  "queries": [ 
    {
    "uri": "https://www.bgee.org/sparql/.well-known/sparql-examples/020",
    "endpoint": "https://www.bgee.org/sparql/",
    "query": "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\nPREFIX up: <http://purl.uniprot.org/core/>\nPREFIX genex: <http://purl.org/genex#>\nPREFIX obo: <http://purl.obolibrary.org/obo/>\nPREFIX orth: <http://purl.org/net/orth#>\nPREFIX dcterms: <http://purl.org/dc/terms/>\nPREFIX sio: <http://semanticscience.org/resource/>\n\nSELECT DISTINCT ?flyEnsemblGene ?orthologTaxon ?orthologEnsemblGene ?orthologOmaLink WHERE {\n\t{\n        SELECT DISTINCT ?gene ?flyEnsemblGene {\n        ?gene a orth:Gene ;\n            genex:isExpressedIn/rdfs:label 'eye' ;\n            orth:organism/obo:RO_0002162 ?taxon ;\n            dcterms:identifier ?flyEnsemblGene .\n        ?taxon up:commonName 'fruit fly' .\n        } LIMIT 100\n    }\n    SERVICE <https://sparql.omabrowser.org/sparql> {\n        ?protein2 a orth:Protein .\n        ?protein1 a orth:Protein .\n        ?clusterPrimates a orth:OrthologsCluster .\n        ?cluster a orth:OrthologsCluster ;\n            orth:hasHomologousMember ?node1 ;\n            orth:hasHomologousMember ?node2 .\n        ?node1 orth:hasHomologousMember* ?protein1 .\n        ?node2 orth:hasHomologousMember* ?clusterPrimates .\n        ?clusterPrimates orth:hasHomologousMember* ?protein2 .\n        ?protein1 sio:SIO_010079 ?gene . # is encoded by\n        ?protein2 rdfs:seeAlso ?orthologOmaLink ;\n            orth:organism/obo:RO_0002162 ?orthologTaxonUri ;\n            sio:SIO_010079 ?orthologGene . # is encoded by\n        ?clusterPrimates orth:hasTaxonomicRange ?taxRange .\n        ?taxRange orth:taxRange 'Primates' .\n        FILTER ( ?node1 != ?node2 )\n    }\n    ?orthologTaxonUri up:commonName ?orthologTaxon .\n    ?orthologGene dcterms:identifier ?orthologEnsemblGene .\n}",
    "description": "Which are the genes in Primates orthologous to a gene that is expressed in the fruit fly's eye?",
    "federatesWith": [
      "https://www.bgee.org/sparql/",
      "https://sparql.omabrowser.org/sparql"
    ]
    },
    ...
  ],
  "metadata": ...
}
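The layout suggested above could be produced from the flat SELECT results along these lines. This is only a sketch under assumptions: `build_export` is a hypothetical helper, the rows use the variable names of the SELECT query posted earlier in the thread, the `?target` value is assumed to be the main endpoint URL, and the query text is shortened to a placeholder.

```python
import json


def build_export(rows, metadata=None):
    """Group flat result rows into one entry per query under a "queries" key."""
    queries = {}
    for row in rows:
        entry = queries.setdefault(row["queryID"], {
            "uri": row["queryID"],
            "endpoint": row["target"],  # assumption: target is the main endpoint
            "query": row["query"],
            "description": row["comment"],
            "federatesWith": [],
        })
        endpoint = row["federatedEndpoint"]
        if endpoint not in entry["federatesWith"]:
            entry["federatesWith"].append(endpoint)
    return {"queries": list(queries.values()), "metadata": metadata or {}}


# Two rows for the same query, one per federated endpoint.
# The "query" value is a placeholder, not the real query text.
shared = {
    "queryID": "https://www.bgee.org/sparql/.well-known/sparql-examples/020",
    "target": "https://www.bgee.org/sparql/",
    "query": "SELECT ...",
    "comment": "Which are the genes in Primates orthologous to a gene "
               "that is expressed in the fruit fly's eye?",
}
rows = [
    {**shared, "federatedEndpoint": "https://www.bgee.org/sparql/"},
    {**shared, "federatedEndpoint": "https://sparql.omabrowser.org/sparql"},
]

export = build_export(rows)
print(json.dumps(export, indent=2))

# Consumers can iterate over the queries without filtering out the metadata.
for q in export["queries"]:
    print(q["endpoint"], "federates with", q["federatesWith"])
```

Keeping the queries in a list under a single key means a benchmark runner only needs `export["queries"]`, and the metadata can grow without affecting that loop.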

vemonet added the "documentation" and "good first issue" labels on Oct 10, 2024

vemonet self-assigned this Oct 10, 2024
