Export all federated queries to create a real-world benchmark for federated queries #40
@vemonet, I made this script to extract the queries: https://github.com/constraintAutomaton/sib-swiss-federated-query-extractor
I changed the queries provided in the repo because they do not seem to work with the data model. I used this one instead:
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX spex: <https://purl.expasy.org/sparql-examples/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?queryID ?federatedEndpoint ?comment ?query ?target WHERE {
?queryID sh:select ?query .
?queryID spex:federatesWith ?federatedEndpoint .
?queryID rdfs:comment ?comment .
?queryID <https://schema.org/target> ?target
}
At least on my side no queries had more than one
Maybe I can document how I did it and provide my repo as an example after some cleanup, unless I made a mistake somewhere.
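As a rough illustration of how the results of the SELECT query above could be consumed, here is a minimal Python sketch. It assumes the endpoint returns the standard SPARQL 1.1 JSON results format; the function name `flatten_bindings` and the sample response are hypothetical and made up for illustration, not taken from the extractor repo.

```python
import json


def flatten_bindings(sparql_json: str) -> list[dict[str, str]]:
    """Turn a SPARQL JSON results document into a list of {variable: value} rows."""
    doc = json.loads(sparql_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are absent from a binding, so they are skipped here.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Hypothetical one-row response (all values are placeholders).
sample = json.dumps({
    "head": {"vars": ["queryID", "federatedEndpoint", "comment", "query", "target"]},
    "results": {"bindings": [{
        "queryID": {"type": "uri", "value": "https://example.org/examples/001"},
        "federatedEndpoint": {"type": "uri", "value": "https://example.org/other/sparql"},
        "comment": {"type": "literal", "value": "An example federated query"},
        "query": {"type": "literal", "value": "SELECT * WHERE { ?s ?p ?o }"},
        "target": {"type": "uri", "value": "https://example.org/sparql"},
    }]},
})

rows = flatten_bindings(sample)
```

Each row then carries one `queryID`/`federatedEndpoint` pair, so a query that federates with several endpoints would show up as several rows.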
Thanks @constraintAutomaton that's nice! A few remarks:
Something a bit like:
{
"queries": [
{
"uri": "https://www.bgee.org/sparql/.well-known/sparql-examples/020",
"endpoint": "https://www.bgee.org/sparql/",
"query": "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\nPREFIX up: <http://purl.uniprot.org/core/>\nPREFIX genex: <http://purl.org/genex#>\nPREFIX obo: <http://purl.obolibrary.org/obo/>\nPREFIX orth: <http://purl.org/net/orth#>\nPREFIX dcterms: <http://purl.org/dc/terms/>\nPREFIX sio: <http://semanticscience.org/resource/>\n\nSELECT DISTINCT ?flyEnsemblGene ?orthologTaxon ?orthologEnsemblGene ?orthologOmaLink WHERE {\n\t{\n SELECT DISTINCT ?gene ?flyEnsemblGene {\n ?gene a orth:Gene ;\n genex:isExpressedIn/rdfs:label 'eye' ;\n orth:organism/obo:RO_0002162 ?taxon ;\n dcterms:identifier ?flyEnsemblGene .\n ?taxon up:commonName 'fruit fly' .\n } LIMIT 100\n }\n SERVICE <https://sparql.omabrowser.org/sparql> {\n ?protein2 a orth:Protein .\n ?protein1 a orth:Protein .\n ?clusterPrimates a orth:OrthologsCluster .\n ?cluster a orth:OrthologsCluster ;\n orth:hasHomologousMember ?node1 ;\n orth:hasHomologousMember ?node2 .\n ?node1 orth:hasHomologousMember* ?protein1 .\n ?node2 orth:hasHomologousMember* ?clusterPrimates .\n ?clusterPrimates orth:hasHomologousMember* ?protein2 .\n ?protein1 sio:SIO_010079 ?gene . # is encoded by\n ?protein2 rdfs:seeAlso ?orthologOmaLink ;\n orth:organism/obo:RO_0002162 ?orthologTaxonUri ;\n sio:SIO_010079 ?orthologGene . # is encoded by\n ?clusterPrimates orth:hasTaxonomicRange ?taxRange .\n ?taxRange orth:taxRange 'Primates' .\n FILTER ( ?node1 != ?node2 )\n }\n ?orthologTaxonUri up:commonName ?orthologTaxon .\n ?orthologGene dcterms:identifier ?orthologEnsemblGene .\n}",
"description": "Which are the genes in Primates orthologous to a gene that is expressed in the fruit fly's eye?",
"federatesWith": [
"https://www.bgee.org/sparql/",
"https://sparql.omabrowser.org/sparql"
]
}
...
],
"metadata": ...
}
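A minimal sketch of how flat result rows could be grouped into a structure like the one above, written in Python. It assumes each row is a dict with the keys `queryID`, `federatedEndpoint`, `comment`, `query` and `target` (the variables of the SELECT query earlier in this thread); `group_rows` and the sample rows are hypothetical. The `metadata` field is left out because its content is not specified here, and `federatesWith` only collects the `spex:federatesWith` values, so adding the query's own endpoint to that list would be an extra step.

```python
import json


def group_rows(rows: list[dict[str, str]]) -> dict:
    """Group one-row-per-federated-endpoint results into one entry per query."""
    queries: dict[str, dict] = {}
    for row in rows:
        entry = queries.setdefault(row["queryID"], {
            "uri": row["queryID"],
            "endpoint": row["target"],
            "query": row["query"],
            "description": row["comment"],
            "federatesWith": [],
        })
        # Avoid listing the same federated endpoint twice.
        if row["federatedEndpoint"] not in entry["federatesWith"]:
            entry["federatesWith"].append(row["federatedEndpoint"])
    return {"queries": list(queries.values())}


# Hypothetical rows: one query that federates with two endpoints.
sample_rows = [
    {"queryID": "https://example.org/examples/001",
     "federatedEndpoint": "https://example.org/a/sparql",
     "comment": "An example federated query",
     "query": "SELECT * WHERE { ?s ?p ?o }",
     "target": "https://example.org/sparql"},
    {"queryID": "https://example.org/examples/001",
     "federatedEndpoint": "https://example.org/b/sparql",
     "comment": "An example federated query",
     "query": "SELECT * WHERE { ?s ?p ?o }",
     "target": "https://example.org/sparql"},
]

export = group_rows(sample_rows)
export_text = json.dumps(export, indent=2)
```

Keying on `queryID` keeps the export to one entry per example query regardless of how many endpoints it touches.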
This repository contains a lot of complex federated queries to large endpoints.
It would be useful to provide instructions for easily exporting all federated queries, so that they can serve as a benchmark for federated query systems.
Another comparable benchmark would be: https://github.com/dice-group/LargeRDFBench
But this benchmark would provide queries that are actually used in the real world.