Trial and write documentation about whether custom Wikibase instructions can also be used to customize Wikidata queries #184

Closed
ross-spencer opened this issue Apr 27, 2022 · 2 comments

@ross-spencer
Collaborator

This came up in the #AusPreserves meeting. If the SPARQL query can be customized, users gain additional flexibility. Related to #183, it could also reduce the load on Wikidata during roy harvest, where the back-end is already under a lot of stress to deliver results.

It dawned on me that, while the default Wikidata query is compiled into Siegfried, the custom Wikibase support could potentially be used to connect to Wikidata proper with a slightly modified query. Those instructions are here:

We'd just need to change the query to match what the Wikidata Query Service (WDQS) expects and make sure the URIs we connect to are correct, including port information.

Customization would be limited to reducing what the existing SPARQL query returns, i.e. filtering.

Examples may be:

  • Filter the graph to only give me results from the TrID database.
  • Filter results to only give me audiovisual records.
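
Filtering of this kind amounts to adding one more triple pattern to the existing query. A minimal sketch, assuming the base query contains a pattern on wd:Q235557 (file format) as the queries in this thread do. The helper name add_class_filter and the abridged base query are hypothetical and not part of roy or Siegfried:

```python
FILE_FORMAT_PATTERN = "wd:Q235557"  # "file format" class used by the queries in this thread


def add_class_filter(base_query: str, class_qid: str) -> str:
    """Insert an instance-of/subclass-of restriction after the file format pattern.

    Hypothetical helper for illustration only.
    """
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata QID: {class_qid!r}")
    out, inserted = [], False
    for line in base_query.splitlines():
        out.append(line)
        is_comment = line.lstrip().startswith("#")
        if not inserted and not is_comment and FILE_FORMAT_PATTERN in line:
            # Reuse the indentation of the line we anchor on.
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f"{indent}?uri wdt:P31/wdt:P279* wd:{class_qid}.")
            inserted = True
    if not inserted:
        raise ValueError("base query has no file format pattern to anchor on")
    return "\n".join(out)


# Abridged base query, for illustration only.
base = """select distinct ?uri ?uriLabel where {
  ?uri wdt:P31/wdt:P279* wd:Q235557.
  service wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE], en". }
}"""
narrowed = add_class_filter(base, "Q105599390")  # raster-graphics file format
print(narrowed)
```

Because the added pattern can only remove rows, the result stays a subset of what the unmodified query returns.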

Recording that idea here as a potential docs improvement.

@ross-spencer changed the title from "Trial write documentation about whether custom Wikibase instructions can also be used to customize Wikidata queries" to "Trial and write documentation about whether custom Wikibase instructions can also be used to customize Wikidata queries" on Apr 27, 2022
@ross-spencer
Collaborator Author

This looks like it will work and will make it into the documentation. Unfortunately, I am running up against Wikidata rate limiting today.

Connect string: roy harvest -wikidata -wikidataendpoint https://query.wikidata.org/sparql? -wikibaseurl https://www.wikidata.org/w/api.php

wikibase.json:

{
 "PronomProp": "http://www.wikidata.org/entity/Q35432091",
 "BofProp": "http://www.wikidata.org/entity/Q35436009",
 "EofProp": "http://www.wikidata.org/entity/Q1148480"
}
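
A small pre-flight check can catch a malformed wikibase.json before a harvest is attempted. This is a sketch only: it assumes the three keys shown above are the ones that matter, and it assembles the command from the flags in the connect string above without running it. The function names are hypothetical:

```python
import json

# Keys taken from the wikibase.json in this comment; whether roy reads
# any others is not covered here.
REQUIRED_KEYS = ("PronomProp", "BofProp", "EofProp")


def check_wikibase_config(text: str) -> dict:
    """Parse wikibase.json text and confirm each required key holds a URI."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing keys: {', '.join(missing)}")
    for key in REQUIRED_KEYS:
        if not str(config[key]).startswith(("http://", "https://")):
            raise ValueError(f"{key} is not a URI: {config[key]!r}")
    return config


def harvest_command(endpoint: str, wikibase_url: str) -> list:
    """Build the argument list for the connect string shown above."""
    return [
        "roy", "harvest", "-wikidata",
        "-wikidataendpoint", endpoint,
        "-wikibaseurl", wikibase_url,
    ]


example = """{
 "PronomProp": "http://www.wikidata.org/entity/Q35432091",
 "BofProp": "http://www.wikidata.org/entity/Q35436009",
 "EofProp": "http://www.wikidata.org/entity/Q1148480"
}"""
config = check_wikibase_config(example)
command = harvest_command(
    "https://query.wikidata.org/sparql?",
    "https://www.wikidata.org/w/api.php",
)
print(" ".join(command))
```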

TrID query:

# Return all file format records from Wikidata.
#
# Custom query example:
#
# All formats must have a signature.
# All signatures must come from the TrID Q41799265 reference.
#
# NB. Keep in mind all optional fields as they increase the
# number of fields where schemas aren't consistent across entries.
#
SELECT DISTINCT ?uri ?uriLabel ?puid ?extension ?mimetype ?encoding ?referenceLabel ?date ?relativity ?offset ?sig WHERE {
  ?uri (wdt:P31/(wdt:P279*)) wd:Q235557.
  OPTIONAL { ?uri wdt:P2748 ?puid. }
  OPTIONAL { ?uri wdt:P1195 ?extension. }
  OPTIONAL { ?uri wdt:P1163 ?mimetype. }
  ?uri p:P4152 ?object.
  ?object ps:P4152 ?sig;
    prov:wasDerivedFrom ?provenance.
  ?provenance pr:P248 wd:Q41799265, ?reference.  # <-- modified to return TrID only, and TrID's reference label.
  OPTIONAL { ?provenance pr:P813 ?date. }
  OPTIONAL { ?object pq:P3294 ?encoding. }
  OPTIONAL { ?object pq:P2210 ?relativity. }
  OPTIONAL { ?object pq:P4153 ?offset. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE], en". }
}
ORDER BY (?uri)
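
A query like this can be trialled directly against the WDQS endpoint before handing it to roy, which also helps when rate limiting is in play. Below is a minimal sketch using only the Python standard library. It assumes the endpoint answers HTTP 429 with a Retry-After header when throttling; the User-Agent string, the 300 second cap, and the function names are placeholders, not anything roy itself uses:

```python
import time
import urllib.error
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"
USER_AGENT = "custom-query-trial/0.1 (placeholder contact details)"  # hypothetical


def backoff_seconds(retry_after, attempt, cap=300.0):
    """Use a numeric Retry-After value if present, else back off exponentially.

    A Retry-After given as an HTTP date is not parsed here and falls
    through to the exponential branch.
    """
    try:
        wait = float(retry_after)
    except (TypeError, ValueError):
        wait = 2.0 ** attempt
    return min(max(wait, 0.0), cap)


def run_query(query, endpoint=ENDPOINT, max_attempts=5):
    """POST a SPARQL query and return the raw JSON response body as bytes."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": USER_AGENT,
    }
    for attempt in range(max_attempts):
        request = urllib.request.Request(endpoint, data=data, headers=headers)
        try:
            with urllib.request.urlopen(request, timeout=120) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Only retry on rate limiting, and give up on the last attempt.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(backoff_seconds(err.headers.get("Retry-After"), attempt))
    raise RuntimeError("max_attempts must be at least 1")
```

Calling run_query with the query text above returns the raw result set, which gives a quick check that the filter yields rows before a full harvest is attempted.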

Example output:

---
siegfried   : 1.9.2
scandate    : 2022-09-07T11:55:20+02:00
signature   : default.sig
created     : 2022-09-07T11:55:18+02:00
identifiers :
  - name    : 'wikidata'
    details : 'wikidata-definitions-3.0.0 (2022-09-07)'
---
filename : 'trid'
filesize : 6
modified : 2022-09-07T11:55:14+02:00
errors   :
matches  :
  - ns        : 'wikidata'
    id        : 'Q100137240'
    format    : 'VariCAD Drawing'
    URI       : 'http://www.wikidata.org/entity/Q100137240'
    permalink : 'https://www.wikidata.org/w/api.php/w/index.php?oldid=1423314911&title=Q100137240'
    mime      : 'application/octet-stream'
    basis     : 'byte match at 0, 3 (TrID)'
    warning   : 'extension mismatch'
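
One way to confirm that a filtered signature file only carries the intended source is to look at the parenthesised provenance in the basis field. A minimal sketch, assuming the value has already been read out of the YAML and that the source sits inside the outermost parentheses, as in the output above. The helper name basis_source is hypothetical:

```python
def basis_source(basis: str):
    """Return the text inside the outermost parentheses of a basis string, or None.

    Taking the first "(" and the last ")" keeps nested parentheses in the
    source name intact.
    """
    start = basis.find("(")
    end = basis.rfind(")")
    if start == -1 or end < start:
        return None
    return basis[start + 1 : end]


print(basis_source("byte match at 0, 3 (TrID)"))
```

A basis that lists several matches with different sources would come back as a single span, so this is only a quick check and not a full parser.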

Filter the signature file by format type, raster-graphics:

# Return all file format records from Wikidata.
#
# Custom query example:
#
# Formats must be an instance of, or subclass of raster-graphics file format.
#
#
select distinct ?uri ?uriLabel ?puid ?extension ?mimetype ?encoding ?referenceLabel ?date ?relativity ?offset ?sig
where
{
  ?uri wdt:P31/wdt:P279* wd:Q235557.
  ?uri wdt:P31/wdt:P279* wd:Q105599390.    # <-- line added to return instance/sub-class of raster-graphics-format
  optional { ?uri wdt:P2748 ?puid.      }
  optional { ?uri wdt:P1195 ?extension. }
  optional { ?uri wdt:P1163 ?mimetype.  }
  optional { ?uri p:P4152 ?object;
    optional { ?object pq:P3294 ?encoding.   }
    optional { ?object ps:P4152 ?sig.        }
    optional { ?object pq:P2210 ?relativity. }
    optional { ?object pq:P4153 ?offset.     }
    optional { ?object prov:wasDerivedFrom ?provenance;
       optional { ?provenance pr:P248 ?reference;
                              pr:P813 ?date.
                }
    }
  }
  service wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE], en". }
}
order by ?uri

Example output:

---
siegfried   : 1.9.2
scandate    : 2022-09-07T12:30:19+02:00
signature   : default.sig
created     : 2022-09-07T12:30:16+02:00
identifiers :
  - name    : 'wikidata'
    details : 'wikidata-definitions-3.0.0 (2022-09-07)'
---
filename : 'trid'
filesize : 10
modified : 2022-09-07T12:29:20+02:00
errors   :
matches  :
  - ns        : 'wikidata'
    id        : 'Q1143961'
    format    : 'JBIG2'
    URI       : 'http://www.wikidata.org/entity/Q1143961'
    permalink : 'https://www.wikidata.org/w/api.php/w/index.php?oldid=1526516378&title=Q1143961'
    mime      :
    basis     : 'byte match at 0, 8 (Gary Kessler''s File Signature Table (source date: 2017-08-08))'
    warning   : 'extension mismatch'

@ross-spencer
Collaborator Author

There may still be some typos here and there, but the documentation is here (feature complete! 🤘):
