Dataset metadata description
The basic metadata of each dataset (series) should be described using either (or both) of the following vocabularies: schema.org and DCAT (Data Catalog Vocabulary).
Such metadata descriptions provide an essential link between the abstract SDG entities, as represented in the SDG KOS, and the actual data relevant to monitoring them. This metadata can then be served:
- via JSON-LD-enabled APIs, for use by applications;
- as structured data markup embedded in the code of static HTML pages about those datasets, so that the Google Dataset Search engine can index the datasets.
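As a rough illustration of the first option, the sketch below serves one description with the `application/ld+json` media type from a plain WSGI application (standard library only). The `/sdg/<series code>` route, the `DESCRIPTIONS` store and its trimmed-down content are hypothetical and not part of any existing SDG API.

```python
import json

# Hypothetical in-memory store: series code -> schema.org JSON-LD description.
# A real service could load these from the published metadata file instead.
DESCRIPTIONS = {
    "SG_HAZ_CMRMNTRL": {
        "@context": "https://schema.org/",
        "@id": "http://metadata.un.org/sdg/SG_HAZ_CMRMNTRL",
        "@type": "Dataset",
        "identifier": "SG_HAZ_CMRMNTRL",
    },
}


def app(environ, start_response):
    """WSGI app answering GET /sdg/<series code> with JSON-LD (hypothetical route)."""
    path = environ.get("PATH_INFO", "")
    prefix = "/sdg/"
    code = path[len(prefix):] if path.startswith(prefix) else None
    if environ.get("REQUEST_METHOD") != "GET" or code not in DESCRIPTIONS:
        body = b"Not Found"
        start_response("404 Not Found", [("Content-Type", "text/plain"),
                                         ("Content-Length", str(len(body)))])
        return [body]
    body = json.dumps(DESCRIPTIONS[code], indent=2).encode("utf-8")
    # application/ld+json is the registered media type for JSON-LD documents.
    start_response("200 OK", [("Content-Type", "application/ld+json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` would serve http://localhost:8000/sdg/SG_HAZ_CMRMNTRL.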
Below is an example description of a dataset in schema.org, expressed as a JSON-LD object. It follows the Google Dataset Search guidelines.
{
"@context":"https://schema.org/",
"@id":"http://metadata.un.org/sdg/SG_HAZ_CMRMNTRL",
"@type":"Dataset",
"name":"Series SG_HAZ_CMRMNTRL: Compliance with the Montreal Protocol on hazardous waste and other chemicals.",
"description":"Series SG_HAZ_CMRMNTRL (Indicator 12.4.1): Compliance with the Montreal Protocol on hazardous waste and other chemicals.",
"identifier": "SG_HAZ_CMRMNTRL",
"version": "2018.Q2.G.01",
"url":"http://metadata.un.org/sdg/SG_HAZ_CMRMNTRL",
"sameAs":"http://www.sdg.org/datasets/aa7580f18abc4ba39980a815a936040b_0",
"keywords":[
"INDUSTRY > WASTE DISPOSAL > HAZARDOUS WASTE",
"POLITICAL AND LEGAL QUESTIONS > INTERNATIONAL LAW > INTERNATIONAL INSTRUMENTS"
],
"creator":{
"@type":"Organization",
"url": "https://unstats.un.org",
"name":"United Nations Statistics Division Statistical Services Branch",
"contactPoint":{
"@type":"ContactPoint",
"contactType": "customer service",
"telephone":"+1 (212) 963 9851",
"email":"[email protected]"
}
},
"includedInDataCatalog":{
"@type":"DataCatalog",
"name":"unstats-undesa.opendata.arcgis.com"
},
"distribution":[
{
"@type":"DataDownload",
"encodingFormat":"CSV",
"contentUrl":"https://opendata.arcgis.com/datasets/aa7580f18abc4ba39980a815a936040b_0.csv"
},
{
"@type":"DataDownload",
"encodingFormat":"GDB",
"contentUrl":"https://opendata.arcgis.com/datasets/aa7580f18abc4ba39980a815a936040b_0.gdb"
}
],
"temporalCoverage":"2018-04-01/2018-06-30",
"spatialCoverage":{
"@type":"Place",
"geo":{
"@type":"GeoShape",
"box":"-85.06 -180.0 85.06 180.0"
}
}
}
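Descriptions like the one above are regular enough to be generated from a few fields per series. The following is a minimal sketch; `make_schemaorg_dataset` is a hypothetical helper, and its field choices and the `http://metadata.un.org/sdg/` IRI pattern are simply copied from the example.

```python
import json


def make_schemaorg_dataset(series_code, indicator, title, downloads, temporal=None):
    """Build a schema.org Dataset description for one SDG series (hypothetical helper).

    downloads maps an encoding format (e.g. "CSV") to its download URL.
    """
    iri = f"http://metadata.un.org/sdg/{series_code}"
    dataset = {
        "@context": "https://schema.org/",
        "@id": iri,
        "@type": "Dataset",
        "name": f"Series {series_code}: {title}",
        "description": f"Series {series_code} (Indicator {indicator}): {title}",
        "identifier": series_code,
        "url": iri,
        "distribution": [
            {"@type": "DataDownload", "encodingFormat": fmt, "contentUrl": url}
            for fmt, url in downloads.items()
        ],
    }
    if temporal is not None:
        # ISO 8601 interval, e.g. "2018-04-01/2018-06-30"
        dataset["temporalCoverage"] = temporal
    return dataset


if __name__ == "__main__":
    print(json.dumps(make_schemaorg_dataset(
        "SG_HAZ_CMRMNTRL", "12.4.1",
        "Compliance with the Montreal Protocol on hazardous waste and other chemicals.",
        {"CSV": "https://opendata.arcgis.com/datasets/aa7580f18abc4ba39980a815a936040b_0.csv"},
        temporal="2018-04-01/2018-06-30"), indent=2))
```

Properties such as `keywords`, `creator` or `spatialCoverage` could be added the same way.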
Note that, since JSON-LD can be seen as merely a different serialization of the RDF data model, descriptions such as the one above can be naturally connected to the rest of the SDG knowledge organization system through the shared identifiers of data series (e.g., http://metadata.un.org/sdg/SG_HAZ_CMRMNTRL) and through the additional mappings from the SDG ontology to the schema.org vocabulary that have been introduced.
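To make the role of the shared identifier concrete, the sketch below turns two flat descriptions into triples and merges them; every triple ends up about the same series IRI. The `naive_triples` helper is hypothetical and deliberately simplistic: it handles only top-level string values and treats a remote `@context` IRI as if it were the vocabulary, so a conformant JSON-LD processor should be used for real data.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def naive_triples(doc):
    """Turn a flat JSON-LD node into a set of (subject, predicate, object) triples.

    Toy approximation, not a conformant JSON-LD processor: only top-level string
    values are handled, "@type" is assumed to be a single string, and a remote
    @context IRI (e.g. "https://schema.org/") is used as @vocab instead of fetched.
    """
    ctx = doc.get("@context", {})
    if isinstance(ctx, str):
        ctx = {"@vocab": ctx}
    vocab = ctx.get("@vocab", "")

    def expand(term):
        if "://" in term:              # already an absolute IRI
            return term
        prefix, sep, local = term.partition(":")
        if sep and prefix in ctx:      # prefixed name such as "dct:title"
            return ctx[prefix] + local
        return vocab + term            # plain term resolved against @vocab

    subject = doc["@id"]
    triples = {(subject, RDF_TYPE, expand(doc["@type"]))}
    for key, value in doc.items():
        if key.startswith("@"):
            continue                   # skip keywords such as @id and @type
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, str):  # nested objects are ignored here
                triples.add((subject, expand(key), item))
    return triples
```

Taking the union of `naive_triples(schema_org_doc)` and `naive_triples(dcat_doc)` yields a single graph in which both vocabularies describe one node, which is exactly what the shared `@id` buys.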
The currently available set of dataset metadata descriptions is provided in the file sdg-dataset-metadata.jsonld (and, in Turtle, in sdg-dataset-metadata.ttl). Note that the version of the series described there is older than the one used in the SDG KOS.
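A possible way to list the series described in that file is sketched below. The exact layout of sdg-dataset-metadata.jsonld is not specified here, so the code assumes it is a single object, a top-level array, or an object with an `@graph` array; `dataset_nodes` and `load_series_codes` are hypothetical helpers.

```python
import json

# Spellings of the schema.org Dataset type that a node's "@type" may use (assumed).
DATASET_TYPES = {"Dataset", "schema:Dataset",
                 "http://schema.org/Dataset", "https://schema.org/Dataset"}


def dataset_nodes(data):
    """Yield the Dataset nodes from parsed JSON-LD of an assumed layout."""
    if isinstance(data, list):
        nodes = data
    elif isinstance(data, dict) and "@graph" in data:
        nodes = data["@graph"]
    else:
        nodes = [data]
    for node in nodes:
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if DATASET_TYPES & set(types):
            yield node


def load_series_codes(path="sdg-dataset-metadata.jsonld"):
    """Return the sorted series identifiers found in the metadata file."""
    with open(path, encoding="utf-8") as f:
        return sorted(n["identifier"] for n in dataset_nodes(json.load(f))
                      if "identifier" in n)
```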
An analogous description can be written using the DCAT vocabulary, an alternative vocabulary that Google Dataset Search reportedly also supports for facilitating the discovery of datasets online. Below is the corresponding description of the series http://metadata.un.org/sdg/SG_HAZ_CMRMNTRL:
{
"@context": {
"@vocab": "http://www.w3.org/ns/dcat#",
"foaf": "http://xmlns.com/foaf/0.1/",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"vcard": "http://www.w3.org/2006/vcard/ns#",
"dct": "http://purl.org/dc/terms/",
"locn": "http://www.w3.org/ns/locn#",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"@id":"http://metadata.un.org/sdg/SG_HAZ_CMRMNTRL",
"@type":"Dataset",
"dct:title":"Series SG_HAZ_CMRMNTRL: Compliance with the Montreal Protocol on hazardous waste and other chemicals.",
"dct:description":"Series SG_HAZ_CMRMNTRL (Indicator 12.4.1): Compliance with the Montreal Protocol on hazardous waste and other chemicals.",
"dct:identifier": "SG_HAZ_CMRMNTRL",
"keyword":[
"INDUSTRY > WASTE DISPOSAL > HAZARDOUS WASTE",
"POLITICAL AND LEGAL QUESTIONS > INTERNATIONAL LAW > INTERNATIONAL INSTRUMENTS"
],
"landingPage":"http://www.sdg.org/datasets/aa7580f18abc4ba39980a815a936040b_0",
"dct:publisher":{
"@type":"foaf:Organization",
"rdfs:label":"United Nations Statistics Division Statistical Services Branch"
},
"contactPoint":{
"@type":"vcard:Contact",
"vcard:tel":"+1 (212) 963 9851",
"vcard:email":"[email protected]"
},
"@reverse":{
"dataset":{
"@type":"Catalog",
"dct:title":"unstats-undesa.opendata.arcgis.com"
}
},
"distribution":[
{
"@type":"Distribution",
"dct:format":"CSV",
"downloadURL":"https://opendata.arcgis.com/datasets/aa7580f18abc4ba39980a815a936040b_0.csv"
},
{
"@type":"Distribution",
"dct:format":"GDB",
"downloadURL":"https://opendata.arcgis.com/datasets/aa7580f18abc4ba39980a815a936040b_0.gdb"
}
],
"dct:temporal":"2018-04-01/2018-06-30",
"dct:spatial":{
"@type":"dct:Location",
"locn:geometry": {
"@type":"geo:wktLiteral",
"@value": "POLYGON ((-180.0 -85.06, 180.0 -85.06, 180.0 85.06, -180.0 85.06, -180.0 -85.06))"
}
}
}
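Since the two descriptions carry the same information, one can be derived from the other. The sketch below maps the schema.org properties used in the first example onto DCAT and Dublin Core terms; the mapping (for instance `sameAs` to `landingPage`) is inferred from the two examples and is an assumption, not an official crosswalk, and `schemaorg_to_dcat` is a hypothetical helper.

```python
# Context assumed for the generated DCAT documents.
DCAT_CONTEXT = {
    "@vocab": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def schemaorg_to_dcat(ds):
    """Map a schema.org Dataset (as in the first example) onto DCAT terms."""
    out = {
        "@context": dict(DCAT_CONTEXT),
        "@id": ds["@id"],              # same IRI, so both descriptions stay linked
        "@type": "Dataset",
        "dct:title": ds["name"],
        "dct:description": ds["description"],
        "dct:identifier": ds["identifier"],
        "keyword": list(ds.get("keywords", [])),
        "distribution": [
            {"@type": "Distribution",
             "dct:format": d["encodingFormat"],
             "downloadURL": d["contentUrl"]}
            for d in ds.get("distribution", [])
        ],
    }
    if "sameAs" in ds:
        out["landingPage"] = ds["sameAs"]
    if "temporalCoverage" in ds:
        out["dct:temporal"] = ds["temporalCoverage"]
    return out
```

Publisher, contact point and spatial coverage are left out of this sketch because their structure differs more between the two vocabularies.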
Note: since the schema.org initiative currently has much stronger commercial backing and growing adoption momentum, we strongly recommend following the schema.org representation in the future.
Apart from describing the provenance- and subject-related metadata of datasets, covered above, W3C also recommends using additional vocabularies to describe the actual data structures and schemas of datasets published online, in order to facilitate their automated processing. One such recommendation, CSV on the Web (CSVW), supports descriptions of the columns of CSV files at different levels of granularity. As an example, the following JSON-LD file, which should accompany the publication of the CSV dataset https://opendata.arcgis.com/datasets/328c14863e3147c0a57ffcdc1e2e47a0_0.csv, provides a basic, machine-readable list of the titles of all the columns in the CSV file:
{
"@context": "http://www.w3.org/ns/csvw",
"url": "https://opendata.arcgis.com/datasets/328c14863e3147c0a57ffcdc1e2e47a0_0.csv",
"tableSchema": {
"columns": [
{
"titles": "series_release"
},
{
"titles": "series_code"
},
{
"titles": "series_description"
},
{
"titles": "geoAreaCode"
},
{
"titles": "X"
},
{
"titles": "Y"
},
{
"titles": "ISO3CD"
},
{
"titles": "geoAreaName"
},
{
"titles": "sliceId"
},
{
"titles": "Age"
},
{
"titles": "Units"
},
{
"titles": "Age_description"
},
{
"titles": "Units_description"
},
{
"titles": "F2004"
},
{
"titles": "F2005"
},
{
"titles": "F2006"
},
{
"titles": "F2007"
},
{
"titles": "F2008"
},
{
"titles": "F2009"
},
{
"titles": "F2010"
},
{
"titles": "F2011"
},
{
"titles": "F2012"
},
{
"titles": "F2013"
},
{
"titles": "F2014"
},
{
"titles": "F2015"
},
{
"titles": "F2016"
},
{
"titles": "last_5_years_mean"
},
{
"titles": "latest_year"
},
{
"titles": "latest_value"
},
{
"titles": "latest_source"
},
{
"titles": "latest_nature"
},
{
"titles": "FID"
}
]
}
}
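A column list like this can be produced directly from the header row of the CSV file. The sketch below does so with the standard `csv` module; `csvw_metadata` is a hypothetical helper, and obtaining the header line (for example, reading the first line of the downloaded file) is left to the caller.

```python
import csv
import io


def csvw_metadata(csv_url, header_line):
    """Build minimal CSVW metadata listing the column titles of one CSV file."""
    header_line = header_line.lstrip("\ufeff")   # drop a UTF-8 byte-order mark, if any
    titles = next(csv.reader(io.StringIO(header_line)))
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {"columns": [{"titles": t} for t in titles]},
    }
```

If such metadata is published next to the CSV file, naming it after the CSV URL with a `-metadata.json` suffix matches one of the default locations that CSVW processors are expected to check.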
The dataset metadata descriptions, expressed as JSON-LD objects in terms of the schema.org or DCAT vocabularies, can be embedded in the HTML pages about the corresponding SDG series so that they can be crawled and indexed by the Google Dataset Search service.
Such HTML pages are currently published at http://klarman.me/un-sdgs-pages/. The script for generating them from the source JSON file, along with the pages themselves, is available on the gh-pages branch of this repository (*). If you inspect the source HTML code of any of those pages (e.g., in Chrome, right-click and choose View page source), you will find the section containing the corresponding JSON-LD object within the <script type="application/ld+json"> ... </script> tags. For instance, for http://klarman.me/un-sdgs-pages/series_html/SI_COV_VULN.html you will find the following section in the HTML code:
<script type="application/ld+json">
{
"@id": "http://metadata.un.org/sdg/SI_COV_VULN",
"@type": "Dataset",
"name": "Proportion of vulnerable population receiving social assistance cash benefit (%)",
"description": "(Indicator: 1.3.1; Series: SI_COV_VULN) Proportion of vulnerable population receiving social assistance cash benefit (%)",
"identifier": "SI_COV_VULN",
"url": "http://www.sdg.org/datasets/indicator-1-3-1-proportion-of-vulnerable-population-receiving-social-assistance-cash-benefit-percent",
"sameAs": [
"http://www.sdg.org/datasets/5fc1939dcfa54d678a2c7bb654f558df",
"https://opendata.arcgis.com/datasets/5fc1939dcfa54d678a2c7bb654f558df"
],
"keywords": [
"standard of living",
"basic needs",
"STANDARD OF LIVING",
"SOCIAL WELFARE",
"POVERTY",
"BASIC NEEDS",
"poverty"
],
"creator": {
"@id": "http://metadata.un.org/sdg/unsd",
"@type": "Organization",
"url": "https://unstats.un.org",
"name": "United Nations Statistics Division Statistical Services Branch",
"contactPoint": {
"@type": "ContactPoint",
"contactType": "customer service",
"telephone": "+1 (212) 963 9851",
"email": "[email protected]"
}
},
"includedInDataCatalog": {
"@type": "DataCatalog",
"name": "Open SDG Data Hub",
"url": "http://www.sdg.org/#catalog"
},
"distribution": [
{
"@type": "DataDownload",
"encodingFormat": "KML",
"contentUrl": "https://opendata.arcgis.com/datasets/5fc1939dcfa54d678a2c7bb654f558df_0.kml"
},
{
"@type": "DataDownload",
"encodingFormat": "CSV",
"contentUrl": "https://opendata.arcgis.com/datasets/5fc1939dcfa54d678a2c7bb654f558df_0.csv"
},
{
"@type": "DataDownload",
"encodingFormat": "GDB",
"contentUrl": "https://opendata.arcgis.com/datasets/5fc1939dcfa54d678a2c7bb654f558df_0.gdb"
}
],
"@context": "http://schema.org/"
}
</script>
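The embedding step, and the reverse step a crawler performs, can be sketched with the standard library alone. The helper names below are hypothetical and this is not the generator script from the gh-pages branch; the `</` escaping is a common precaution so that a string value can never close the `<script>` element early.

```python
import json
from html.parser import HTMLParser


def jsonld_script_tag(obj):
    """Serialize obj as a <script type="application/ld+json"> element."""
    # "<\/" is a valid JSON escape for "</", so the data is unchanged when parsed.
    payload = json.dumps(obj, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


class JsonLdExtractor(HTMLParser):
    """Collect the objects found in <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.objects = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)     # script text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.objects.append(json.loads("".join(self._buf)))
            self._in_jsonld = False
```

Feeding a page's HTML to `JsonLdExtractor().feed(...)` and reading `.objects` gives roughly what View page source shows, already parsed into Python dictionaries.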
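Furthermore, you can verify that the structured data is accessible to Google's indexing mechanism by using Google's Structured Data Testing Tool (since retired by Google; the Rich Results Test and the Schema Markup Validator now serve this purpose).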
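Before using an online validator, a quick local pre-check can catch obvious omissions. The sketch below assumes, based on Google's Dataset structured-data guidelines as we understand them, that `name` and `description` are required and that the description should be 50 to 5,000 characters long; these rules should be verified against Google's current documentation, and `dataset_markup_problems` is a hypothetical helper that does not replace the official tools.

```python
def dataset_markup_problems(obj):
    """Return a list of problems found in a Dataset JSON-LD object (empty if none).

    Assumed rules, to be checked against Google's current guidelines:
    @type is "Dataset"; "name" and "description" are required; the
    description is between 50 and 5000 characters long.
    """
    problems = []
    if obj.get("@type") != "Dataset":
        problems.append('@type is not "Dataset"')
    for prop in ("name", "description"):
        if not isinstance(obj.get(prop), str) or not obj[prop].strip():
            problems.append(f"missing or empty {prop}")
    desc = obj.get("description")
    if isinstance(desc, str) and not 50 <= len(desc) <= 5000:
        problems.append(f"description length {len(desc)} outside 50..5000")
    return problems
```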
Note that the pages can then be published automatically via GitHub Pages by selecting the gh-pages branch as the GitHub Pages source in the settings of this repository (see GitHub help).