LinkedSDGs demo application
LinkedSDGs is a demo web application that uses linked data techniques and resources to connect and discover diverse types of information resources related to the Sustainable Development Goals (SDGs).
The end-to-end information extraction and discovery pipeline implemented in the LinkedSDGs app comprises the following steps:
1. Extracting textual content from text documents (PDF, DOC, DOCX, HTML) that are uploaded from the local machine or referenced via a URL.
2. Extracting SDG-related concepts, defined in the UNBIS & EuroVoc SKOS taxonomies, from the text obtained in step 1.
3. Extracting geographic areas from the text obtained in step 1.
4. Querying the triple store that holds the SDG Knowledge Organization System to find SDG entities (goals, targets, indicators, series) related, directly or indirectly, to the extracted concepts.
5. Exposing the data about those entities via a JSON-LD resource description API (see Query endpoints).
6. Exposing snippets of statistical data corresponding to the selected data series and the extracted geographic areas.
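Step 2 can be pictured as matching taxonomy labels and their synonyms against the extracted text. The sketch below is a simplified, dictionary-based stand-in for the automated concept extraction service, not its actual implementation: the mini-taxonomy, its placeholder `http://example.org/` URIs, and the function `extract_concepts` are hypothetical, and the "relative importance weight" is assumed to be each concept's share of all label matches.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy: concept URI -> preferred label and synonyms.
# The example.org URIs are placeholders, not real UNBIS/EuroVoc identifiers.
TAXONOMY = {
    "http://example.org/unbis/WETLANDS": ["wetlands", "mangroves", "marshes", "lagoons", "swamps"],
    "http://example.org/unbis/CORAL_REEFS": ["reefs"],
    "http://example.org/unbis/OCEANS": ["oceans", "seas"],
}


def extract_concepts(text: str, taxonomy: dict) -> dict:
    """Return {concept URI: relative weight} for concepts whose labels occur in text.

    The weight is the concept's share of all whole-word label matches
    (an assumed, simplified weighting scheme).
    """
    lowered = text.lower()
    counts = Counter()
    for uri, labels in taxonomy.items():
        for label in labels:
            # \b anchors ensure that only whole words/phrases are counted
            counts[uri] += len(re.findall(rf"\b{re.escape(label)}\b", lowered))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {uri: n / total for uri, n in counts.items() if n > 0}


snippet = "beaches estuaries dune systems mangroves marshes lagoons swamps reefs etc are"
print(extract_concepts(snippet, TAXONOMY))
```

With this toy taxonomy, the synonyms "mangroves", "marshes", "lagoons" and "swamps" map the snippet to the WETLANDS concept, which receives four of the five matches.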
The key mechanism underpinning LinkedSDGs is described in the section Content linking. Because both the SDG entities and the uploaded text documents are described using subject tags (SKOS concept URIs) from the same taxonomies (UNBIS & EuroVoc), it is possible to find and analyse the paths between such resources using standard graph queries. Based on the frequency of the extracted concepts and the number of paths linking those concepts to each SDG entity, we can further determine how relevant each entity is to the submitted document.
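The path-based relevance idea can be sketched in memory as follows. This is an illustration under stated assumptions rather than the app's actual scoring formula: the `BROADER` hierarchy and the `ENTITY_TAGS` subject tags are hypothetical, concepts are identified by labels instead of URIs for readability, and an entity's score is assumed to be the sum, over extracted concepts, of the concept weight times the number of broader-concept chains leading from that concept to one of the entity's tags. In the application itself, such paths would be found with graph queries against the triple store.

```python
# Hypothetical taxonomy fragment: concept -> its broader concepts.
# Labels stand in for SKOS concept URIs; the hierarchy is assumed to be acyclic.
BROADER = {
    "WETLANDS": ["SURFACE WATERS", "WATER RESOURCES"],
    "SURFACE WATERS": ["WATER"],
    "WATER RESOURCES": ["WATER"],
    "CORAL REEFS": ["MARINE ECOSYSTEMS"],
    "MARINE ECOSYSTEMS": ["ECOSYSTEMS"],
}

# Hypothetical subject tags attached to SDG entities.
ENTITY_TAGS = {
    "Goal 6": {"WATER"},
    "Goal 14": {"MARINE ECOSYSTEMS"},
    "Goal 15": {"ECOSYSTEMS"},
}


def count_paths(start: str, target: str, broader: dict) -> int:
    """Count distinct chains of broader links leading from start to target."""
    if start == target:
        return 1
    return sum(count_paths(parent, target, broader) for parent in broader.get(start, []))


def rank_entities(weights: dict, entity_tags: dict, broader: dict) -> list:
    """Score each entity as sum(weight(concept) * paths(concept -> tag)), best first."""
    scores = {
        entity: sum(
            weight * count_paths(concept, tag, broader)
            for concept, weight in weights.items()
            for tag in tags
        )
        for entity, tags in entity_tags.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


weights = {"WETLANDS": 0.8, "CORAL REEFS": 0.2}  # e.g. output of concept extraction
for entity, score in rank_entities(weights, ENTITY_TAGS, BROADER):
    print(f"{entity}: {score:.2f}")
```

With this hypothetical data, WETLANDS reaches WATER along two chains, so Goal 6 scores 0.8 × 2 = 1.6 and ranks first, while Goals 14 and 15 each score 0.2 through CORAL REEFS.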
For SDG entities, the subject tags are expected to be part of the existing, curated SDG Knowledge Organization System, whereas the tags on text documents are discovered by an automated concept extraction service. Using well-curated, standard SKOS taxonomies (here UNBIS & EuroVoc) as the source of URI-identified subject tags is invaluable in this application, because their concepts are connected in semantically coherent graph structures (sibling concepts/synonyms, broader concepts/parents) that can be traversed automatically while preserving the intended meaning. Moreover, the multilingual labels of the taxonomy concepts should make it easy to transfer concepts across languages.
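To illustrate why URI-identified concepts with multilingual labels ease transfer across languages, the sketch below indexes every preferred and alternative label by language, so that labels in different languages resolve to the same concept URI. The concept record, its labels, the `example.org` URI, and the helper functions are hypothetical and are not taken from UNBIS or EuroVoc.

```python
# Hypothetical SKOS-style concept record: URI -> {language tag: [prefLabel, *altLabels]}.
# Labels and URI are illustrative only, not taken from UNBIS or EuroVoc.
CONCEPTS = {
    "http://example.org/unbis/WETLANDS": {
        "en": ["wetlands", "marshes", "swamps"],
        "fr": ["zones humides", "marais"],
        "es": ["humedales", "pantanos"],
    },
}


def build_label_index(concepts: dict) -> dict:
    """Map (language, lower-cased label) -> concept URI."""
    index = {}
    for uri, labels_by_lang in concepts.items():
        for lang, labels in labels_by_lang.items():
            for label in labels:
                index[(lang, label.lower())] = uri
    return index


def resolve(label: str, lang: str, index: dict):
    """Return the concept URI for a label in the given language, or None."""
    return index.get((lang, label.lower()))


index = build_label_index(CONCEPTS)
# An English synonym and a French label resolve to the same language-independent URI.
print(resolve("Swamps", "en", index) == resolve("zones humides", "fr", index))  # prints True
```

Because downstream steps work only with the URI, a concept found in a French or Spanish document can follow the same graph paths as one found in an English document.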
Together, the linked data artifacts and data processing services described above enable the discovery of the following sample path:
- Document: Conserve and sustainably use the oceans, seas and marine resources for sustainable development
- Identified text passage containing a keyword: "[...] beaches estuaries dune systems mangroves marshes lagoons swamps reefs etc are [...]"
- The UNBIS concept extracted from the keyword via its synonym: WETLANDS
- The path from the extracted concept to the subject tag associated with the SDG entity: WETLANDS -> SURFACE WATERS -> WATER
- The most relevant goal associated with the subject tag WATER: 06 Ensure availability and sustainable management of water and sanitation for all
- The Wikidata resource associated with the identified SDG 6 and exposed via the JSON-LD describe API: http://www.wikidata.org/entity/Q48741129.
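The sample path can be reproduced with a breadth-first search that climbs from the extracted concept until it reaches a concept used as a subject tag on an SDG entity. This is a minimal sketch under stated assumptions: each hop is assumed to be a broader-concept link (the source does not specify the link types), and the `BROADER` and `TAGGED` mappings and the `trace` function are hypothetical, covering only the concepts in the sample.

```python
from collections import deque

# Hypothetical broader links covering only the concepts in the sample path.
BROADER = {"WETLANDS": ["SURFACE WATERS"], "SURFACE WATERS": ["WATER"]}
# Hypothetical mapping from SDG subject tags to the goals they annotate.
TAGGED = {"WATER": "06 Ensure availability and sustainable management of water and sanitation for all"}


def trace(concept: str, broader: dict, tagged: dict):
    """Breadth-first search up the hierarchy; return (path, goal) for the
    nearest concept that is an SDG subject tag, or None if none is reachable."""
    queue = deque([[concept]])
    seen = {concept}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in tagged:
            return path, tagged[node]
        for parent in broader.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


path, goal = trace("WETLANDS", BROADER, TAGGED)
print(" -> ".join(path))  # prints "WETLANDS -> SURFACE WATERS -> WATER"
print(goal)
```

Breadth-first order returns the shortest chain first, which is a natural choice when shorter paths are taken to indicate closer semantic relatedness.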
The results of the concept extraction from the submitted text documents (concept labels, UNBIS and EuroVoc URIs, and relative importance weights) can be downloaded directly from the app and reused in other contexts, for instance to power keyword search services on document archive portals such as the Voluntary National Reviews Database.
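Reusing the downloaded results for keyword search could look like the sketch below, which builds an inverted index from concept labels to documents ranked by weight. The export format (a JSON list of objects with `label`, `uri`, and `weight` fields), the document IDs, the weights, and the helper functions are all hypothetical; adapt the field names to the actual file.

```python
import json
from collections import defaultdict

# Hypothetical exports: document ID -> downloaded extraction results (JSON text).
# Field names and values are assumptions; adapt them to the actual file.
EXPORTS = {
    "doc-oceans": json.dumps([
        {"label": "WETLANDS", "uri": "http://example.org/unbis/WETLANDS", "weight": 0.8},
        {"label": "CORAL REEFS", "uri": "http://example.org/unbis/CORAL_REEFS", "weight": 0.2},
    ]),
    "doc-water": json.dumps([
        {"label": "WETLANDS", "uri": "http://example.org/unbis/WETLANDS", "weight": 0.3},
    ]),
}


def build_index(exports: dict) -> dict:
    """Inverted index: lower-cased concept label -> [(doc_id, weight)], highest weight first."""
    index = defaultdict(list)
    for doc_id, raw in exports.items():
        for item in json.loads(raw):
            index[item["label"].lower()].append((doc_id, item["weight"]))
    for postings in index.values():
        postings.sort(key=lambda posting: posting[1], reverse=True)
    return dict(index)


def search(keyword: str, index: dict) -> list:
    """Return IDs of documents tagged with the keyword's concept, best match first."""
    return [doc_id for doc_id, _ in index.get(keyword.lower(), [])]


index = build_index(EXPORTS)
print(search("Wetlands", index))  # prints ['doc-oceans', 'doc-water']
```

Keying the index on concept URIs instead of labels would let queries in any of the taxonomies' label languages reach the same postings.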
The following JSON file contains sample extraction results from the document: Conserve and sustainably use the oceans, seas and marine resources for sustainable development.