Standalone workflow to create national-scale open-data packages from global open datasets.
Get the latest code by cloning this repository:
```bash
git clone git@github.com:nismod/irv-datapkg.git
```

or

```bash
git clone https://github.com/nismod/irv-datapkg.git
```

Install Python and packages - suggest using micromamba:

```bash
micromamba create -f environment.yml
```

Activate the environment:

```bash
micromamba activate datapkg
```

The data packages are produced using a snakemake workflow.
The workflow expects ZENODO_TOKEN, CDSAPI_KEY and CDSAPI_URL to be set as
environment variables - these must be set before running any workflow steps.
If not interacting with Zenodo or the Copernicus Climate Data Store, these can be dummy strings:
echo "placeholder" > ZENODO_TOKEN
echo "https://cds-beta.climate.copernicus.eu/api" > CDSAPI_URL
echo "test" > CDSAPI_KEYSee Climate Data Store API docs and Zenodo API docs for access details.
Export the values from these files to the environment:
```bash
export ZENODO_TOKEN=$(cat ZENODO_TOKEN)
export CDSAPI_KEY=$(cat CDSAPI_KEY)
export CDSAPI_URL=$(cat CDSAPI_URL)
```

Check what will be run, if we ask for everything produced by the rule `all`, before running the workflow for real:

```bash
snakemake --dry-run all
```

Run the workflow, asking for `all`, using 8 cores, with verbose log messages:
```bash
snakemake --cores 8 --verbose all
```

To publish, first create a Zenodo token, save it and export it as the ZENODO_TOKEN environment variable.
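For example, following the same file-and-export pattern used earlier (the token value here is just a placeholder for your own token):

```bash
echo "your-real-zenodo-token" > ZENODO_TOKEN
export ZENODO_TOKEN=$(cat ZENODO_TOKEN)
```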
Upload a single data package:
```bash
snakemake --cores 1 zenodo/GBR.deposited
```

Publish (cannot be undone) either programmatically:

```bash
snakemake --cores 1 zenodo/GBR.published
```

Or after review online, through the Zenodo website (sandbox, live).
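The target names above suggest packages are addressed by ISO country code. Assuming that pattern holds (KEN below is purely illustrative), another country's package would be deposited and published in the same way:

```bash
snakemake --cores 1 zenodo/KEN.deposited
snakemake --cores 1 zenodo/KEN.published  # publishing cannot be undone
```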
To get a quick list of DOIs from the Zenodo package JSON:
```bash
cat zenodo/*.deposition.json | jq '.metadata.prereserve_doi.doi'
```
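To list each package title next to its DOI as well (using jq's `@tsv` formatter; the title is part of the standard Zenodo deposition metadata):

```bash
cat zenodo/*.deposition.json | jq -r '[.metadata.title, .metadata.prereserve_doi.doi] | @tsv'
```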
To generate records.csv with details of published packages:

```bash
python scripts/published_metadata.py
```
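For reference, published records can also be listed directly from the Zenodo REST API. This is a minimal sketch, not necessarily how scripts/published_metadata.py works; it assumes depositions were made to the live zenodo.org host (use sandbox.zenodo.org for the sandbox):

```python
import os

import requests

# List this account's published depositions via the Zenodo REST API,
# authenticating with the same ZENODO_TOKEN exported earlier.
response = requests.get(
    "https://zenodo.org/api/deposit/depositions",
    params={"status": "published", "access_token": os.environ["ZENODO_TOKEN"]},
    timeout=30,
)
response.raise_for_status()

for deposition in response.json():
    # Print the package title and its registered DOI
    print(deposition["metadata"]["title"], deposition["doi"])
```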
In case of warnings about GDAL_DATA not being set, try running:

```bash
export GDAL_DATA=$(gdal-config --datadir)
```

To format the workflow definition Snakefile:

```bash
snakefmt Snakefile
```

To format the Python helper scripts:

```bash
black scripts
```

These Python libraries may be a useful place to start analysis of the data in the packages produced by this workflow:
- snkit helps clean network data (see the example below)
- nismod-snail is designed to help implement infrastructure exposure, damage and risk calculations
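As a minimal, illustrative sketch of that starting point - the file path and layer name here are assumptions for illustration, not guaranteed to match any particular package - a package's network edges could be cleaned into a connected topology with snkit:

```python
import geopandas as gpd
import snkit

# Hypothetical example path and layer - adjust to the actual package contents
edges = gpd.read_file("GBR/openstreetmap_roads.gpkg", layer="edges")

network = snkit.Network(edges=edges)
network = snkit.network.add_endpoints(network)          # add nodes at line endpoints
network = snkit.network.split_edges_at_nodes(network)   # split edges where nodes touch them
network = snkit.network.add_ids(network)                # give nodes and edges unique ids
network = snkit.network.add_topology(network)           # add from/to node references to edges

print(network.nodes.head())
print(network.edges.head())
```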
The open-gira repository contains a larger
workflow for global-scale open-data infrastructure risk and resilience analysis.
MIT License, Copyright (c) 2023 Tom Russell and irv-datapkg contributors
This research received funding from the FCDO Climate Compatible Growth Programme. The views expressed here do not necessarily reflect the UK government's official policies.