Open-access database of englacial temperature measurements compiled from data submissions and published literature. It is developed on GitHub and published to Zenodo. Version 1.0.0 was described in the following publication:
Mylène Jacquemart, Ethan Welty, Marcus Gastaldello, and Guillem Carcanade (2025). glenglat: a database of global englacial temperatures. Earth System Science Data 17(4): 1627–1666. https://doi.org/10.5194/essd-17-1627-2025
The dataset adheres to the Frictionless Data Package standard. The metadata in `datapackage.yaml` describes, in detail, the contents of the tabular data files in the `data` folder:
- `source.csv`: Description of each data source (either a personal communication or the reference to a published study).
- `borehole.csv`: Description of each borehole (location, elevation, etc.), linked to `source.csv` via `source_id` and less formally via source identifiers in `notes`.
- `profile.csv`: Description of each profile (date, etc.), linked to `borehole.csv` via `borehole_id`, and to `source.csv` via `source_id` and less formally via source identifiers in `notes`.
- `measurement.csv`: Description of each measurement (depth and temperature), linked to `profile.csv` via `borehole_id` and `profile_id`.
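As an orientation to how the tables link together, here is a sketch of joining them with pandas. The in-memory stand-in tables below (and their exact column names) are illustrative assumptions; `datapackage.yaml` is the authoritative schema.

```python
import pandas as pd

# Minimal in-memory stand-ins for the linked tables (illustrative only).
borehole = pd.DataFrame(
    {"id": [1], "source_id": [1], "glacier_name": ["Example Glacier"]}
)
profile = pd.DataFrame({"id": [1, 2], "borehole_id": [1, 1], "source_id": [1, 1]})
measurement = pd.DataFrame(
    {
        "borehole_id": [1, 1, 1],
        "profile_id": [1, 1, 2],
        "depth": [0.0, 10.0, 10.0],
        "temperature": [-0.5, -2.1, -1.9],
    }
)

# Follow the links described above: measurement -> profile -> borehole.
df = (
    measurement
    .merge(
        profile.rename(columns={"id": "profile_id"}),
        on=["borehole_id", "profile_id"],
    )
    .merge(
        borehole.rename(columns={"id": "borehole_id"}),
        on="borehole_id",
        suffixes=("_profile", "_borehole"),
    )
)
```

Each measurement row now carries the metadata of its profile and borehole.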
For boreholes with many profiles (e.g. from automated loggers), pairs of `profile.csv` and `measurement.csv` files are stored separately in subfolders of `data` named `{source.id}-{glacier}`, where `glacier` is a simplified, kebab-cased version of the glacier name (e.g. `flowers2022-little-kluane`).
The folder `sources`, available on GitHub but omitted from dataset releases on Zenodo, contains subfolders (with names matching column `source.id`) whose files document how and from where the data was extracted.
- Files with a `.png`, `.jpg`, or `.pdf` extension are figures, tables, maps, or text from the publication.
- Pairs of files with `.pgw` and `.{png|jpg}.aux.xml` extensions georeference a `.{png|jpg}` image, and files with a `.geojson` extension are the subsequently-extracted spatial coordinates.
- Files with an `.xml` extension document how numeric values were extracted from maps and figures using Plot Digitizer. Of these, digitized temperature profiles are named `{borehole.id}_{profile.id}{suffix}`, where `borehole.id` and `profile.id` are either a single value or a hyphenated range (e.g. `1-8`). Those without the optional suffix use temperature and depth as axis names; those with a suffix are unusual cases which, for example, may be part of a series (e.g. `_lower`) or use a non-standard axis (e.g. `_date`). Finally, digitized temperature profiles named `{borehole.id}_{depth}m` are continuous timeseries of temperature at discrete depths and use (decimal) year and temperature as axis names.
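As a sketch, the `{borehole.id}_{profile.id}{suffix}` filenames described above can be parsed with a regular expression. The naming pattern comes from the description above; the assumption that suffixes are lowercase words (e.g. `_lower`, `_date`) is mine.

```python
import re
from typing import Optional

# Matches e.g. "12_3.xml", "12_1-8.xml", "12_3_lower.xml": borehole id and
# profile id are a single value or a hyphenated range; the suffix is optional.
PROFILE_NAME = re.compile(
    r"^(?P<borehole_id>\d+(?:-\d+)?)"
    r"_(?P<profile_id>\d+(?:-\d+)?)"
    r"(?P<suffix>_[a-z]+)?\.xml$"
)

def parse_profile_filename(name: str) -> Optional[dict]:
    """Return the named parts of a digitized-profile filename, or None."""
    match = PROFILE_NAME.match(name)
    return match.groupdict() if match else None
```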
The repository's license does not extend to figures, tables, maps, or text extracted from publications. These are included in the sources folder for transparency and reproducibility.
We welcome submissions of new data as well as corrections and improvements to existing data.
To submit data, send an email to [email protected] or open a GitHub issue. Please structure your data as either comma-separated values (CSV) files (`borehole.csv` and `measurement.csv`) or as an Excel file (with sheets `borehole` and `measurement`). The required and optional columns for each table are described below and in the submission metadata: `submission/datapackage.yaml`. Consider using our handy Excel template: `submission/template.xlsx`! For corrections and improvements to existing data, you can also describe the changes that need to be made, or make the changes directly via a GitHub pull request.
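For illustration, a minimal submission might look like the following sketch. The values are invented; see the column tables below for the full definitions and constraints.

`borehole.csv`:

```csv
id,glacier_name,latitude,longitude,elevation,date_min,date_max
1,Example Glacier,46.85,9.80,3100,2019-07-01,2019-07-31
```

`measurement.csv`:

```csv
borehole_id,depth,temperature,date_min,date_max
1,10,-2.1,2019-07-15,2019-07-15
1,20,-1.4,2019-07-15,2019-07-15
```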
By submitting data to glenglat, you agree to be listed as a contributor in the metadata and have your original submission preserved in the sources folder. You will also be invited to become a co-author on the next dataset release (with the option to opt-out or suggest someone in your stead) and asked to confirm your name, affiliation, funding sources, and that your data was correctly integrated into glenglat. Unless you opt-out, you will remain an author on all subsequent releases in which your data still appears.
| name | description | type | constraints |
|---|---|---|---|
| `id` | Unique identifier. | integer | required: True<br>unique: True<br>minimum: 1 |
| `glacier_name` | Glacier or ice cap name (as reported). | string | required: True<br>pattern: `[^\s]+( [^\s]+)*` |
| `glims_id` | Global Land Ice Measurements from Space (GLIMS) glacier identifier. | string | pattern: `G[0-9]{6}E[0-9]{5}[NS]` |
| `latitude` | Latitude (EPSG 4326). | number [degree] | required: True<br>minimum: -90<br>maximum: 90 |
| `longitude` | Longitude (EPSG 4326). | number [degree] | required: True<br>minimum: -180<br>maximum: 180 |
| `elevation` | Elevation above sea level. | number [m] | required: True<br>maximum: 9999.0 |
| `mass_balance_area` | Mass balance area.<br>- ablation: Ablation area<br>- equilibrium: Near the equilibrium line<br>- accumulation: Accumulation area | string | enum: ['ablation', 'equilibrium', 'accumulation'] |
| `label` | Borehole name (e.g. as labeled on a plot). | string | |
| `date_min` | Begin date of drilling, or if not known precisely, the first possible date (e.g. 2019 → 2019-01-01). | date | format: %Y-%m-%d |
| `date_max` | End date of drilling, or if not known precisely, the last possible date (e.g. 2019 → 2019-12-31). | date | format: %Y-%m-%d |
| `drill_method` | Drilling method.<br>- mechanical: Push, percussion, rotary<br>- thermal: Hot point, electrothermal, steam<br>- combined: Mechanical and thermal | string | enum: ['mechanical', 'thermal', 'combined'] |
| `ice_depth` | Starting depth of continuous ice. Infinity (INF) indicates that only snow, firn, or intermittent ice was reached. | number [m] | |
| `depth` | Total borehole depth (not including drilling into the underlying bed). | number [m] | |
| `to_bed` | Whether the borehole reached the glacier bed. | boolean | |
| `temperature_uncertainty` | Estimated temperature uncertainty (as reported). | number [°C] | |
| `notes` | Additional remarks about the study site, the borehole, or the measurements therein. Literature references should be formatted as `{url}` or `{author} {year} ({url})`. | string | pattern: `[^\s]+( [^\s]+)*` |
| `investigators` | Names of people and/or agencies who performed the work, as a pipe-delimited list. Each entry is in the format `person (agency; ...) {notes}`, where only person or one agency is required. Person and agency may contain a latinized form in square brackets. | string | pattern: `[^\s]+( [^\s]+)*` |
| `funding` | Funding sources as a pipe-delimited list. Each entry is in the format `funder [rorid] > award [number] url`, where only funder is required and rorid is the funder's ROR (https://ror.org) ID (e.g. 01jtrvx49). | string | pattern: `[^\s]+( [^\s]+)*` |
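As a sketch, the pipe-delimited `investigators` format can be split into its parts. Only the `person (agency; ...) {notes}` shape comes from the format description above; the regular expression itself is my loose interpretation, not the canonical validator.

```python
import re
from typing import List

# One entry: optional person, optional "(agency; ...)", optional "{notes}".
ENTRY = re.compile(
    r"^(?P<person>[^(\{|]+?)?\s*"
    r"(?:\((?P<agencies>[^)]*)\))?\s*"
    r"(?:\{(?P<notes>[^}]*)\})?$"
)

def parse_investigators(value: str) -> List[dict]:
    """Split a pipe-delimited investigators string into dicts per entry."""
    entries = []
    for part in value.split("|"):
        match = ENTRY.match(part.strip())
        if match:
            entry = match.groupdict()
            if entry["agencies"]:
                entry["agencies"] = [a.strip() for a in entry["agencies"].split(";")]
            entries.append(entry)
    return entries
```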
| name | description | type | constraints |
|---|---|---|---|
| `borehole_id` | Borehole identifier. | integer | required: True |
| `depth` | Depth below the glacier surface. | number [m] | required: True |
| `temperature` | Temperature. | number [°C] | required: True |
| `date_min` | Measurement date, or if not known precisely, the first possible date (e.g. 2019 → 2019-01-01). | date | format: %Y-%m-%d |
| `date_max` | Measurement date, or if not known precisely, the last possible date (e.g. 2019 → 2019-12-31). | date | format: %Y-%m-%d<br>required: True |
| `time` | Measurement time. | time | format: %H:%M:%S |
| `utc_offset` | Time offset relative to Coordinated Universal Time (UTC). | number [h] | |
| `equilibrium` | Whether and how reported temperatures equilibrated following drilling.<br>- true: Equilibrium was measured<br>- estimated: Equilibrium was estimated (typically by extrapolation)<br>- false: Equilibrium was not reached | string | enum: ['true', 'estimated', 'false'] |
You can validate your CSV files (`borehole.csv` and `measurement.csv`) before submitting them using the `frictionless` Python package.

1. Clone this repository.

   ```sh
   git clone https://github.com/mjacqu/glenglat.git
   cd glenglat
   ```

2. Either install the `glenglat-submission` Python environment (with `conda`):

   ```sh
   conda env create --file submission/environment.yaml
   conda activate glenglat-submission
   ```

   or install `frictionless` into an existing environment (with `pip`):

   ```sh
   pip install "frictionless~=5.13"
   ```

3. Validate, fix any reported issues, and rejoice! (`path/to/csvs` is the folder containing your CSV files.)

   ```sh
   python submission/validate.py path/to/csvs
   ```
Clone this repository:

```sh
git clone https://github.com/mjacqu/glenglat
cd glenglat
```

Install the glenglat Python environment with `conda` (or the faster `mamba`):

```sh
conda env create --file environment.yaml
conda activate glenglat
```

or update it if it already exists:

```sh
conda env update --file environment.yaml
conda activate glenglat
```

Copy `.env.example` to `.env` and set the (optional) environment variables:

```sh
cp .env.example .env
```

- `GLIMS_PATH`: Path to a GeoParquet file of glacier outlines from the GLIMS dataset, with columns `geometry` (glacier outline) and `glac_id` (glacier id).
- `ZENODO_SANDBOX_ACCESS_TOKEN`: Access token for the Zenodo Sandbox (for testing). Register an account (if needed), then navigate to Account > Settings > Applications > Personal access tokens > New token, and select scopes `deposit:actions` and `deposit:write`.
- `ZENODO_ACCESS_TOKEN`: Access token for Zenodo. Follow the same steps as above, but on the real Zenodo.
Run all the tests in the `tests` folder:

```sh
pytest
```

Run only fast tests with the `--fast` option, and only slow tests with the `--slow` option:

```sh
pytest --fast
pytest --slow
```

An optional (slow) test checks that `borehole.glims_id` is consistent with borehole coordinates. To run it, install `geopandas` and `pyarrow`, and set the `GLIMS_PATH` environment variable before calling `pytest`:

```sh
conda install -c conda-forge geopandas=0.13 pyarrow
pytest
```

The `glenglat.py` module contains functions used to maintain the repository. They can be run from the command line as `python glenglat.py {function}`.
To update all generated submission instructions:
```sh
python glenglat.py write_submission
```

This executes several functions:

- `write_submission_yaml`: Builds `submission/datapackage.yaml` from `datapackage.yaml`.
- `write_submission_md`: Updates tables in this `README.md` from `submission/datapackage.yaml`.
- `write_submission_xlsx`: Builds `submission/template.xlsx` from `submission/datapackage.yaml`.
To select and write a subset of the data (e.g. to send to a contributor for review), use `select_and_write_subset`. The selection can be made by curator name (`--curator`) or source id (`--source`), optionally including secondary sources mentioned in `notes` columns (`--secondary_sources`), and the output can include source directories (`--source_files`).
```sh
python glenglat.py select_and_write_subset subsets/vantricht --curator='Lander Van Tricht' --secondary_sources --source_files
```

The `zenodo.py` module contains functions used to prepare and publish the data to Zenodo. They can be run from the command line as `python zenodo.py {function}`.
To publish (as a draft) to the Zenodo Sandbox, set the `ZENODO_SANDBOX_ACCESS_TOKEN` environment variable and run:
```sh
python zenodo.py publish_to_zenodo
```

To publish (as a draft) to Zenodo, set the `ZENODO_ACCESS_TOKEN` environment variable, run the same command with `--sandbox False`, and follow the instructions. It first checks that the repository is on the main branch, has no uncommitted changes, that all tests pass, and that no commit has already been tagged with the current datapackage version (function `is_repo_publishable`).

```sh
python zenodo.py publish_to_zenodo --sandbox False
```

The publish process executes several functions:

- `build_metadata_as_json`: Builds a final `build/datapackage.json` from `datapackage.yaml`, with filled placeholders for `id` (doi), `created` (timestamp), and `temporalCoverage` (measurement date range).
- `build_zenodo_readme`: Builds `build/README.md` from `datapackage.yaml`.
- `build_for_zenodo`: Builds a glenglat release as `build/glenglat-v{version}.zip` from the new `build/datapackage.json` and `build/README.md` (see above) and the unchanged `LICENSE.md` and `data/`. The zip archive is extracted to `build/glenglat-v{version}` for review.
- `render_zenodo_metadata`: Prepares a metadata dictionary for upload to Zenodo.