A set of tools to extract raw data from scientific instruments into standardised DataTree
in-memory objects, or into NetCDF
files on disk. The resulting data is annotated with metadata, provenance information, timestamps, units, and uncertainties. Currently developed at the ConCat lab at Technische Universität Berlin (Berlin, DE) and the Materials for Energy Conversion lab at Empa (Dübendorf, CH).
- Extraction of chromatography data from gas and liquid chromatograms. Supports several Agilent, EZChrom, Masshunter, and Fusion formats.
- Extraction of electrochemical data from electrochemistry and battery cycling experiments. Supports BioLogic file formats.
- Extraction of reflection coefficient traces from network analysers. Supports the Touchstone file format.
- Extraction of spectroscopy files including common XPS, XRD and MS formats.
- Extraction of tabulated data using CSV parsing functionality, including Bronkhorst and DryCal output formats.
Additionally, data from multiple files of the same type, or even of different types, can be easily and reproducibly combined into a single DataTree
by using process and preset modes of yadg.
- timezone-aware timestamp processing using Unix timestamps
- locale-aware processing of files
- automatic uncertainty determination using data contained in the raw files, instrument specification, or last significant digit
- tagging of all data with units
- annotation with processing metadata (such as provenance or extraction date) under the
yadg_⋅⋅⋅
namespace - original metadata from the extracted files is stored under
original_metadata
- extensive dataschema validation using provided specifications
The full list of capabilities and features is listed in the project documentation.
The released versions of yadg
are available on the Python Package Index (PyPI) under yadg. Those can be installed using:
pip install yadg
If you wish to install the current development version as an editable installation, check out the main
branch using git, and install yadg
as an editable package using pip:
git clone [email protected]:dgbowl/yadg.git
cd yadg
pip install -e .
Additional targets yadg[testing]
and yadg[docs]
are available and can be specified in the above commands, if testing and/or documentation capabilities are required.
After installing yadg
, you can extract data from single files of known filetypes using:
yadg extract <filetype> <infile> [outfile]
This will write the data extracted from the infile
into a NetCDF file called outfile
. An example usage for BioLogic MPR files would be:
yadg extract eclab.mpr example_file.mpr output_file.nc
Alternatively, you can obtain a DataTree
object in Python via:
import yadg
yadg.extractors.extract(filetype=<filetype>, path=<infile>)
More detailed usage instructions are available in the project documentation.
This project has received funding from the following sources:
- European Union’s Horizon 2020 programme under grant agreement No 957189.
- DFG's Emmy Noether Programme under grant number 490703766.
The project is also part of BATTERY 2030+, the large-scale European research initiative for inventing the sustainable batteries of the future.