Compute and manipulate histograms from Xarray data using Boost Histogram
This package computes histograms from Xarray data, taking advantage of Xarray's label and dimension management. It relies on the Boost Histogram library for the computations.
It is essentially a thin wrapper that uses Boost Histogram directly on loaded data, or dask-histogram on data contained in Dask arrays. It thus offers optimized performance, as well as lazy computation and easy scaling thanks to Dask.
Bins can be specified similarly to NumPy functions:
import xarray_histogram as xh
hist = xh.histogram(data, bins=[(100, 0., 10.)])
but also using Boost Histogram axes, to benefit from their features:
import boost_histogram.axis as bha
hist = xh.histogram(data, bins=[bha.Regular(100, 0., 10.)])
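To make a specification such as `(100, 0., 10.)` concrete, here is a minimal pure-Python sketch of regular binning, assuming the tuple means (number of bins, lower edge, upper edge). The helpers `regular_edges` and `count_regular` are hypothetical names used for illustration only; the package itself delegates the computation to Boost Histogram and does not work this way internally.

```python
def regular_edges(nbins, start, stop):
    """Return the nbins + 1 edges of evenly spaced bins between start and stop."""
    width = (stop - start) / nbins
    # Append `stop` explicitly so the last edge is exact.
    return [start + i * width for i in range(nbins)] + [stop]


def count_regular(values, nbins, start, stop):
    """Count values falling in half-open bins [edge_i, edge_i+1).

    Values outside [start, stop) are ignored in this sketch. Note that
    numpy.histogram also includes the upper edge in its last bin.
    """
    counts = [0] * nbins
    width = (stop - start) / nbins
    for v in values:
        if start <= v < stop:
            # Clamp the index to guard against floating-point rounding.
            idx = min(int((v - start) / width), nbins - 1)
            counts[idx] += 1
    return counts


edges = regular_edges(5, 0.0, 10.0)
counts = count_regular([0.5, 1.9, 2.0, 9.99, 10.0, -1.0], 5, 0.0, 10.0)
```

In this sketch the values `10.0` and `-1.0` are dropped; a Boost Histogram axis would instead place them in its overflow and underflow bins.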
Multi-dimensional histograms can be computed, here in 2D for instance:
hist = xh.histogram(
    temp, chlorophyll,
    bins=[bha.Regular(100, -5., 40.), bha.Regular(100, 1e-3, 10, transform=bha.transform.log)]
)
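The log transform used for the chlorophyll axis above makes the bins evenly spaced in log space rather than in data space. The following sketch illustrates the resulting edges with the standard library only; `log_edges` is a hypothetical helper, not part of this package or of Boost Histogram.

```python
import math


def log_edges(nbins, start, stop):
    """Return nbins + 1 edges evenly spaced in log space (start must be > 0)."""
    log_start, log_stop = math.log(start), math.log(stop)
    step = (log_stop - log_start) / nbins
    return [math.exp(log_start + i * step) for i in range(nbins + 1)]


# Four bins between 1e-3 and 10: each bin spans one decade.
edges = log_edges(4, 1e-3, 10.0)
```

Each bin covers the same ratio between its upper and lower edge, which suits quantities such as chlorophyll concentration that span several orders of magnitude.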
So far, histograms were computed over the whole flattened arrays, but they can also be computed along some dimensions only. For instance, we can retrieve the time evolution of a histogram:
hist = xh.histogram(temp, bins=[bha.Regular(100, 0., 10.)], dims=['lat', 'lon'])
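Conceptually, reducing along `lat` and `lon` only means that one histogram is computed per remaining coordinate, here per time step. The sketch below shows the idea on nested lists, assuming the data is ordered as (time, lat, lon); `histogram_per_step` is a hypothetical helper and says nothing about how the package lays out its output.

```python
def histogram_per_step(data, nbins, start, stop):
    """Compute one regular histogram per entry along the first axis."""
    width = (stop - start) / nbins
    result = []
    for field in data:  # one 2D (lat, lon) field per time step
        counts = [0] * nbins
        for row in field:
            for v in row:
                if start <= v < stop:
                    counts[min(int((v - start) / width), nbins - 1)] += 1
        result.append(counts)
    return result


# Two time steps, each with a 2x2 (lat, lon) field.
temp = [
    [[1.0, 2.5], [3.0, 9.0]],
    [[6.0, 7.5], [8.0, 9.5]],
]
per_time = histogram_per_step(temp, nbins=2, start=0.0, stop=10.0)
```

The result has one row of counts per time step, so the spatial dimensions are consumed while the time dimension is kept.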
Histograms can be normalized, and weights can be applied. All of this works seamlessly with data stored in NumPy or Dask arrays.
An Xarray accessor is available for some manipulations of histogram data. Simply import xarray_histogram.accessor; all arrays can then access methods through the hist property:
hist.hist.edges()
hist.hist.median()
hist.hist.ppf(q=0.75)
See the accessor API for more details.
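To illustrate what a percent point function (the inverse of the cumulative distribution) means for binned data, here is a minimal sketch that interpolates linearly inside the bin containing the requested quantile. It assumes values are spread uniformly within each bin; `histogram_ppf` is a hypothetical helper, and the accessor's actual implementation may differ.

```python
def histogram_ppf(counts, edges, q):
    """Approximate the q-quantile of a histogram (0 <= q <= 1)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot compute a quantile of an empty histogram")
    target = q * total
    cumulative = 0.0
    for i, c in enumerate(counts):
        if c > 0 and cumulative + c >= target:
            # Fraction of this bin needed to reach the target count.
            frac = (target - cumulative) / c
            return edges[i] + frac * (edges[i + 1] - edges[i])
        cumulative += c
    return edges[-1]


counts = [1, 3, 4, 2]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
median = histogram_ppf(counts, edges, 0.5)
upper_quartile = histogram_ppf(counts, edges, 0.75)
```

The median is the special case `q=0.5`. The accuracy of such estimates depends on the bin width, since the position of values inside a bin is lost once the data is binned.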
- Python >= 3.11
- numpy
- xarray
- boost-histogram
- dask and dask-histogram (optional)
- scipy (optional)
From source with:
git clone https://github.com/Descanonge/xarray-histogram
cd xarray-histogram
pip install -e .
Soon on PyPI.
Documentation available at https://xarray-histogram.readthedocs.io
To compare performance, see these notebooks for NumPy and Dask arrays.
xhistogram already exists. It relies on NumPy functions and thus does not benefit from some of the performance improvements brought by Boost (see performance comparisons). I also hoped to bring similar features with simpler code, by relying on existing dependencies. Some additional features of Boost Histogram (overflow bins, rebinning, extracting various statistics from the DataArray histogram) could be added (this is in the works).