The role of Kerchunk in PVGIS 6 #581
NikosAlexandris started this conversation in Show and tell
Dear all @fsspec/kerchunk,
For the past four years I’ve worked on the PVGIS[1] project at the Joint Research Centre, European Commission. PVGIS provides on-demand, instant solar-energy estimates. The secret of its speed lies not in the programming language, but in a handcrafted, chunked time-series data strategy pioneered almost two decades ago: files of 25 × 25 pixel chunks with values stored contiguously in time. Any increase in latency directly affects its users.
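The addressing idea behind such a layout can be sketched in a few lines. This is a hypothetical reconstruction, not the actual in-house file format: it only illustrates why "25 × 25 pixel chunks with values stored contiguously in time" makes a point query a single seek and a single contiguous read.

```python
# Sketch of chunk addressing for a grid tiled into 25 x 25 pixel chunks,
# where each pixel's full time series is stored contiguously.
# (Hypothetical layout and names; the real PVGIS format is in-house.)

CHUNK = 25          # chunk edge, in pixels
T = 8760            # e.g. hourly values for one year
ITEMSIZE = 4        # e.g. float32

def series_offset(row: int, col: int, grid_cols: int) -> int:
    """Byte offset of pixel (row, col)'s full time series."""
    chunk_row, chunk_col = row // CHUNK, col // CHUNK
    chunks_per_row = grid_cols // CHUNK
    chunk_index = chunk_row * chunks_per_row + chunk_col
    # position of the pixel inside its 25 x 25 chunk
    local = (row % CHUNK) * CHUNK + (col % CHUNK)
    pixels_per_chunk = CHUNK * CHUNK
    return (chunk_index * pixels_per_chunk + local) * T * ITEMSIZE

# One pixel's whole series is a single contiguous read of T * ITEMSIZE bytes.
print(series_offset(0, 0, grid_cols=100))   # first pixel, offset 0
```

Because consecutive time steps of a pixel are adjacent on disk, serving a point query never scans across the spatial dimensions.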
My vision was to replace the legacy C/C++ codebase with a future-proof engine that still delivers results in under a second. The result is PVGIS 6[2][3], an all-Python + NumPy prototype that demonstrates the Python ecosystem can support high-performance scientific computing. The chunk-oriented thinking embodied in Kerchunk, Xarray, Zarr & friends has both validated the existing scheme behind PVGIS and inspired the transition to Python. A forthcoming PVGIS service based on the new engine will read its time series from appropriately chunked Zarr[4] stores.
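What "appropriately chunked" buys can be shown with back-of-the-envelope arithmetic. The helper below is hypothetical and the numbers are illustrative (not the actual PVGIS grid): it only counts how many chunks a single-pixel, full-length time-series read must open under different time-chunk sizes, assuming 25 × 25 spatial chunking.

```python
# Why chunk shape matters for a point-query service: count the chunks
# that one pixel's full time-series read must touch.
# Illustrative numbers only; not the actual PVGIS grid or chunking.
import math

def chunks_touched(t_len: int, chunk_t: int, chunk_y: int = 25, chunk_x: int = 25) -> int:
    """Chunks read for one pixel's full time series of length t_len."""
    # the pixel lies in exactly one spatial chunk, so only the time
    # dimension determines how many chunks must be opened
    return math.ceil(t_len / chunk_t)

t_len = 20 * 8760                              # twenty years of hourly data
print(chunks_touched(t_len, chunk_t=t_len))    # time-contiguous: 1 chunk
print(chunks_touched(t_len, chunk_t=24))       # day-sized time chunks: 7300
```

One open-decompress-read cycle versus thousands is the difference between a sub-second answer and a slow one.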
What's the relation with `kerchunk`, however?

`kerchunk` & friends, including @martindurant, have been more than simple inspiration for PVGIS 6. The single most important question to answer before embarking on a complete rewrite of the old PVGIS 5.x codebase[5] was:

Is it possible?

Yes, it is. By appropriately chunking the input time series, of course!

Is it necessary?

It depends on who you ask, is my humble view. But let's say that replicating the speed of the current service (PVGIS 5.x) was mandatory.
So, what about Kerchunk?

PVGIS 6 does not need `kerchunk` to do this. Nonetheless, `kerchunk` was the entry gate to experimental work and a learning process that eventually led to PVGIS 6. In the end, it's all about chunking, isn't it?

Have you thought of using Kerchunk to...?
I spent a fairly large amount of time experimenting with Kerchunk. Aiming at a most economical data-storage solution, I was hoping to read, fast, large time series[7] that are originally split across multiple NetCDF files, by feeding a single Kerchunk-generated index, exposed as a Zarr store, to Xarray's `open_dataset` function. I thought this to be possible with the existing powers of Kerchunk. However, there was a catch: data in NetCDF files are usually compressed, hence by definition chunked[8]. Therefore, time needs to be spent decompressing before reading and loading values into memory. It is likely impossible to achieve split-second speed when reading large time series from multiple input files. PVGIS 6 was getting more challenging, more exciting.

I’m grateful to the Kerchunk authors and community! 🙏🏼
Nikos
ps - For the record, the first hint about `kerchunk` came to me from @pmav. Reading, learning, and asking many questions (mine, in this repository[3]) came next. So did writing https://github.com/NikosAlexandris/rekx (which I am not happy I could not merge into `kerchunk` itself, yet?). Of course, many posts in the Pangeo forum helped, more than a bit. The answer was out there already! The work to do was to piece together different software and the right data structure.

ps2 - I've also posted about PVGIS 6 on LinkedIn.
Footnotes

1. https://joint-research-centre.ec.europa.eu/photovoltaic-geographical-information-system-pvgis_en
2. https://code.europa.eu/pvgis/pvgis
3. https://github.com/fsspec/kerchunk/issues?q=involves%3Anikosalexandris
4. Zarr gets smarter and more efficient: https://discourse.pangeo.io/t/new-cloud-tensor-i-o-benchmarks-zarr-is-fast-now/5459
5. in-house "proprietary" code in C/C++[6]
6. the larger part of it was published as a GRASS GIS module called r.sun
7. say, yearly data of hourly time series
8. a chunk is then the atomic unit of compressed data, is my understanding