Choice of Zarr compressor #1661

Open
VeckoTheGecko opened this issue Aug 20, 2024 · 4 comments

Comments

@VeckoTheGecko
Contributor

VeckoTheGecko commented Aug 20, 2024

The writing of the Zarr file in particlefile.py does not appear to set a compressor. Choosing a compressor can significantly help in trading off compute for storage, or vice versa [1]. The default chosen by zarr is the Blosc compressor, which is a "meta-compressor" (it uses different algorithms under the hood). Perhaps it's worth investigating other compressors to see whether one is better suited to our simulation output.

@erikvansebille do you know if zarr compressors have been a topic of discussion before?

@VeckoTheGecko VeckoTheGecko converted this from a draft issue Aug 20, 2024
@JamiePringle
Collaborator

All the Zarr files I have made from parcels have been compressed. You need not set a compressor; the default is to compress. For efficiency I have found the correct setting of chunk size to be more important.
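To illustrate the chunk-size point, here is a minimal sketch using only the Python standard library. Zarr compresses every chunk independently, so very small chunks give the compressor little context and add per-chunk overhead. The data is a synthetic float32 series, not real Parcels output, `zlib` is only a stand-in for Zarr's default Blosc compressor, and `make_track` and `compressed_size` are hypothetical helper names.

```python
import math
import struct
import zlib


def make_track(n_obs: int) -> bytes:
    """Build a synthetic float32 'longitude' series (assumption: not real particle data)."""
    values = [10.0 + 0.5 * math.sin(i / 50.0) for i in range(n_obs)]
    return struct.pack(f"<{n_obs}f", *values)


def compressed_size(raw: bytes, chunk_bytes: int) -> int:
    """Total size when each chunk is compressed on its own, as a chunked store would do."""
    total = 0
    for start in range(0, len(raw), chunk_bytes):
        total += len(zlib.compress(raw[start:start + chunk_bytes], 6))
    return total


raw = make_track(16_384)  # 16,384 float32 values = 64 KiB
sizes = {chunk: compressed_size(raw, chunk) for chunk in (256, 4_096, len(raw))}
for chunk, size in sizes.items():
    print(f"chunk={chunk:>6} B  compressed={size:>6} B  ratio={len(raw) / size:.2f}")
```

This only measures storage; the read and write speed that chunk size also affects would need to be timed against an actual store.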

@VeckoTheGecko
Contributor Author

VeckoTheGecko commented Aug 20, 2024

I mean the type of compression that is used (various compressors are available in zarr-developers/numcodecs). I agree about chunk size, but it would be good to also investigate the compression algorithm, as I think we just rely on the default, which may not be the best fit for our data.
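A rough sketch of what such an investigation could look like, kept to the Python standard library so it runs anywhere. It compares `zlib`, `bz2`, and `lzma` (numcodecs ships codecs wrapping the same algorithms) on one synthetic chunk and reports compression ratio and compression time. The data is made up, Blosc itself is not included, and `synthetic_chunk` and `benchmark` are hypothetical names; a real comparison would use actual particle output and the numcodecs codecs directly.

```python
import bz2
import lzma
import math
import struct
import time
import zlib

# Each entry maps a codec name to a (compress, decompress) pair.
CODECS = {
    "zlib": (lambda b: zlib.compress(b, 6), zlib.decompress),
    "bz2": (lambda b: bz2.compress(b, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def synthetic_chunk(n_obs: int = 50_000) -> bytes:
    """Synthetic float32 series standing in for one chunk of trajectory data."""
    values = [10.0 + 0.5 * math.sin(i / 50.0) for i in range(n_obs)]
    return struct.pack(f"<{n_obs}f", *values)


def benchmark(raw: bytes) -> dict:
    """Compress `raw` with every codec, verify the round trip, and record ratio and time."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(raw)
        elapsed = time.perf_counter() - start
        if decompress(packed) != raw:
            raise ValueError(f"{name} did not round-trip the data")
        results[name] = {"ratio": len(raw) / len(packed), "seconds": elapsed}
    return results


results = benchmark(synthetic_chunk())
for name, stats in results.items():
    print(f"{name:>5}: ratio={stats['ratio']:.2f}  compress_time={stats['seconds'] * 1e3:.1f} ms")
```

The numbers depend heavily on the data, so results from this synthetic series should not be read as a recommendation for any particular compressor.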

@CKehl
Contributor

CKehl commented Sep 23, 2024

With all due respect, I get the impression you miss the point of the paper linked above. The high I/O time is attributed mainly to the loading of the fieldsets. Correct me if I am mistaken, but that doesn't have anything to do with the Zarr compressor of the particle file, does it? I hope you get the answers you seek through your benchmarking approach.

@VeckoTheGecko
Contributor Author

Thanks for clarifying! :)

I haven't yet had the time to go through the paper fully. I've updated the description here to match.
