Hi y'all!
I started a (VERY EARLY PROTOTYPE) library that implements serialization to Apache Avro.
I think it would be a good alternative to JSON, since it uses disk space more efficiently.
https://github.com/jspaezp/avrospeclib
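To show where Avro's savings over JSON can come from, here is a minimal sketch of the Avro binary encoding for one record. It follows the Apache Avro specification: `long` is a zig-zag varint, `double` is 8 bytes little-endian, and `string` is a length-prefixed UTF-8 blob. Field names are not written per record, because the schema is stored once in the file header. The record and its field names are hypothetical and not taken from avrospeclib. A real file would be written with an Avro library, not by hand.

```python
import json
import struct


def zigzag(n: int) -> int:
    # Map signed 64-bit ints to unsigned so small magnitudes stay small:
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    # Avro int/long: zig-zag, then a varint of 7-bit groups, low-order group
    # first, with the high bit set on every byte except the last.
    z = zigzag(n)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_double(x: float) -> bytes:
    # Avro double: 8-byte IEEE 754, little-endian.
    return struct.pack("<d", x)


def encode_string(s: str) -> bytes:
    # Avro string: byte length (as a long), then the UTF-8 bytes.
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Hypothetical fragment-ion record; field names are illustrative only.
record = {"annotation": "b3", "charge": 1, "mz": 342.1812}

# Avro writes the values in schema order with no field names.
avro_bytes = (
    encode_string(record["annotation"])
    + encode_long(record["charge"])
    + encode_double(record["mz"])
)
json_bytes = json.dumps(record).encode("utf-8")

print(len(avro_bytes), len(json_bytes))
```

The gap grows with the number of records, because JSON repeats every key and spells every number out as text.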
I am still implementing the schema in pydantic and deriving the Avro schema from it.
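Deriving an Avro schema from a typed model mostly comes down to walking the type hints. Below is a rough sketch of that idea. To keep it dependency-free, a plain `dataclass` stands in for the pydantic model, so this is not how avrospeclib or pydantic does it. The type mapping (`int` to `long`, `float` to `double`, `Optional[X]` to a union with `"null"`) is my assumption of a reasonable default. The `Fragment`/`Precursor` models are hypothetical.

```python
import dataclasses
import json
import typing
from typing import Optional, Union

# Assumed mapping from Python primitives to Avro primitive type names.
_PRIMITIVES = {int: "long", float: "double", str: "string", bool: "boolean", bytes: "bytes"}


def avro_type(tp):
    """Translate one type hint into an Avro type (name, dict, or union list)."""
    if tp in _PRIMITIVES:
        return _PRIMITIVES[tp]
    origin, args = typing.get_origin(tp), typing.get_args(tp)
    if origin is list:
        return {"type": "array", "items": avro_type(args[0])}
    if origin is Union:
        # Optional[X] == Union[X, None]; put "null" first, as is usual for nullable fields.
        branches = [avro_type(a) for a in args if a is not type(None)]
        if type(None) in args:
            branches.insert(0, "null")
        return branches
    if dataclasses.is_dataclass(tp):
        return avro_record(tp)
    raise TypeError(f"no Avro mapping for {tp!r}")


def avro_record(cls):
    """Build an Avro record schema from a dataclass's fields, in declaration order."""
    hints = typing.get_type_hints(cls)
    return {
        "type": "record",
        "name": cls.__name__,
        "fields": [{"name": f.name, "type": avro_type(hints[f.name])} for f in dataclasses.fields(cls)],
    }


@dataclasses.dataclass
class Fragment:  # hypothetical model, for illustration only
    annotation: str
    mz: float
    intensity: float


@dataclasses.dataclass
class Precursor:  # hypothetical model, for illustration only
    sequence: str
    charge: int
    retention_time: Optional[float]
    fragments: list[Fragment]


schema = avro_record(Precursor)
print(json.dumps(schema, indent=2))
```

This ignores defaults, enums, logical types and namespaces. It would also redefine a record every time the same dataclass appears, which Avro rejects (later uses must refer to the name), so a real version needs to track already-emitted names.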
Some disk usage numbers for a reasonably large speclib I have:
# ~50 MB binary speclib file from DIA-NN
# 552M tmp/speclib_out.tsv
# 448M tmp/speclib_out.mzlib.json # using mzspeclib
# 148M tests/data/test.mzlib.avro
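Assuming all three files hold the same library, a quick ratio puts the numbers above in perspective. The Avro file comes out roughly 3.0x smaller than the mzspeclib JSON and 3.7x smaller than the TSV. The sizes are copied from the listing, in its "M" units.

```python
# On-disk sizes from the listing above, in the "M" units it reports.
sizes_mb = {
    "speclib_out.tsv": 552,
    "speclib_out.mzlib.json": 448,
    "test.mzlib.avro": 148,
}

avro_mb = sizes_mb["test.mzlib.avro"]
# How many times larger each file is than the Avro file.
ratios = {name: size / avro_mb for name, size in sizes_mb.items()}

for name, ratio in ratios.items():
    print(f"{name:<24} {sizes_mb[name]:>4} M  {ratio:.1f}x the Avro file")
```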
Read/write timings:
avro write: 4.832904
avro read: 6.133625
json write: 6.304285
json read: 4.992042
pydantic validation: 19.415933 # not needed for Avro, because the schema is enforced on write
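Here is a minimal sketch of why a separate validation pass can be skipped. An Avro writer has to check each value against the schema type just to encode it, so a record that doesn't match never reaches disk. It only covers `long`, `double` and `string` fields. The `encode_record`/`encode_value` helpers and the schema are hypothetical, not the API of avrospeclib or of any Avro library. Real Avro libraries do this checking inside their writers.

```python
import struct


def _varint_long(n: int) -> bytes:
    # Avro long: zig-zag, then 7-bit varint groups, low-order first.
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_value(avro_type: str, value) -> bytes:
    # Type-check while encoding: invalid data raises instead of being written.
    if avro_type == "long":
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"expected long, got {value!r}")
        if not -(2**63) <= value < 2**63:
            raise ValueError(f"{value} does not fit in a 64-bit long")
        return _varint_long(value)
    if avro_type == "double":
        # This sketch accepts ints for double fields and widens them.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"expected double, got {value!r}")
        return struct.pack("<d", float(value))
    if avro_type == "string":
        if not isinstance(value, str):
            raise TypeError(f"expected string, got {value!r}")
        data = value.encode("utf-8")
        return _varint_long(len(data)) + data
    raise NotImplementedError(f"type {avro_type!r} not covered by this sketch")


def encode_record(schema: dict, record: dict) -> bytes:
    # Fields are written in schema order; a missing field fails at write time.
    out = bytearray()
    for field in schema["fields"]:
        if field["name"] not in record:
            raise KeyError(f"missing field {field['name']!r}")
        out += encode_value(field["type"], record[field["name"]])
    return bytes(out)


# Hypothetical schema, for illustration only.
FRAGMENT_SCHEMA = {
    "type": "record",
    "name": "Fragment",
    "fields": [{"name": "annotation", "type": "string"}, {"name": "mz", "type": "double"}],
}

ok = encode_record(FRAGMENT_SCHEMA, {"annotation": "y5", "mz": 589.3})
try:
    encode_record(FRAGMENT_SCHEMA, {"annotation": "y5", "mz": "589.3"})
except TypeError as err:
    print("rejected at write time:", err)
```

On read, the decoder follows the same schema, so (as long as files come from a writer like this) the data is already known to be valid.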
Let me know if there is any interest in adopting it!
Best!