Add Decimal32 and Decimal64 support to arrow-avro Reader #8255
+354
−136
Which issue does this PR close?
Rationale for this change
Apache Avro's `decimal` logical type annotates either `bytes` or `fixed` and carries `precision` and `scale`. Implementations should reject invalid combinations such as `scale > precision`, and the underlying bytes are the two's-complement big-endian representation of the unscaled integer. On the Arrow side, Rust now exposes first-class `Decimal32`, `Decimal64`, `Decimal128`, and `Decimal256` data types with documented maximum precisions (9, 18, 38, and 76, respectively). Until now, `arrow-avro` decoded all Avro decimals to 128/256-bit Arrow decimals, even when a narrower type would suffice.
What changes are included in this PR?
arrow-avro/src/codec.rs
- Map `Codec::Decimal(precision, scale, _size)` to Arrow's `Decimal32`/`64`/`128`/`256` by precision, preferring the narrowest type (≤9→32, ≤18→64, ≤38→128, otherwise 256).
- Reject `scale > precision`.
- Reject a `precision` that exceeds Arrow's maximum (Decimal256).
- For `fixed`, check that the declared `precision` fits the byte width (≤4→max 9, ≤8→18, ≤16→38, ≤32→76).
- Update the `Codec::Decimal` documentation to mention `Decimal32`/`64`.
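The selection and validation rules listed for `codec.rs` can be sketched in standalone Rust. This is only an illustration of the thresholds stated in this description, not the code in the PR: `DecimalWidth`, `max_precision_for_fixed`, and `select_decimal_width` are hypothetical names, errors are plain strings, and rejecting `fixed` sizes above 32 bytes is an assumption.

```rust
/// Hypothetical stand-in for the four Arrow decimal types (not arrow-rs `DataType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DecimalWidth {
    Decimal32,
    Decimal64,
    Decimal128,
    Decimal256,
}

/// Maximum precision that fits a `fixed` of `size` bytes, using the
/// thresholds from the description (<=4 -> 9, <=8 -> 18, <=16 -> 38, <=32 -> 76).
/// Assumption: sizes above 32 bytes are treated as unsupported.
fn max_precision_for_fixed(size: usize) -> Option<usize> {
    match size {
        s if s <= 4 => Some(9),
        s if s <= 8 => Some(18),
        s if s <= 16 => Some(38),
        s if s <= 32 => Some(76),
        _ => None,
    }
}

/// Validate the decimal attributes, then pick the narrowest width by precision.
fn select_decimal_width(
    precision: usize,
    scale: usize,
    fixed_size: Option<usize>,
) -> Result<DecimalWidth, String> {
    if scale > precision {
        return Err(format!("scale {scale} exceeds precision {precision}"));
    }
    if precision > 76 {
        return Err(format!("precision {precision} exceeds the Decimal256 maximum of 76"));
    }
    if let Some(size) = fixed_size {
        let max = max_precision_for_fixed(size)
            .ok_or_else(|| format!("fixed size {size} is wider than 32 bytes"))?;
        if precision > max {
            return Err(format!("precision {precision} does not fit in fixed({size})"));
        }
    }
    Ok(match precision {
        p if p <= 9 => DecimalWidth::Decimal32,
        p if p <= 18 => DecimalWidth::Decimal64,
        p if p <= 38 => DecimalWidth::Decimal128,
        _ => DecimalWidth::Decimal256,
    })
}
```

Under these rules, `select_decimal_width(10, 2, None)` picks `Decimal64`, while `select_decimal_width(10, 2, Some(4))` is rejected because a 4-byte `fixed` holds at most 9 digits.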
arrow-avro/src/reader/record.rs
- Add `Decoder::Decimal32` and `Decoder::Decimal64` variants with corresponding builders (`Decimal32Builder`, `Decimal64Builder`).
- Builder selection:
- Implement decode paths that sign-extend Avro's two's-complement payload to 4/8 bytes and append values to the new builders; update `append_null`/`flush` for 32/64-bit decimals.
arrow-avro/src/reader/mod.rs (tests)
- Expand `test_decimal` to assert that: [...] `Decimal32`; precision 10 map to `Decimal64`; [...] `Decimal64`; [...] `Decimal128`.
- Add a nulls path test for bytes-backed `Decimal32`.
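The sign-extension step mentioned for the `record.rs` decode paths can be illustrated with a small standalone sketch. It assumes the payload is the big-endian two's-complement unscaled integer described in the rationale; `sign_extend_be`, `decode_i32`, and `decode_i64` are hypothetical helper names, and returning `None` for empty or over-long payloads is an assumption rather than the behavior of the actual decoder.

```rust
/// Sign-extend a big-endian two's-complement payload to exactly N bytes.
/// Assumption: empty payloads and payloads longer than N bytes are rejected.
fn sign_extend_be<const N: usize>(raw: &[u8]) -> Option<[u8; N]> {
    if raw.is_empty() || raw.len() > N {
        return None;
    }
    // The top bit of the first byte is the sign: pad negative values
    // with 0xFF and non-negative values with 0x00.
    let fill: u8 = if raw[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut out = [fill; N];
    // Copy the payload into the low-order (rightmost) bytes.
    out[N - raw.len()..].copy_from_slice(raw);
    Some(out)
}

/// Decode an unscaled value that fits a 32-bit decimal.
fn decode_i32(raw: &[u8]) -> Option<i32> {
    sign_extend_be::<4>(raw).map(i32::from_be_bytes)
}

/// Decode an unscaled value that fits a 64-bit decimal.
fn decode_i64(raw: &[u8]) -> Option<i64> {
    sign_extend_be::<8>(raw).map(i64::from_be_bytes)
}
```

For example, the two-byte payload `[0x04, 0xD2]` decodes to the unscaled integer 1234, which a decimal with scale 2 interprets as 12.34; `[0xFB, 0x2E]` decodes to -1234.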
Are these changes tested?
Yes. Unit tests under `arrow-avro/src/reader/mod.rs` construct expected `Decimal32Array`/`Decimal64Array`/`Decimal128Array` with `with_precision_and_scale`, and compare against batches decoded from Avro files (including legacy fixed and bytes-backed cases). The tests also exercise small batch sizes to cover buffering paths; a new Avro data file is added for higher-width decimals.
New Avro test file details:
These new Avro test files were created using this script: https://gist.github.com/jecsand838/3890349bdb33082a3e8fdcae3257eef7
There is also an arrow-testing PR for these new files: apache/arrow-testing#112
Are there any user-facing changes?
N/A due to `arrow-avro` not being public.