[Parquet] Improve speed of dictionary encoding NaN float values (apache#6953)

* Treat NaNs equal to NaN when interning for dictionary encoding

* Compare all values by bytes rather than adding Intern trait
adamreeve authored and totoroyyb committed Jan 20, 2025
1 parent e261d4f commit c04ef9d
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions parquet/src/util/interner.rs
@@ -24,7 +24,7 @@ const DEFAULT_DEDUP_CAPACITY: usize = 4096;
 pub trait Storage {
     type Key: Copy;

-    type Value: AsBytes + PartialEq + ?Sized;
+    type Value: AsBytes + ?Sized;

     /// Gets an element by its key
     fn get(&self, idx: Self::Key) -> &Self::Value;
@@ -66,7 +66,8 @@ impl<S: Storage> Interner<S> {
             .dedup
             .entry(
                 hash,
-                |index| value == self.storage.get(*index),
+                // Compare bytes rather than directly comparing values so NaNs can be interned
+                |index| value.as_bytes() == self.storage.get(*index).as_bytes(),
                 |key| self.state.hash_one(self.storage.get(*key).as_bytes()),
             )
             .or_insert_with(|| self.storage.push(value))
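
The effect of the change can be illustrated with a minimal sketch. Everything below is hypothetical and is not the parquet crate's `Interner`: `ByteInterner` and `intern_by_eq` are illustrative names, and a plain `HashMap` keyed on `f64::to_ne_bytes` stands in for the crate's hash table and `AsBytes` trait. It assumes only that `NaN == NaN` is false under `PartialEq`, so an equality-based lookup never matches a previously stored NaN, whereas a byte-based lookup does.

```rust
use std::collections::HashMap;

/// Hypothetical toy interner for f64 values (not the parquet `Interner`).
/// Deduplicates on the raw bytes of each value, so two values with the
/// same bit pattern share one dictionary slot, including NaN.
#[derive(Default)]
pub struct ByteInterner {
    dedup: HashMap<[u8; 8], usize>,
    values: Vec<f64>,
}

impl ByteInterner {
    /// Returns the dictionary index for `value`, inserting it if unseen.
    pub fn intern(&mut self, value: f64) -> usize {
        let values = &mut self.values;
        *self.dedup.entry(value.to_ne_bytes()).or_insert_with(|| {
            values.push(value);
            values.len() - 1
        })
    }

    /// Number of distinct dictionary entries.
    pub fn len(&self) -> usize {
        self.values.len()
    }
}

/// Equality-based interning (linear scan with `==`), mirroring the
/// comparison used before the change. NaN never matches an existing entry.
pub fn intern_by_eq(values: &mut Vec<f64>, value: f64) -> usize {
    if let Some(i) = values.iter().position(|v| *v == value) {
        return i;
    }
    values.push(value);
    values.len() - 1
}
```

With this sketch, interning `f64::NAN` twice through `ByteInterner` yields the same index and a single entry, while `intern_by_eq` appends a new entry each time. Byte comparison also changes two edge cases: `0.0` and `-0.0` compare equal with `==` but have different bytes, so they become separate entries, and NaNs with different payload bits remain distinct.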
