perf(rocksdb): Add VoidNamespace fast-path and reusable read buffer #2

Open
nateab wants to merge 1 commit into feature/rocksdb-zerocopy-key-serialization from feature/rocksdb-additional-optimizations

@nateab nateab commented Feb 5, 2026

Summary

  • Add VoidNamespace fast-path in SerializedCompositeKeyBuilder — short-circuits namespace serialization for the most common namespace type (non-windowed state) by writing the single byte directly instead of full serializer dispatch
  • Add reusable value buffer for RocksDB read operations — replaces the allocating db.get(CF, key) API with db.get(CF, ReadOptions, key, keyOff, keyLen, valueBuf, valOff, valLen) which writes into a pre-allocated reusable buffer, eliminating one byte[] allocation per state read
  • The reusable buffer is applied to RocksDBValueState.value(), RocksDBMapState.get()/contains(), and AbstractRocksDBAppendingState.getInternal() (reducing/aggregating states)
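
The VoidNamespace fast-path can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in this PR: `NsSerializer`, `VoidNs`, `CompositeKeySketch`, and `buildKey` are hypothetical stand-ins for the Flink classes, and the sketch assumes the void namespace serializes to a single zero byte.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical stand-in for a namespace serializer; not Flink's TypeSerializer. */
interface NsSerializer<N> {
    void serialize(N namespace, ByteArrayOutputStream out);
}

/** Hypothetical stand-in for a singleton "no namespace" marker. */
final class VoidNs {
    static final VoidNs INSTANCE = new VoidNs();

    private VoidNs() {}
}

final class CompositeKeySketch {
    // Assumption for this sketch: the void namespace serializes to one zero byte.
    static final int VOID_NS_BYTE = 0;

    /** Appends the serialized namespace to an already-serialized key prefix. */
    static <N> byte[] buildKey(byte[] keyPrefix, N namespace, NsSerializer<N> serializer) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(keyPrefix.length + 8);
        out.write(keyPrefix, 0, keyPrefix.length);
        if ((Object) namespace == VoidNs.INSTANCE) {
            // Fast path: one identity check, then write the known byte directly.
            out.write(VOID_NS_BYTE);
        } else {
            // Slow path: general serializer dispatch.
            serializer.serialize(namespace, out);
        }
        return out.toByteArray();
    }
}
```

The fast path only pays off when the identity check is cheaper than the serializer call it replaces, which is the case for a singleton namespace.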

Benchmark Results (vs master)

| Benchmark | Master (ops/ms) | Optimized (ops/ms) | Change |
| --- | --- | --- | --- |
| ValueStateBenchmark.valueGet | 793.7 ± 45.6 | 921.0 ± 63.9 | +16.0% |
| MapStateBenchmark.mapGet | 97.4 ± 2.3 | 102.9 ± 4.4 | +5.6% |

Note: These results include the cumulative effect of all optimizations on feature/rocksdb-zerocopy-key-serialization (write-path ByteBuffer API) plus this PR's read-path changes.
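
For reference, the Change column is the relative difference of the mean throughputs. A small sketch of that arithmetic follows; `BenchmarkChange` and its methods are hypothetical helper names and not part of the benchmark suite.

```java
import java.util.Locale;

final class BenchmarkChange {
    /** Relative change of the optimized mean over the baseline mean, in percent. */
    static double percentChange(double baseline, double optimized) {
        return (optimized - baseline) / baseline * 100.0;
    }

    /** Formats a percentage with an explicit sign and one decimal place. */
    static String format(double pct) {
        return String.format(Locale.ROOT, "%+.1f%%", pct);
    }

    public static void main(String[] args) {
        System.out.println("valueGet: " + format(percentChange(793.7, 921.0)));
        System.out.println("mapGet:   " + format(percentChange(97.4, 102.9)));
    }
}
```

The calculation uses the means only and does not take the reported error margins into account.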

Design Decisions

  • byte[] pre-allocated buffer instead of ByteBuffer for reads: The db.get(ColumnFamilyHandle, ReadOptions, ByteBuffer, ByteBuffer) variant accepts only direct ByteBuffers (enforced by an assertion in FRocksDB 8.10.0). The byte[] variant avoids this complexity while eliminating the same allocation.
  • Buffer growth strategy: Start at 128 bytes; when a value does not fit, grow the buffer to that value's exact size. In practice this is amortized O(1), since value sizes typically stabilize and further reallocations become rare.
  • Not applied to ListState.getInternal(): It uses ListDelimitedSerializer.deserializeList(), which needs a full byte[]. Left as a follow-up optimization.
  • Not applied to mergeNamespaces(): It reads in a loop with changing keys, so the incremental gain is low.
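
The reusable read buffer and its growth strategy can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: `FakeStore` is a hypothetical in-memory stand-in for the database, and its `get` is assumed to follow the same contract as the byte[] read variant described above (return the full value length, which may exceed the buffer length on a partial copy, or a not-found sentinel). `ReusableValueReader` is likewise a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a store with a "read into caller buffer" call. */
final class FakeStore {
    static final int NOT_FOUND = -1;
    private final Map<String, byte[]> data = new HashMap<>();

    void put(String key, byte[] value) {
        data.put(key, value.clone());
    }

    /**
     * Copies at most len bytes of the value into dst at off. Returns the full
     * value length (which may exceed len), or NOT_FOUND if the key is absent.
     */
    int get(String key, byte[] dst, int off, int len) {
        byte[] value = data.get(key);
        if (value == null) {
            return NOT_FOUND;
        }
        System.arraycopy(value, 0, dst, off, Math.min(value.length, len));
        return value.length;
    }
}

final class ReusableValueReader {
    private byte[] buf = new byte[128]; // initial capacity from the design notes
    int reallocations = 0;              // exposed only to make the behavior observable

    /** Returns the value length or FakeStore.NOT_FOUND; the bytes are in buffer()[0..length). */
    int read(FakeStore store, String key) {
        int n = store.get(key, buf, 0, buf.length);
        if (n > buf.length) {
            // Overflow: the copy was partial. Grow to the exact size and read again.
            buf = new byte[n];
            reallocations++;
            n = store.get(key, buf, 0, buf.length);
        }
        return n;
    }

    byte[] buffer() {
        return buf;
    }
}
```

Because the buffer is shared across reads, its contents are only valid until the next `read` call, so a caller would need to deserialize the value before reading again.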

Test plan

  • Full RocksDB state backend test suite (839/839 passing, 0 failures)
  • JMH benchmarks show improvement on read operations
  • No regression on write operations

Add two optimizations to the RocksDB state backend:

1. VoidNamespace fast-path in SerializedCompositeKeyBuilder: Short-circuit
   namespace serialization for VoidNamespace (most common in non-windowed
   state) by writing the single byte directly instead of going through
   full serializer dispatch.

2. Reusable value buffer for read operations: Replace the allocating
   db.get(CF, key) API with db.get(CF, ReadOptions, key, keyOff, keyLen,
   valueBuf, valOff, valLen) which writes into a pre-allocated reusable
   buffer, eliminating one byte[] allocation per state read. Applied to:
   - RocksDBValueState.value()
   - RocksDBMapState.get() and contains()
   - AbstractRocksDBAppendingState.getInternal() (reducing/aggregating)

JMH benchmark results vs master:
  valueGet: +16.0% (793 -> 921 ops/ms)
  mapGet:   +5.6%  (97 -> 103 ops/ms)