Add support for compressed kinesis data #17083
base: master
@@ -657,7 +659,7 @@ public static OutputStream compress(final OutputStream in, final Format format)
 case XZ: return new XZCompressorOutputStream(in);
 case SNAPPY: return new FramedSnappyCompressorOutputStream(in);
 case ZSTD: return new ZstdCompressorOutputStream(in);
-case ZIP: return new ZipOutputStream(in, StandardCharsets.UTF_8);
+case ZIP: return new DeflaterOutputStream(in);
ZipOutputStream and ZipInputStream operate on entries within an archive, each represented by a ZipEntry. Since these helpers are meant to compress and decompress raw bytes directly, we instead use the underlying DeflaterOutputStream / InflaterInputStream compression classes for zip.
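To make the distinction concrete, here is a minimal JDK-only sketch (the class and method names are hypothetical, not from the PR). DeflaterOutputStream and InflaterInputStream round-trip raw bytes, but their output is a zlib stream rather than a zip archive, so ZipInputStream finds no entry in it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipInputStream;

public class DeflateVsZip {
    // Compress raw bytes as a zlib-wrapped DEFLATE stream (no archive structure).
    static byte[] deflate(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the stream finishes the deflater and flushes the trailer.
        try (OutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(data);
        }
        return buffer.toByteArray();
    }

    // Decompress with the matching InflaterInputStream.
    static byte[] inflate(byte[] compressed) throws IOException {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    // True if ZipInputStream can locate at least one entry in the given bytes.
    static boolean hasZipEntry(byte[] bytes) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(bytes))) {
            return zin.getNextEntry() != null;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "{\"event\":\"click\"}".getBytes(StandardCharsets.UTF_8);
        byte[] deflated = deflate(payload);
        System.out.println(new String(inflate(deflated), StandardCharsets.UTF_8)); // prints {"event":"click"}
        // A zlib stream has no zip local file header, so no entry is found.
        System.out.println(hasZipEntry(deflated)); // prints false
    }
}
```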
On second thought, this doesn't make sense as a solution, since zip isn't 'just' deflate: a zip file is an archive format that wraps its compressed entries in local headers and a central directory.
An alternative would be to write a dummy ZipEntry on compress and make a single optimistic getNextEntry call on decompress. Or perhaps we should simply disable zip as a valid compression format for Kinesis, since it doesn't make much sense in a record-streaming context.
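The dummy-entry alternative could look roughly like the following sketch, built on java.util.zip. The class name and the entry name "data" are hypothetical placeholders; the reader simply takes whatever the first entry is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class SingleEntryZip {
    // Hypothetical entry name; decompress ignores it and reads the first entry.
    static final String ENTRY_NAME = "data";

    // Wrap the payload in a zip archive containing exactly one entry.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buffer, StandardCharsets.UTF_8)) {
            zout.putNextEntry(new ZipEntry(ENTRY_NAME));
            zout.write(data);
            zout.closeEntry();
        }
        return buffer.toByteArray();
    }

    // Optimistically read only the first entry; fail if the archive has none.
    static byte[] decompress(byte[] zipped) throws IOException {
        try (ZipInputStream zin =
                new ZipInputStream(new ByteArrayInputStream(zipped), StandardCharsets.UTF_8)) {
            if (zin.getNextEntry() == null) {
                throw new IOException("zip payload contains no entries");
            }
            // Reads to the end of the current entry, not the end of the archive.
            return zin.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "{\"event\":\"click\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(decompress(compress(payload)), StandardCharsets.UTF_8));
    }
}
```

Note that every compressed record then carries its own local file header and central directory, which is part of why zip fits poorly with small streaming records.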
@funguy-tech, let us know when this PR is ready for review.
Implements #17062.
Description
Allows use of the built-in compression formats for Kinesis ingestion via an optional compressionFormat parameter in ioConfig.
Unlike Kafka, Kinesis offers little in the way of built-in data compression, so a common industry pattern is to compress Kinesis data on the producer side before it goes over the wire and decompress it in the consuming client.
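For context, the producer half of that client-side pattern might look like this minimal sketch using the JDK's GZIPOutputStream. The class and method names are hypothetical, and the actual Kinesis put call is deliberately left out; the point is only that the bytes handed to Kinesis are already gzip-compressed, which is presumably what the gz compressionFormat would expect on the ingestion side.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class ProducerSideGzip {
    // Gzip a serialized record so the bytes sent to Kinesis are compressed.
    // Passing the buffer to a Kinesis producer client is out of scope here.
    static ByteBuffer compressRecord(String json) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer (CRC-32 and size).
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return ByteBuffer.wrap(buffer.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer data = compressRecord("{\"user\":\"u1\",\"action\":\"login\"}");
        // Every gzip member starts with the magic bytes 0x1f 0x8b.
        System.out.printf("%d compressed bytes, magic %02x %02x%n",
                data.remaining(), data.get(0), data.get(1));
    }
}
```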
Changes
Prerequisite additions to CompressionUtils.java:
Added Jackson deserialization support to Format to simplify exposing it in configs.
Added general compress/decompress utilities with ByteBuffer, InputStream, and OutputStream parameters.
Added a compressionFormat enum field to IOConfig.
Because the field is bound to an enum, its values are validated at load time: invalid values are rejected automatically by Druid's existing spec safeguards.
The field is passed safely through:
KinesisSupervisor
KinesisSupervisorIOConfig
KinesisSamplerSpec
KinesisTask
KinesisTaskIOConfig
KinesisRecordSupplier
Added decompression handling logic to KinesisRecordSupplier.
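A consumer-side sketch of how these pieces might fit together: a case-insensitive enum lookup that rejects unknown values, plus a per-record decompress(ByteBuffer, Format) helper of the kind a record supplier could call. All names and signatures here are hypothetical and are not the PR's actual CompressionUtils API; only GZ and ZIP (single entry) are implemented, since the other formats need third-party codecs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipInputStream;

public class RecordDecompressionSketch {
    // Hypothetical stand-in for CompressionUtils.Format. In a Jackson-based config,
    // a factory like fromString would typically carry @JsonCreator so that an
    // unknown value fails spec deserialization.
    enum Format {
        BZ2, GZ, SNAPPY, XZ, ZIP, ZSTD;

        static Format fromString(String value) {
            // valueOf throws IllegalArgumentException for unknown names.
            return Format.valueOf(value.toUpperCase(Locale.ROOT));
        }
    }

    // Decompress one record's data. A null format means the stream is not
    // compressed, mirroring an optional compressionFormat field.
    static byte[] decompress(ByteBuffer data, Format format) throws IOException {
        byte[] bytes = new byte[data.remaining()];
        data.duplicate().get(bytes); // leave the caller's buffer position untouched
        if (format == null) {
            return bytes;
        }
        try (InputStream in = open(new ByteArrayInputStream(bytes), format)) {
            return in.readAllBytes();
        }
    }

    private static InputStream open(InputStream raw, Format format) throws IOException {
        switch (format) {
            case GZ:
                return new GZIPInputStream(raw);
            case ZIP: {
                // Single-entry assumption: position the stream at the first entry.
                ZipInputStream zin = new ZipInputStream(raw);
                if (zin.getNextEntry() == null) {
                    zin.close();
                    throw new IOException("zip record contains no entries");
                }
                return zin;
            }
            default:
                // BZ2, SNAPPY, XZ and ZSTD need codecs outside the JDK
                // (for example Apache Commons Compress), so this sketch skips them.
                throw new UnsupportedOperationException("not handled in this sketch: " + format);
        }
    }
}
```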
Discussion on competing designs remains open. This is meant as a starting point of available functionality to prompt that discussion, or to be merged as-is if it proves satisfactory.
Release note
Added support for compressed Kinesis streams. Users may specify a compressionFormat in IOConfig. Accepted values are bz2, gz, snappy, xz, zip, and zstd.
Key changed/added classes in this PR
CompressionUtils
KinesisRecordSupplier
This PR has: