Debezium Json Deserialization

A Postgres json or jsonb column does not carry a schema, so when Debezium reads such a column it converts the value to a string. As a result, part of your message is structured and part is nested as a JSON-encoded string. This SMT automatically decodes these JSON strings and transforms them into structured records.
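
To make the problem concrete, here is a minimal Python sketch of the idea, not the SMT's actual implementation (which runs inside Kafka Connect). The record shape, the `payload` field name and the `decode_json_fields` helper are all hypothetical and only serve as an illustration.

```python
import json

# Hypothetical Debezium-style row: "payload" mimics a jsonb column that
# arrives as a JSON-encoded string inside an otherwise structured record.
message = {
    "id": 42,
    "payload": '{"name": "Ada", "tags": ["a", "b"], "address": {"city": "London"}}',
}


def decode_json_fields(record, json_fields):
    """Return a copy of record with the named string fields parsed as JSON."""
    decoded = dict(record)
    for field in json_fields:
        value = decoded.get(field)
        if isinstance(value, str):
            decoded[field] = json.loads(value)
    return decoded


structured = decode_json_fields(message, ["payload"])
print(structured["payload"]["address"]["city"])  # London
```

After decoding, nested values such as `address.city` can be addressed as fields instead of being buried in a string.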

What are the trade-offs? The main one is that all values under the same key must have exactly the same type across your records. If some keys are not always present, enable optional-struct-fields to make every field optional; this prevents backward-incompatible schema changes that would block your pipeline.
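
One way to check whether your data meets this constraint before enabling the SMT is to scan a sample of decoded records for keys whose values change type. The sketch below is a hypothetical helper, not part of the SMT; the type names are illustrative and do not correspond to Kafka Connect schema types. A missing key is not reported as a conflict, since that case is what optional-struct-fields is meant to cover.

```python
def type_name(value):
    # bool must be tested before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "struct"
    if isinstance(value, list):
        return "array"
    return type(value).__name__


def find_type_conflicts(records):
    """Map each key to the set of types seen, keeping only keys with more than one."""
    seen = {}
    for record in records:
        for key, value in record.items():
            if value is None:
                continue  # treat null like an absent key in this sketch
            seen.setdefault(key, set()).add(type_name(value))
    return {key: types for key, types in seen.items() if len(types) > 1}


records = [
    {"price": 10, "label": "a"},
    {"price": 10.5},  # "label" is absent and "price" changes type
]
conflicts = find_type_conflicts(records)
print(conflicts)
```

Here only `price` is flagged, because it is an integer in one record and a float in the other.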

Usage

transforms=json
transforms.json.type=com.birdie.kafka.connect.smt.DebeziumJsonDeserializer
transforms.json.optional-struct-fields=true

Properties

| Name | Description | Type | Default |
| --- | --- | --- | --- |
| optional-struct-fields | When true, all fields in structs are optional. This allows, for example, slightly different types within each array item, as long as every field with the same name has the same type. | Boolean | false |
| union-previous-messages-schema | When true, the schema is merged with the schemas of previous messages. If your table contains many different schema structures, this helps reduce the number of schema versions created. | Boolean | false |
| union-previous-messages-schema.log-union-errors | When true, logs an error when two schemas cannot be merged, instead of treating this as normal. | Boolean | false |
| union-previous-messages-schema.topic.{topic-name}.field.{field-name} | An array of serialized Kafka Connect schemas used as the initial schema(s) to unify messages with. It is an array because several schemas for the same field may be incompatible with one another. You can get the serialized schema from the SMT logs as it processes messages. | String | ø |
| convert-numbers-to-double | When true, all number fields in structs are converted to double. This avoids compatibility errors when some fields can contain both integers and floats. | Boolean | false |
| sanitize.field.names | When true, sanitizes field names so they are compatible with Avro, as Debezium does. | Boolean | false |
| probabilistic-fast-path | When true, the connector first tries to map the message to one of the known schemas, if one exists (works with union-previous-messages-schema). If that is your most common path, this yields much higher throughput. | Boolean | false |
| ignored-fields | An array of fields to be excluded from all schemas, e.g. "column_name.json_field_1,column_name.json_field_2.an_array[].sub_field". | String | ø |
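
As a rough mental model for union-previous-messages-schema, the sketch below merges two flat schemas and raises when the same field has two different types. It is an assumption-laden illustration, not the SMT's algorithm: real Kafka Connect schemas are nested and typed, while here a schema is just a hypothetical `{field: type}` dictionary, and `union_schemas` and `SchemaConflict` are invented names.

```python
class SchemaConflict(Exception):
    """Raised when the same field has two different types (hypothetical)."""


def union_schemas(left, right):
    """Merge two flat {field: type} schemas; raise if a field's type differs."""
    merged = dict(left)
    for field, field_type in right.items():
        if field in merged and merged[field] != field_type:
            raise SchemaConflict(f"{field}: {merged[field]} vs {field_type}")
        merged[field] = field_type
    return merged


known = {"name": "string", "age": "integer"}
incoming = {"name": "string", "city": "string"}

# The merged schema covers name, age and city, so both message shapes
# can be described by a single schema version.
merged = union_schemas(known, incoming)

# A type change on an existing field cannot be merged.
try:
    union_schemas(known, {"age": "string"})
except SchemaConflict as error:
    conflict_message = str(error)
```

In this model, the conflict case is the kind of situation that union-previous-messages-schema.log-union-errors would report.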

Benchmark

The latest benchmarks (MacBook Pro, 13-inch, 2019) show the following:

  • Without union-previous-messages-schema, ~4700 ops/sec.
  • With union-previous-messages-schema, ~3500 ops/sec.
  • With probabilistic-fast-path activated and triggered all the time, ~16000 ops/sec.
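
To put these figures in relative terms, the short calculation below uses only the approximate numbers listed above. It compares the fast path against the union mode because probabilistic-fast-path works with union-previous-messages-schema; the variable names are illustrative.

```python
# Approximate throughput figures from the benchmark list, in ops/sec.
without_union = 4700
with_union = 3500
fast_path = 16000

# Fraction of throughput lost when schema union is enabled.
union_cost = 1 - with_union / without_union

# Speedup of the fast path (always triggered) over plain schema union.
fast_path_gain = fast_path / with_union

print(f"union overhead: {union_cost:.0%}")                      # union overhead: 26%
print(f"fast path speedup over union: {fast_path_gain:.1f}x")   # fast path speedup over union: 4.6x
```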