There is a discrepancy between the way the Java KafkaProtobufSerializer works and the current @kafkajs/confluent-schema-registry. Confluent's documentation tells us that a message serialized to Protobuf looks like:

Byte 0: [Magic_Byte]
Bytes 1-4: [RegistryId]
Bytes 5-end: [Serialized-Protobuf]
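For reference, here is a minimal sketch of that documented layout (the function name `encodeDocumentedFrame` is hypothetical, and it assumes the registry id is a big-endian unsigned 32-bit integer, which is how Confluent documents it):

```ts
// Sketch of the wire format as documented: no message indexes between
// the header and the protobuf payload.
function encodeDocumentedFrame(registryId: number, serializedProtobuf: Buffer): Buffer {
  const header = Buffer.alloc(5);
  header.writeUInt8(0, 0);             // byte 0: magic byte
  header.writeUInt32BE(registryId, 1); // bytes 1-4: schema registry id
  return Buffer.concat([header, serializedProtobuf]);
}
```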
However, this is not how the Java serializer/deserializer actually behaves: between the 5th byte and the start of the serialized protobuf it writes the MessageIndexes. If the .proto file contains just a single message, the 6th byte will be 0. If it contains several messages, the 6th byte represents the size of the index collection and the 7th byte represents the index of the message, both apparently written as zig-zag varints. E.g. for a .proto file containing 3 messages: if we were serializing the first message, the 6th and 7th bytes respectively would be 0x02, 0x00; for the 2nd message, 0x02, 0x02; for the 3rd message, 0x02, 0x04. It gets a bit more complex when dealing with nested types: each level of nesting adds an entry to the collection of message indexes, so more bytes are used to denote the type hierarchy, and the byte immediately before the serialized object identifies the specific type, as above.
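A hedged sketch of that encoding, matching the byte values observed above: a zig-zag varint for the count, then one zig-zag varint per index. The function names are hypothetical; note that the Java MessageIndexes implementation also appears to collapse the common single-index [0] case to one 0x00 byte, which matches the single-message observation:

```ts
// Encode a signed number as a zig-zag varint (the same scheme Avro uses).
function zigZagVarint(n: number): Buffer {
  let v = (n << 1) ^ (n >> 31); // zig-zag: 0 -> 0, 1 -> 2, 2 -> 4, ...
  const bytes: number[] = [];
  do {
    let b = v & 0x7f;
    v >>>= 7;
    if (v !== 0) b |= 0x80; // continuation bit for values over 7 bits
    bytes.push(b);
  } while (v !== 0);
  return Buffer.from(bytes);
}

// Encode the message-index path: count first, then each index.
function encodeMessageIndexes(indexes: number[]): Buffer {
  if (indexes.length === 1 && indexes[0] === 0) {
    return Buffer.from([0x00]); // shortcut for the common first-message case
  }
  return Buffer.concat([zigZagVarint(indexes.length), ...indexes.map(zigZagVarint)]);
}

// encodeMessageIndexes([1])    -> <0x02 0x02>       (2nd top-level message)
// encodeMessageIndexes([2])    -> <0x02 0x04>       (3rd top-level message)
// encodeMessageIndexes([0, 1]) -> <0x04 0x00 0x02>  (2nd nested type of the 1st message)
```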
The difficulty here is that the Java KafkaProtobuf(De)Serializer doesn't recognize messages encoded by confluent-schema-registry. The reason we chose Protobuf as a serialization technology was speed and size, but most importantly cross-platform compatibility, and we lose that with the missing bytes representing the MessageIndexes. See io.confluent:kafka-protobuf-serializer:7.4.0, MessageIndexes L40 and ProtoSchema L2157.
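For illustration, a minimal sketch of a decoder that would tolerate the Java framing, assuming the zig-zag varint encoding described above (the name `splitFrame` is hypothetical, not part of either library):

```ts
// Decode a zig-zag varint value back to a signed number.
function zigZagDecode(v: number): number {
  return (v >>> 1) ^ -(v & 1);
}

// Read one varint from buf starting at offset; return [value, nextOffset].
function readVarint(buf: Buffer, offset: number): [number, number] {
  let result = 0;
  let shift = 0;
  for (;;) {
    const b = buf[offset++];
    result |= (b & 0x7f) << shift;
    if ((b & 0x80) === 0) return [result, offset];
    shift += 7;
  }
}

// Split a Java-framed message into registry id and protobuf payload,
// consuming the message-index block. The single 0x00 shortcut decodes as
// a count of 0, so the loop below handles it for free.
function splitFrame(frame: Buffer): { registryId: number; payload: Buffer } {
  if (frame.readUInt8(0) !== 0) throw new Error('missing magic byte');
  const registryId = frame.readUInt32BE(1);
  let [raw, offset] = readVarint(frame, 5);
  const count = zigZagDecode(raw);
  for (let i = 0; i < count; i++) {
    [, offset] = readVarint(frame, offset); // skip each message index
  }
  return { registryId, payload: frame.subarray(offset) };
}
```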
Thanks