Encoding should include MessageIndexes for parity with Java #248

Closed
pcoleman00st opened this issue Aug 4, 2023 · 1 comment

Comments

@pcoleman00st

There is a discrepancy between how the Java KafkaProtobufSerializer works and how the current @kafkajs implementation works. Confluent's documentation says that a message serialized to Protobuf looks like:

0:            1-4:          5-End:
[Magic_Byte]  [RegistryId]  [Serialized-Protobuf]
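
For reference, the documented layout above can be sketched in a few lines. This is a minimal illustration, not the library's code: `frameWithoutIndexes` is a hypothetical helper, and it assumes the magic byte is 0 and the registry id is a 4-byte big-endian integer.

```typescript
// Layout from the table above: [magic byte][4-byte registry id][payload].
const MAGIC_BYTE = 0; // assumption: the magic byte is 0

// Hypothetical helper for illustration only; not part of any library API.
function frameWithoutIndexes(registryId: number, payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(5 + payload.length);
  const view = new DataView(out.buffer);
  view.setUint8(0, MAGIC_BYTE);
  view.setUint32(1, registryId, false); // assumption: big-endian byte order
  out.set(payload, 5); // serialized Protobuf starts at offset 5
  return out;
}

const framed = frameWithoutIndexes(42, new Uint8Array([0x0a, 0x01, 0x61]));
console.log(Array.from(framed)); // [0, 0, 0, 0, 42, 10, 1, 97]
```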

However, this is not how the Java serializer / deserializer behaves: between the 5th byte and the start of the serialized Protobuf, it writes the MessageIndexes. If the .proto file contains just a single message, the 6th byte is 0. If it contains several messages, the 6th byte represents the size of the indexes collection and the 7th byte represents the index of the message. E.g., for a .proto file containing 3 messages, serializing the first message gives 6th and 7th bytes of 0x02, 0x00; the 2nd message gives 0x02, 0x02; the 3rd message gives 0x02, 0x04. It gets a bit more complex with nested types: each level of nesting adds an entry to the collection of message indexes, so more bytes are used to denote the type hierarchy, and the byte just before the serialized object identifies the specific type as described above.
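
The byte values above can be reproduced with a small encoder. This is a sketch under stated assumptions, not the Java serializer's code: it assumes the collection size and each index are written as zigzag varints (which matches the 0x02 and 0x04 values above), and that the collection `[0]` is shortened to a single 0 byte, as Confluent's wire-format documentation describes. All function names are hypothetical. With that shortcut, the first message encodes as a lone 0x00 even in a multi-message file, which differs from the 0x02, 0x00 given above, so that case is worth verifying against the Java output.

```typescript
// Zigzag maps small signed ints to small unsigned ints: 0->0, 1->2, 2->4, -1->1.
function zigzag(n: number): number {
  return ((n << 1) ^ (n >> 31)) >>> 0;
}

// Append an unsigned 32-bit value as a base-128 varint (low 7 bits first).
function writeVarint(value: number, out: number[]): void {
  let v = value >>> 0;
  while (v > 0x7f) {
    out.push((v & 0x7f) | 0x80); // set the continuation bit
    v >>>= 7;
  }
  out.push(v);
}

// Hypothetical helper: encode the path of message indexes for one type.
function encodeMessageIndexes(indexes: number[]): Uint8Array {
  // Assumption: [0] (first top-level message) is written as a single 0 byte.
  if (indexes.length === 1 && indexes[0] === 0) {
    return new Uint8Array([0]);
  }
  const out: number[] = [];
  writeVarint(zigzag(indexes.length), out); // size of the collection
  for (const index of indexes) {
    writeVarint(zigzag(index), out); // one entry per nesting level
  }
  return new Uint8Array(out);
}

console.log(Array.from(encodeMessageIndexes([0])));    // [0]
console.log(Array.from(encodeMessageIndexes([1])));    // [2, 2]
console.log(Array.from(encodeMessageIndexes([2])));    // [2, 4]
console.log(Array.from(encodeMessageIndexes([1, 0]))); // [4, 2, 0]
```

The last call illustrates the nested case: the first type nested inside the second top-level message needs two entries, so the size byte becomes 0x04.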

The difficulty is that the Java KafkaProtobuf(De)Serializer doesn't recognize messages encoded by confluent-schema-registry. We chose Protobuf as a serialization technology for speed and size but, most importantly, for cross-platform support. We lose that when the bytes representing MessageIndexes are missing; see io.confluent:kafka-protobuf-serializer:7.4.0, MessageIndexes L40 and ProtoSchema L2157.
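
To show what a reader of Java-produced messages has to do, here is a hedged sketch of the decoding side. It assumes the same zigzag varint encoding for the MessageIndexes, a big-endian registry id, and that a size of 0 stands for the collection `[0]`. `splitJavaFramedMessage` and its helpers are hypothetical names, not part of confluent-schema-registry or the Java library.

```typescript
// Read one base-128 varint starting at pos; returns the value and next offset.
function readVarint(buf: Uint8Array, pos: number): { value: number; next: number } {
  let result = 0;
  let shift = 0;
  let p = pos;
  while (true) {
    if (p >= buf.length) throw new Error("truncated varint");
    const b = buf[p++];
    result |= (b & 0x7f) << shift;
    if ((b & 0x80) === 0) break;
    shift += 7;
    if (shift > 28) throw new Error("varint too long");
  }
  return { value: result >>> 0, next: p };
}

// Inverse of zigzag encoding: 0->0, 2->1, 4->2, 1->-1.
function unzigzag(n: number): number {
  return (n >>> 1) ^ -(n & 1);
}

// Hypothetical helper: split a framed message into id, indexes and payload.
function splitJavaFramedMessage(buf: Uint8Array) {
  if (buf.length < 6 || buf[0] !== 0) throw new Error("unexpected framing");
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const registryId = view.getUint32(1, false); // assumption: big-endian
  const size = readVarint(buf, 5);
  const count = unzigzag(size.value);
  let pos = size.next;
  const indexes: number[] = [];
  if (count === 0) {
    indexes.push(0); // assumption: size 0 is shorthand for [0]
  } else {
    for (let i = 0; i < count; i++) {
      const entry = readVarint(buf, pos);
      indexes.push(unzigzag(entry.value));
      pos = entry.next;
    }
  }
  return { registryId, indexes, payload: buf.subarray(pos) };
}

// Third message of a 3-message .proto file, as in the example above.
const sample = new Uint8Array([0, 0, 0, 0, 42, 0x02, 0x04, 0x0a, 0x01, 0x61]);
const parsed = splitJavaFramedMessage(sample);
console.log(parsed.registryId, parsed.indexes, Array.from(parsed.payload));
```

A decoder that assumes the Protobuf payload starts at offset 5 would instead try to parse the index bytes as part of the message, which is one way the mismatch can surface.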

Thanks

@pcoleman00st
Author

Oops, duplicate of #152; didn't see that, sorry.

@pcoleman00st closed this as not planned on Aug 4, 2023