
Optimize binlog event deserialization #11

Draft · wants to merge 20 commits into main

Conversation

@shirolimit (Collaborator) commented Jan 22, 2025

The PR aims to speed up the MySQL binary log client by changing how events are deserialized.

Instead of reading data byte-by-byte from a ByteArrayInputStream, it fully buffers each binlog event in memory as a RawBinaryLogEvent and then uses a BinaryLogEventDataReader for deserialization. The interface of BinaryLogEventDataReader is kept similar to the library's custom ByteArrayInputStream to simplify migration.

In my experiments, this improves binlog extraction speed by up to 300% (in the case where we do nothing with the extracted events).
I also tried buffering the data in a byte[] and wrapping it in a ByteArrayInputStream; that was faster than the existing approach but still about two times slower than the proposed one.

The PR also introduces some minor optimizations to row parsing (e.g. datetime) and a bunch of unit tests for the existing deserializers.
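The buffering idea described above can be sketched roughly as follows. The class name mirrors the one mentioned in the PR, but this is an illustrative assumption about its shape, not the PR's actual code; the real BinaryLogEventDataReader API may differ.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: the real BinaryLogEventDataReader API may differ.
class EventDataReader {
    private final ByteBuffer buffer;

    EventDataReader(byte[] eventBytes) {
        // The whole event is buffered up front, so individual reads
        // are plain array accesses instead of per-byte stream calls.
        this.buffer = ByteBuffer.wrap(eventBytes);
    }

    // Little-endian unsigned integer of the given byte width,
    // as used throughout the binlog format.
    long readLong(int length) {
        long result = 0;
        for (int i = 0; i < length; i++) {
            result |= ((long) (buffer.get() & 0xFF)) << (i * 8);
        }
        return result;
    }

    byte[] read(int length) {
        byte[] bytes = new byte[length];
        buffer.get(bytes);
        return bytes;
    }

    int available() {
        return buffer.remaining();
    }
}

class EventDataReaderDemo {
    public static void main(String[] args) {
        byte[] event = {0x01, 0x02, 0x00, 0x00, (byte) 0xAB};
        EventDataReader reader = new EventDataReader(event);
        System.out.println(reader.readLong(4)); // prints 513 (0x00000201, little-endian)
        System.out.println(reader.available()); // prints 1
    }
}
```

Keeping the method names close to the old stream interface (read, available, etc.) is what makes the byte-for-byte migration of the existing deserializers mechanical.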

@shirolimit force-pushed the shirolimit/optimize-event-deserialization branch from 5cba571 to 85fcc35 on January 23, 2025 at 10:06
@shirolimit (Collaborator, Author)

Force-pushed after rebasing on the latest main.


@jmlw left a comment


I don't see any obvious issues, especially since the majority of the changes are duplication to utilize a different input object.

@shirolimit force-pushed the shirolimit/optimize-event-deserialization branch from 2ac1ea7 to 888c63f on January 27, 2025 at 17:49

@DylanFlanders left a comment


Some minor comments, but overall it's looking good to me.

return BitSet.valueOf(bytes);
}

public int available() {


I'm having trouble understanding the blockLength in the ByteArrayInputStream version of available(). Do you have some more insight there? It seems that in the new data-reader version the buffer is the "block" and we resize the buffer limit in enterBlock, whereas the old input-stream version seems a bit more hacky and has to block off portions of the input stream.

@shirolimit (Collaborator, Author)


In ByteArrayInputStream, blockLength was used as a running value: it held the number of bytes still available in the current block. The stream updated blockLength on every read, and it couldn't go below zero unless we called skipToTheEndOfTheBlock. What this achieves is preventing reads from crossing a block boundary, which is needed when parsing some events.

The reader version achieves the same by manipulating the ByteBuffer's limit.
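A minimal sketch of that limit-based approach, under assumed names (only enterBlock, skipToTheEndOfTheBlock, and available are taken from the discussion above; the class itself is hypothetical):

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: shows how shrinking a ByteBuffer's limit can
// replace the running blockLength counter of the old input-stream version.
class BlockScopedReader {
    private final ByteBuffer buffer;
    private int savedLimit = -1;

    BlockScopedReader(byte[] data) {
        this.buffer = ByteBuffer.wrap(data);
    }

    // Restrict subsequent reads to the next blockLength bytes.
    void enterBlock(int blockLength) {
        savedLimit = buffer.limit();
        buffer.limit(buffer.position() + blockLength);
    }

    // Discard whatever is left of the block and restore the full limit.
    void skipToTheEndOfTheBlock() {
        buffer.position(buffer.limit());
        buffer.limit(savedLimit);
    }

    int available() {
        // While inside a block this can never exceed the block's remainder,
        // with no per-read bookkeeping required.
        return buffer.remaining();
    }

    int readByte() {
        // Reading past the block end fails fast with BufferUnderflowException.
        return buffer.get() & 0xFF;
    }
}

class BlockScopedReaderDemo {
    public static void main(String[] args) {
        BlockScopedReader reader = new BlockScopedReader(new byte[]{1, 2, 3, 4, 5, 6});
        reader.enterBlock(3);
        reader.readByte();
        System.out.println(reader.available()); // prints 2 (block remainder), not 5
        reader.skipToTheEndOfTheBlock();
        System.out.println(reader.readByte()); // prints 4, the first byte after the block
    }
}
```

The invariant the old stream maintained by decrementing blockLength on every read falls out of ByteBuffer's own position/limit bookkeeping here.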

new BigDecimal("237.00"),
new BigDecimal("10.00"),
1,
0,


It looks like this covers a number of branches in AbstractRowsEventDataDeserializer#deserializeCell, but not all. Are some column types more difficult to add than others? I'd be happy to try to help here so we can cover the AbstractRowsEventDataDeserializer changes.

@shirolimit (Collaborator, Author)


That's true, we don't cover all possible data types and branches here. I agree that ideally we should cover all the branches and methods, but I decided not to in order to keep the PR smaller. I also didn't set a goal of raising the library's code coverage to 100%, only of adding enough tests to be confident in my changes.

For the row deserialization classes, most of the changes are just overloads that reuse the same method names, so I decided to add only a few tests.

  1. Would it be better to add more unit tests to cover all cases and branches?
    It definitely would; the library is a crucial part of binlog syncs and we want it to be stable.

  2. Is it required to add them to this PR?
    I believe it shouldn't be a blocker, considering the changes introduced to the class.

I'd propose leaving it as is for now and creating a backlog task to improve the library's code coverage.
What do you think?


Okay, that makes sense to me. And thank you for the many tests that were added here!

@shirolimit force-pushed the shirolimit/optimize-event-deserialization branch from 43ceba9 to b28c388 on January 29, 2025 at 17:05