-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize binlog event deserialization #11
base: main
Are you sure you want to change the base?
Conversation
src/main/java/com/github/shyiko/mysql/binlog/event/deserialization/ColumnType.java
Show resolved
Hide resolved
src/main/java/com/github/shyiko/mysql/binlog/event/RawBinaryLogEvent.java
Show resolved
Hide resolved
.../com/github/shyiko/mysql/binlog/event/deserialization/AbstractRowsEventDataDeserializer.java
Show resolved
Hide resolved
5cba571
to
85fcc35
Compare
Force-pushed after rebasing on the latest main. |
.../com/github/shyiko/mysql/binlog/event/deserialization/AbstractRowsEventDataDeserializer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any obvious issues; especially since the majority of changes are duplication to utilize a different input object
...main/java/com/github/shyiko/mysql/binlog/event/deserialization/BinaryLogEventDataReader.java
Outdated
Show resolved
Hide resolved
...main/java/com/github/shyiko/mysql/binlog/event/deserialization/BinaryLogEventDataReader.java
Outdated
Show resolved
Hide resolved
2ac1ea7
to
888c63f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments but overall it's looking good to me
.../java/com/github/shyiko/mysql/binlog/event/deserialization/BinaryLogEventDataReaderTest.java
Outdated
Show resolved
Hide resolved
.../java/com/github/shyiko/mysql/binlog/event/deserialization/BinaryLogEventDataReaderTest.java
Outdated
Show resolved
Hide resolved
...main/java/com/github/shyiko/mysql/binlog/event/deserialization/BinaryLogEventDataReader.java
Show resolved
Hide resolved
...main/java/com/github/shyiko/mysql/binlog/event/deserialization/BinaryLogEventDataReader.java
Show resolved
Hide resolved
return BitSet.valueOf(bytes); | ||
} | ||
|
||
public int available() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble understanding the blockLength
in the ByteArrayInputStream version of available()
. Do you have some more insight there? It seems in the new data reader version, the buffer is the "block" and we resize the buffer limit in enterBlock
. Whereas in the old input stream version, it seems a bit more hacky and has to block off portions of the input stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ByteArrayInputStream
blockLength
was used as a running value, i.e. it contained the number of bytes available for the current block. The stream updated blockLength
on every read but it couldn't go below zero unless we skipToTheEndOfTheBlock
. In the end, what they tried to achieve is to avoid reading from a block boundary which is needed for parsing some events.
The reader version achieves the same by manipulating ByteBuffer
's limit.
...om/github/shyiko/mysql/binlog/event/deserialization/UpdateRowsEventDataDeserializerTest.java
Show resolved
Hide resolved
new BigDecimal("237.00"), | ||
new BigDecimal("10.00"), | ||
1, | ||
0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this covers a number of branches in AbstractRowsEventDataDeserializer#deserializeCell
but not all. Are there some column types that are more difficult to add than others? I'd be happy to try to help here so we can cover AbstractRowsEventDataDeserializer changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, we don't cover all possible data types and branches here. I totally agree that ideally we should cover all the branches and methods but I decided not to do that to keep the PR smaller. Also, I didn't set a goal to improve the library's code coverage to 100%, just enough of them to be confident in my changes.
For row deserialization classes most of the changes are just overloads that use the same method names thus I decided to add only a few tests.
-
Would it be better to add more unit tests to cover all cases and branches?
It definitely would, the library is a crucial part of binlog syncs and we want it to be stable. -
Is it required to add them to this PR?
I believe it shouldn't be a stopper considering the changes introduced to the class.
I'd propose to leave it as is for now and create a backlog task to improve library's code coverage.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay that makes sense to me. And thank you for the many tests that were added here!
43ceba9
to
b28c388
Compare
The PR aims to speed up MySQL binary log client by changing how events are deserialized.
Instead of reading data byte-by-byte from
ByteArrayInputStream
it fully buffers binlog event in memory asRawBinaryLogEvent
and then usesBinaryLogEventDataReader
for deserialization. The interface ofBinaryLogEventDataReader
is kept similar to library's customByteArrayInputStream
to simplify migration.In my experiments, this improves binlog extraction speed by up to 300% (in case we do nothing with extracted events).
I also tried buffering data in
byte[]
and wrapping it intoByteArrayInputStream
, it was faster than existing approach but still 2 times slower than the proposed one.The PR also introduces some minor optimizations to row parsing (e.g. datetime) and a bunch of unit tests for existing deserializers.