-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[improve][client] PIP-393: Improve performance of Negative Acknowledgement #23600
base: master
Are you sure you want to change the base?
Conversation
I will implement a space efficient map structure for |
@thetumbled Fastutil could be a better source of space efficient map data structures. I believe that there's a templating solution where it's possible to generate code for efficient implementations. In this case, is there a need for the data structure to be concurrent? Following a single writer principle could result in simpler and more performant designs. One way to address message passing from other threads to a single writer thread is to use message passing queues from JCTools which we already use in Pulsar. Just some food for thought. |
...rc/main/java/org/apache/pulsar/common/util/collections/ConcurrentTripleLong2LongHashMap.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/pulsar/common/util/collections/ConcurrentTripleLong2LongHashMap.java
Outdated
Show resolved
Hide resolved
If any solution from |
The main purpose of Fastutil is performance and space efficiency. pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/PendingAcksMap.java Line 101 in 3d0625b
There's a huge benefit when the keys are primitives and you can only achieve that with a hierarchical data structure. In the long run, I'd rather get rid of our custom collection implementations instead of adding more of them. It's not trivial to create bug free collections. Using existing libraries for that purpose is strongly preferred. @thetumbled Have you considered a hierarchical data structure? (such as map of maps) |
It is a good point, i will test it. |
@thetumbled In certain cases when tracking existence (true/false), it's worth considering to use space efficient bit maps. In Pulsar, we use the RoaringBitmap library. |
I guess it's not applicable in this case. @thetumbled I checked the NegativeAcksTracker class and it seems that the actual key is (ledgerId, entryId). It's easy to implement (ledgerId, entryId) as map of maps. |
This is a very poor solution in the current implementation: pulsar/pulsar-client/src/main/java/org/apache/pulsar/client/impl/NegativeAcksTracker.java Lines 80 to 88 in 22cfa54
There should be a separate datastructure (a list or queue) which contains the entries in timestamp order. The benefit of that is that iterating could stop after the timestamp condition no longer holds. |
@thetumbled It looks like there's no need for a map data structure in the first place. That's completely unnecessary for implementing NegativeAcksTracker |
You are right. We need to improve the code execution efficiency too. some kind of structures sorted by timestamp. |
Fastutil contains multiple PriorityQueue implementations: https://fastutil.di.unimi.it/docs/it/unimi/dsi/fastutil/PriorityQueue.html. |
I propose a pip to fix several issues with nack tracker, with a new data structure :
This PR become the implementation PR for PIP-393: #23601. |
There are some issues with the licence.
Should we replace |
@thetumbled That's fine to complete the experiment. For a proper solution there will need to be a way to avoid the shaded jar files growing significantly. The proper solution would require a separate module for fastutil which minimizes the number of classes. I tried to explain that in my previous comment, #23600 (comment) . Minimizing all classes would most likely be a breaking change. Since you cannot specify |
I wonder whether it is the problem with the |
@thetumbled When new dependencies are added, it's necessary to add them to LICENSE file(s). However, there's a bit of a problem in using |
get it, i change it back to |
The implementation is done, please help to review this PIP, thanks. @lhotari @BewareMyPower @nodece @poorbarcode @dao-jun |
pulsar-client-shaded/pom.xml
Outdated
@@ -168,6 +168,7 @@ | |||
<include>org.tukaani:xz</include> | |||
<!-- Issue #6834, Since Netty ByteBuf shaded, we need also shade this module --> | |||
<include>org.apache.pulsar:pulsar-client-messagecrypto-bc</include> | |||
<include>it.unimi.dsi:*</include> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't the correct solution for shading fastutil. Please see https://github.com/apache/pulsar/pull/23600/files#r1848136065
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have update the shade conf, please review again, thanks.
+ "than the expected time, but no more than 256ms. \nIf set to k, the redelivery time will be" | ||
+ "bucketed by 2^k ms.\nIf the value is 0, the redelivery time will be accurate to ms." | ||
) | ||
private int negativeAckPrecisionBitCnt = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you consider 0 as the default value? Users may wish to submit later, which is consistent with previous behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to configure it as the best the confguration value. Without this enhancement, it is unrealistic to use negative acknowledgement in production. The memory occupation will inflate very fast.
But i will listen to the voice from the community, if more and more people think that disabling this feature by default is better, i will update it.
} | ||
|
||
long backoffNs; | ||
long backoffMs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use nanoseconds? Then you need to use System.nanoTime()
instead of System.currentTimeMillis()
, the System.nanoTime()
is quickly based on JVM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Nano is not convenient for calculating, as we need to estimate the accuracy of timestamp.
For example, withnegativeAckPrecisionBitCnt = 10
, we know that the redelivery time may be earlier at most 2^10=1024ms~=1s. We just trim the lower 8 bit to bucket the time.
But with nano(1ms=1000000ns
). we can't get a suitable conf value easily. - It is unnecessary to use timestamp in
ns
unit. As the tick time of the timer is 1ms, and the latency of message delivery is atms
level.
<dependency> | ||
<groupId>org.roaringbitmap</groupId> | ||
<artifactId>RoaringBitmap</artifactId> | ||
</dependency> | ||
|
||
<dependency> | ||
<groupId>it.unimi.dsi</groupId> | ||
<artifactId>fastutil</artifactId> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you should use the fastutil-core
instead of fastutil
, and then we can make a new PR to reduce the fastutil-core
size.
And then use <include>it.unimi.dsi:fastutil-core</include>
.
pulsar-broker/src/test/java/org/apache/pulsar/client/impl/NegativeAcksTest.java
Show resolved
Hide resolved
…tiveAcksTest.java Co-authored-by: Zixuan Liu <[email protected]>
@thetumbled Please don't include |
Thanks for reminder. |
Implementation PR for PIP-393: #23601.
Motivation
There are many issues with the current implementation of Negative Acknowledgement in Pulsar:
All of these problem is severe and need to be solved.
Memory occupation is high
After the improvement of #23582, we have reduce half more memory occupation
of
NegativeAcksTracker
by replacingHashMap
withConcurrentLongLongPairHashMap
. With 100w entry, the memory occupation decrease from 178Mb to 64Mb. With 1kw entry, the memory occupation decrease from 1132Mb to 512Mb.The average memory occupation of each entry decrease from 1132MB/10000000=118byte to 512MB/10000000=53byte.
But it is not enough. Assuming that we negative ack message 1w/s, assigning 1h redelivery delay for each message,
the memory occupation of
NegativeAcksTracker
will be3600*10000*53/1024/1024/1024=1.77GB
, if the delay is 5h,the required memory is
3600*10000*53/1024/1024/1024*5=8.88GB
, which increase too fast.Code execution efficiency is low
Currently, each time the timer task is triggered, it will iterate all the entries in
NegativeAcksTracker.nackedMessages
,which is unnecessary. We can sort entries by timestamp and only iterate the entries that need to be redelivered.
Redelivery time is not accurate
Currently, the redelivery time is controlled by the
timerIntervalNanos
, which is 1/3 of thenegativeAckRedeliveryDelay
.That means, if the
negativeAckRedeliveryDelay
is 1h, the check interval time will be 20min, which is unacceptable.Multiple negative ack for messages in the same entry(batch) will interfere with each other
Currently,
NegativeAcksTracker#nackedMessages
map(ledgerId, entryId)
totimestamp
, which means multiple nacks from messages in the same batch share single one timestamp.If we let msg1 redelivered 10s later, then let msg2 redelivered 20s later, these two messages are delivered 20s later together. msg1 will not be redelivered 10s later as the timestamp recorded in
NegativeAcksTracker#nackedMessages
is overrode by the second nack call.we can reproduce this problem with test code below:
We expect the second message redelivered 10s later than the first message, as it call nack 10s later than the first one.
However, we will receive two messages together.
You can also reproduce this problem with the test code in this PR: org.apache.pulsar.client.impl.NegativeAcksTest#testNegativeAcksWithBatch
Modifications
Refactor the
NegativeAcksTracker
to solve the above problems.Verifying this change
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: thetumbled#64