KAFKA-20131: ClassicKafkaConsumer does not clear endOffsetRequested flag on failed LIST_OFFSETS calls #21457
Conversation
…ST_OFFSETS call fails
First pass at catching the case of failures. Still work to do to handle the fact that multiple responses are possible and thus we don't want to clear the flag prematurely.
@kirktrue I went through the code, debugged it locally, and ran the added tests; the PR looks fine to me. Since I'm not up to date with the consumer code, though, I would ask you to get a second opinion too.
…T_OFFSETS-failures
    if (listOffsetsRequestState.shouldRetry) {
        // This exception is caught in prepareFetchOffsetsRequests() and signals to retry this set
        // of partitions. Only do that if the caller wants to retry LIST_OFFSETS calls, of course.
        throw new StaleMetadataException();
    } else {
        // If the caller doesn't want to retry LIST_OFFSETS calls, go ahead and clear the
        // 'end offsets requested' flag since there won't be another chance. Then return an empty
        // list to signal that there are no requests to be sent.
        offsetFetcherUtils.clearPartitionEndOffsetRequests(timestampsToSearch.keySet());
        return List.of();
    }
Uhm, I would say we shouldn't change this retry logic. This internal retry is for the case where we don't know the leader for a partition when fetching offsets, so we really couldn't even send the request. In this case we want to refresh metadata and try again once we know the leader. Is there a reason why we want to change that in relation to the issue of the flag not being cleared?
With this change, I'm afraid that if the leader is not known when we call currentLag, it would take a second call to currentLag to issue another request, and a third call to get the result. With the current behaviour, we would retry here to discover the leader and fetch, so the next call to lag would already find the results.
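For context, a sketch of how this looks from the caller's side: currentLag() returns an empty OptionalLong until the end offset has been fetched, so every extra round trip means another pass through a loop like the one below (the loop is illustrative; consumer and partition are assumed to exist already).

```java
import java.time.Duration;
import java.util.OptionalLong;

// Illustrative caller-side polling; currentLag() is best-effort and stays empty until
// the background LIST_OFFSETS result has been applied.
OptionalLong lag = consumer.currentLag(partition);
while (lag.isEmpty()) {
    consumer.poll(Duration.ofMillis(100)); // drives the consumer so the offset request can complete
    lag = consumer.currentLag(partition);
}
```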
In this case we want to refresh metadata and try again once we know the leader. Is there a reason why we want to change that in relation to the issue of the flag not being cleared?
The existing behavior of ClassicKafkaConsumer.currentLag() is to make a single request to LIST_OFFSETS. I'm trying to mimic that with the AsyncKafkaConsumer implementation.
I added a test to OffsetsRequestManager that approximates the call to currentLag() that will fail due to a missing leader. From looking at the code, it seems as though the OffsetsRequestManager will detect the missing leader and retry the LIST_OFFSETS repeatedly, without a timeout or limit.
I'm not 100% convinced that the fire-and-forget style approach of ClassicKafkaConsumer is correct or optimal either, but it's the approach I'm trying to copy.
Thoughts?
Ok, good point, I agree on the unbounded-retries gap. But then, isn't that a problem for calls to beginningOffsets/endOffsets/offsetsForTimes too? They will time out correctly at the API level (it even expires), but it seems they may leave the OffsetsRequestManager retrying in the background? (Or is there a mechanism in place that I'm missing?)
If so, then I agree on addressing the unbounded retries here, but should we address it consistently, with all calls retrying while there is time (instead of having the shouldRetry flag that splits the path)?
The gap seems to be that this OffsetsRequestManager.fetchOffsets call has no deadline.
I've created KAFKA-20188 to track the work for adding a timeout to OffsetsRequestManager.fetchOffsets(), so we can focus on that in a separate PR.
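For illustration only, a rough sketch of what a deadline-bounded retry might look like; sendListOffsetsAndCollectRetries and the other names are hypothetical, and KAFKA-20188 will define the actual shape.

```java
// Hypothetical sketch: bound the internal retries with a deadline instead of retrying forever.
// 'sendListOffsetsAndCollectRetries' is a made-up helper standing in for the real send/response logic.
long deadlineMs = time.milliseconds() + defaultApiTimeoutMs;
while (!remainingToSearch.isEmpty() && time.milliseconds() < deadlineMs) {
    // send LIST_OFFSETS for the partitions still missing offsets; keep only the ones
    // that need a retry (e.g. leader unknown)
    remainingToSearch = sendListOffsetsAndCollectRetries(remainingToSearch);
}
if (!remainingToSearch.isEmpty()) {
    // out of time: clear per-partition state (e.g. endOffsetRequested) and complete the
    // pending future exceptionally with a TimeoutException
}
```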
// The currentLag() API is a "best effort" attempt at calling the LIST_OFFSETS RPC. If it
// fails, don't retry the attempt internally, but let the user attempt it again.
Here is where I think we may be getting something wrong. I agree we don't want to retry the ListOffsets triggered from currentLag if it fails. But we do want to retry discovering a leader in order to be able to send the request (which is what the internal retries I pointed at in the comment above do).
So if this retry=false is only used to skip that leader re-discovery retry, I think we should remove it (probably no parameter is needed at all?)
The leader rediscovery comes from the call to Metadata.requestUpdate() that's made in OffsetsRequestManager.groupListOffsetRequests(), right? I didn't see that throwing the StaleMetadataException did anything besides re-enqueue the ListOffsetsRequestState, which would then send a follow-up LIST_OFFSETS on the next call to OffsetsRequestManager.poll(), right?
My apologies if I'm missing something obvious 😄
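To make the retry path under discussion concrete, a simplified sketch; this is not the actual OffsetsRequestManager code, and the variable and helper names are illustrative.

```java
// Simplified sketch of the internal retry path; names are illustrative, not the real fields.
try {
    // Building the requests fails when some partitions have no known leader; in that case a
    // metadata refresh is requested and StaleMetadataException propagates to the caller.
    unsentRequests = buildListOffsetsRequests(timestampsToSearch);
} catch (StaleMetadataException e) {
    metadata.requestUpdate();                     // leader rediscovery happens via metadata refresh
    requestsToRetry.add(listOffsetsRequestState); // re-enqueued; retried on a later poll()
}
```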
void clearPartitionEndOffsetRequests(Collection<TopicPartition> partitions) {
    for (final TopicPartition partition : partitions) {
        if (subscriptionState.maybeClearPartitionEndOffsetRequested(partition)) {
            log.trace("Clearing partition end offset requested for partition {}", partition);
a bit confusing, dup "partition"?
It's TRACE, so no one will ever see it 😉
j/k I'll change it.
Removed the extra "partitions" in the log message.
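For illustration, the kind of change described; the exact message used in the PR may differ.

```java
// before: "partition" appears twice in the message
log.trace("Clearing partition end offset requested for partition {}", partition);
// after (illustrative wording):
log.trace("Clearing end offset requested for partition {}", partition);
```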
remainingToSearch.keySet().retainAll(value.partitionsToRetry);

offsetFetcherUtils.updateSubscriptionState(value.fetchedOffsets, isolationLevel);
offsetFetcherUtils.clearPartitionEndOffsetRequests(remainingToSearch.keySet());
here we're clearing the flag for the partitions that didn't get offsets yet. I agree we need this if we don't have any time left to retry. But if there's still time, the do-while will try again. In that case, do we want to clear the flag here?
I would imagine we don't, because we'll continue retrying while there is time. It could be the case of missing leader info for instance: we want to keep the flag on for those partitions, hit the client.awaitMetadataUpdate(timer) below, and try again in the next iteration of the do-while, right?
If so, I imagine we could take the timer into consideration here? (clear the flag for the failed partitions only if timer expired?). Thoughts?
I agree we need this if we don't have any time left to retry. But if there's still time, the do-while will try again. In that case, do we want to clear the flag here?
That's precisely what happens in the currentLag() case, though. It's always using a timeout of 0, so there's never a second pass in that loop.
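A simplified sketch of the loop shape being referred to, assuming a Timer-driven do-while; this is not the actual method body and the names are illustrative.

```java
// Simplified sketch: with a zero timeout the body runs exactly once, so any retry for
// currentLag() has to come from a later call.
Timer timer = time.timer(0);
do {
    // send LIST_OFFSETS for the partitions still missing offsets and handle responses
    client.poll(timer);  // returns immediately once the timer is already expired
    timer.update();
} while (timer.notExpired() && !remainingToSearch.isEmpty());
```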
Ok, we both agree we need it for currentLag/timer expired. But the way it's called now, it applies to all cases; that's my concern. Isn't this going to clear the flag also in the case where there is time left to retry and there is a partition that didn't have a known leader?
I've added an explicit parameter to 'clear end offsets requests' that only the ClassicKafkaConsumer.currentLag() sets to true. This should prevent other callers from clearing the flag, regardless of the timeout setting.
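A sketch of the resulting call shape; the parameter name is illustrative and not necessarily what the PR uses.

```java
// Only the currentLag() path opts in to clearing the flag when the request fails.
offsetFetcher.fetchOffsets(timestampsToSearch, timer, /* clearEndOffsetsOnFailure */ true);

// beginningOffsets()/endOffsets()/offsetsForTimes() keep the existing behavior.
offsetFetcher.fetchOffsets(timestampsToSearch, timer, /* clearEndOffsetsOnFailure */ false);
```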
@Override
public void onFailure(RuntimeException e) {
    offsetFetcherUtils.clearPartitionEndOffsetRequests(remainingToSearch.keySet());
    return true;
} else {
    return false;
this return type seems to only be used to log if we performed the action or not. Should we move the log here instead? (simplify this func and the caller)
I left the logging for clearing the flag (when we fail to find the offset) at the higher level to match where the logging is done when we do find the offset; both are in OffsetFetcherUtils. That's mostly so that enabling the log level for a single class logs both the positive and negative cases.
But I agree that it's a little wonky.
assertLogEmitted(
    appender,
    "^Requesting the log end offset for " + tp0 + " in order to compute lag$"
);
This log is printed when there is a call to subscriptionState.requestPartitionEndOffset. Would it be better to verify(subscriptionState).requestPartitionEndOffset(tp)? (We truly care about the function being called, and these log checks easily get flaky.)
Yes, the verify() approach is better. That said, I do feel it's worth ensuring that user-facing log messages show up as we expect them to, in order to make log-based troubleshooting possible.
Just for clarity, I will add the verify() bit for sure.
Added calls to verify() to check when requestPartitionEndOffset() is invoked.
FYI: I had to wrap the SubscriptionState instance in spy() in order to call verify() on it.
assertLogEmitted(
    appender,
    "^Not requesting the log end offset for " + tp0 + " to compute lag as an outstanding request already exists$"
);
similar, would it be better to verify that we never called subscriptionState.requestPartitionEndOffset(tp0)?
Added calls to verify() to check that requestPartitionEndOffset() is not invoked in this case.
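A minimal sketch of the spy()/verify() pattern described above (Mockito); the construction of the SubscriptionState and the code under test are elided.

```java
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.spy;
import static org.mockito.Mockito.verify;

// Wrap the real instance so interactions can be verified while keeping its real behavior.
SubscriptionState subscriptions = spy(realSubscriptionState); // construction elided

// Test where the end offset request is expected:
verify(subscriptions).requestPartitionEndOffset(tp0);

// Test where an outstanding request already exists, so no new request should be made:
verify(subscriptions, never()).requestPartitionEndOffset(tp0);
```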
…T_OFFSETS-failures
…ocations in KafkaConsumerTest
offsetFetcherUtils.updateSubscriptionState(value.fetchedOffsets, isolationLevel);

if (isZeroTimestamp && shouldClearPartitionEndOffsets)
shouldn't we clear the flag if (isZeroTimestamp)?
I see the shouldClearPartitionEndOffsets is passed true only from currentLag, but what about a call to consumer.endOffsets/beginningOffsets/offsetsForTimes when called with Duration.ZERO? how would the flag get cleared?
The flag is only set and only checked on the currentLag() path, so the other paths shouldn't need to worry about it.
It's possible for a user to pass in a zero timeout to offsetsForTimes(), et al., but we don't need to clear the flag in those cases.
Updates the ClassicKafkaConsumer to clear out the SubscriptionState endOffsetRequested flag if the LIST_OFFSETS call fails.