feat(stream): wait committed epoch in state table init_epoch #19223
+384
−488
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
The `init_epoch` part of #18312.

It is required that after `init_epoch` returns, we can always read consistent data on `prev_epoch`; otherwise, we will read inconsistent data and pollute the downstream.

`init_epoch` is called during recovery and configuration change. Previously, in recovery, this was ensured by the CN waiting for the latest hummock version, and in configuration change, it was ensured by pausing barrier injection and doing a CN-wide `try_wait_epoch` call in the `WaitEpochCommit` streaming service RPC to wait for the committed epoch to bump up globally.

In this PR, in the `init_epoch` of `StateTable`, we change to waiting for the `committed_epoch` of the state table to bump up to the `prev_epoch`. With this PR, during recovery there is no need for the CN to wait for the latest hummock version, and for configuration changes that do not update the vnode bitmap (replace table, sink into table; scaling not included), there is no need to do the pause-wait-resume barrier injection.

Deadlock can easily happen. Bumping up `committed_epoch` depends on collecting barriers from all actors. If `init_epoch` is called before the initial barrier is yielded, then while `init_epoch` waits for `committed_epoch` to bump up, it blocks barrier handling, so `committed_epoch` never bumps up, which causes a deadlock.

To avoid deadlock, we should strictly follow the order when handling the initial barrier: receive the first barrier, yield the first barrier, and then call `StateTable::init_epoch`. In this PR, we just ensure that all usages of `StateTable::init_epoch` are included in the changed code so that we can carefully review them. The `StateTable::init_epoch` method becomes async, and therefore all direct usages of it are included in the changed code. Some non-async util methods that call `StateTable::init_epoch` are required to become async as well, and are therefore also included in the changed code. There are two util methods, both named `init_epoch`, that were already async and call `StateTable::init_epoch`. To include the usages of these two methods in the changed code, both are renamed to `init_epoch_after_yield_barrier`. Moreover, if the handling order is incorrect, deadlock must happen whenever the problematic code is reached. Hence, if we can pass all the CI tests, we can be confident that the ordering is correct.

Checklist
`./risedev check` (or alias, `./risedev c`)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.
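For reviewers, the initial-barrier ordering from the description (receive the first barrier, yield it, then wait in `init_epoch`) can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions, not the RisingWave implementation: it uses plain threads and channels instead of async executors, and `CommittedEpoch`, `wait_committed`, `commit`, and `simulate` are hypothetical names invented for illustration. A timeout stands in for what would be an indefinite hang in the real system.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the committed epoch of a state table.
struct CommittedEpoch {
    epoch: Mutex<u64>,
    bumped: Condvar,
}

impl CommittedEpoch {
    fn new(epoch: u64) -> Self {
        Self { epoch: Mutex::new(epoch), bumped: Condvar::new() }
    }

    /// Models the wait inside `init_epoch`: block until the committed epoch
    /// reaches `prev_epoch`. Returns false if the timeout expires first.
    fn wait_committed(&self, prev_epoch: u64, timeout: Duration) -> bool {
        let guard = self.epoch.lock().unwrap();
        let (guard, _res) = self
            .bumped
            .wait_timeout_while(guard, timeout, |e| *e < prev_epoch)
            .unwrap();
        *guard >= prev_epoch
    }

    /// Models the committed epoch bumping up after barrier collection.
    fn commit(&self, epoch: u64) {
        *self.epoch.lock().unwrap() = epoch;
        self.bumped.notify_all();
    }
}

/// Runs one actor against one barrier collector. Returns true if the
/// actor's epoch wait finished before the timeout.
fn simulate(yield_before_init: bool) -> bool {
    const PREV_EPOCH: u64 = 1;
    let timeout = Duration::from_millis(500);
    let committed = Arc::new(CommittedEpoch::new(0));
    let (barrier_tx, barrier_rx) = mpsc::channel::<u64>();
    let (collect_tx, collect_rx) = mpsc::channel::<u64>();

    // Collector: the epoch is committed only after the actor yields its barrier.
    let collector = {
        let committed = Arc::clone(&committed);
        thread::spawn(move || {
            if let Ok(epoch) = collect_rx.recv() {
                committed.commit(epoch);
            }
        })
    };

    let actor = {
        let committed = Arc::clone(&committed);
        thread::spawn(move || {
            // 1. Receive the first barrier.
            let prev_epoch = barrier_rx.recv().unwrap();
            if yield_before_init {
                // 2. Yield the barrier so it can be collected.
                collect_tx.send(prev_epoch).unwrap();
                // 3. Only now wait for the committed epoch.
                committed.wait_committed(prev_epoch, timeout)
            } else {
                // Wrong order: waiting first blocks the barrier, so the
                // epoch cannot be committed and the wait times out.
                let ok = committed.wait_committed(prev_epoch, timeout);
                collect_tx.send(prev_epoch).unwrap();
                ok
            }
        })
    };

    barrier_tx.send(PREV_EPOCH).unwrap();
    let ok = actor.join().unwrap();
    collector.join().unwrap();
    ok
}

fn main() {
    println!("yield, then init: completed = {}", simulate(true));
    println!("init, then yield: completed = {}", simulate(false));
}
```

In this model, the wrong order can never complete its wait because the only event that bumps the epoch is gated behind the barrier the actor is still holding, which mirrors why an incorrectly ordered call site would be expected to hang in CI rather than pass silently.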