[Enhancement](compaction) support parallel compaction for single tablet #19069
Proposed changes
Issue Number: close #18742
Problem summary
Basic ideas
In this PR, we add support for parallel cumulative compaction for a single tablet; base compaction still runs in a single thread.
Firstly, we save all currently running cumulative compaction tasks in an array named `cumulative_compactions`, and save the currently running base compaction task in `base_compaction`. When multiple cumulative compaction tasks run at the same time, the time and order in which they complete are indeterminate. Therefore, the rowsets that the next thread can compact are split into multiple contiguous segments. Every time we choose rowsets to compact, we pick the contiguous segment with the maximum score, and during this selection we skip large rowsets to keep them out of cumulative compaction.
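The selection step might look roughly like the following sketch. The struct, function, and threshold names here are hypothetical illustrations, not the actual Doris code:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the real rowset metadata.
struct RowsetInfo {
    int64_t compaction_score; // this rowset's contribution to the score
    int64_t data_size;        // bytes
    bool under_compaction;    // already picked by a running task
};

// Walk the candidate rowsets in version order; rowsets that are busy or too
// large break the contiguous run. Return the run with the highest total score.
std::vector<RowsetInfo> pick_max_score_segment(const std::vector<RowsetInfo>& candidates,
                                               int64_t large_rowset_threshold) {
    std::vector<RowsetInfo> best, current;
    int64_t best_score = 0;
    int64_t current_score = 0;
    auto flush = [&] {
        if (current_score > best_score) {
            best_score = current_score;
            best = current;
        }
        current.clear();
        current_score = 0;
    };
    for (const RowsetInfo& rs : candidates) {
        if (rs.under_compaction || rs.data_size >= large_rowset_threshold) {
            flush(); // a busy or oversized rowset ends the contiguous segment
        } else {
            current.push_back(rs);
            current_score += rs.compaction_score;
        }
    }
    flush();
    return best;
}
```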
We also change the behavior of `update_cumulative_point`: it now forwards `cumulative_point` as far as possible.
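As a minimal sketch of that forwarding idea (the types and the contiguous-version assumption are ours, not the actual Doris code; the real function also has to respect rowsets held by running tasks):

```cpp
#include <cstdint>
#include <map>

// Hypothetical rowset metadata, keyed by start version in the map below;
// versions are assumed contiguous (next start = previous end + 1).
struct RowsetMeta {
    int64_t end_version;
    bool compacted; // has already been through cumulative compaction
};

// Forward cumulative_point over every leading rowset that is already
// compacted, stopping at the first one that is not.
int64_t forward_cumulative_point(const std::map<int64_t, RowsetMeta>& rowsets,
                                 int64_t cumulative_point) {
    auto it = rowsets.find(cumulative_point);
    while (it != rowsets.end() && it->second.compacted) {
        cumulative_point = it->second.end_version + 1;
        it = rowsets.find(cumulative_point);
    }
    return cumulative_point;
}
```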
The use of locks
Since a clone task and a compaction task cannot run at the same time, we use a `shared_mutex` to serialize the execution of parallel compaction tasks against the clone task. We also add a mutex `compaction_meta_lock` to protect the compaction metadata such as `cumulative_compactions`, `base_compaction` and `cumulative_point`. The following describes the detailed use of locks in each function (a sketch of the overall choreography follows the list):
- In `Tablet::calc_compaction_score`, we hold `compaction_meta_lock` and `meta_lock`, so we can safely access `cumulative_compactions`, `cumulative_point`, etc.
- In `Tablet::prepare_compaction_and_calculate_permits`, we hold `cumulative_compact_meta_lock` before calling `prepare_compact`, so inside `prepare_compact` we can safely access `cumulative_compactions`, `cumulative_point`, etc. We take the lock before the call because choosing the rowsets to compact and adding the compaction task to `cumulative_compactions` must be one atomic operation. The same applies to base compaction: we also acquire `cumulative_compact_meta_lock` first and then check whether `base_compaction` is null; if it is not null, a base compaction is already running (e.g. triggered by an HTTP request).
- In `CumulativeCompaction::execute_compact_impl`, we first try to take the reader lock of `cumulative_compaction_lock`; if we cannot get it, the tablet is under clone. After `do_compaction`, we hold `compaction_meta_lock` and call `update_cumulative_point` to safely forward `cumulative_point`.
- In `EngineCloneTask::_finish_clone`, in addition to acquiring `base_compaction_lock` and `cumulative_compaction_lock` (write lock), we also acquire `cumulative_compact_meta_lock` to access the currently running compaction tasks and update their `is_clone_occurred` field.
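Putting the pieces together, here is a sketch of the lock choreography described above. The class bodies are illustrative stubs, and the single `_cumulative_compact_meta_lock` stands in for the meta lock named in the list; this is not the actual Doris implementation:

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Illustrative stand-in for the real compaction task class.
class CumulativeCompaction {
public:
    bool prepare_compact() { return true; } // choose rowsets to compact (stub)
    bool do_compaction() { return true; }   // merge the chosen rowsets (stub)
    bool is_clone_occurred = false;
};

class Tablet {
public:
    // Choosing rowsets and registering the task must be one atomic step,
    // so both happen under _cumulative_compact_meta_lock.
    bool prepare_cumulative_compaction(std::shared_ptr<CumulativeCompaction> task) {
        std::lock_guard<std::mutex> guard(_cumulative_compact_meta_lock);
        if (!task->prepare_compact()) {
            return false;
        }
        _cumulative_compactions.push_back(std::move(task));
        return true;
    }

    // Compaction takes the shared side of the clone/compaction lock; if it
    // cannot be acquired, a clone is in progress and the task backs off.
    bool execute_cumulative_compaction(CumulativeCompaction& task) {
        std::shared_lock<std::shared_mutex> rlock(_cumulative_compaction_lock,
                                                  std::try_to_lock);
        if (!rlock.owns_lock()) {
            return false; // tablet is under clone
        }
        return task.do_compaction();
    }

    // Clone takes the exclusive side, serializing against all running
    // compaction tasks, then flags them under the meta lock.
    void finish_clone() {
        std::unique_lock<std::shared_mutex> wlock(_cumulative_compaction_lock);
        std::lock_guard<std::mutex> guard(_cumulative_compact_meta_lock);
        for (auto& t : _cumulative_compactions) {
            t->is_clone_occurred = true;
        }
    }

private:
    std::shared_mutex _cumulative_compaction_lock; // compaction: shared, clone: exclusive
    std::mutex _cumulative_compact_meta_lock;      // guards the two fields below
    std::vector<std::shared_ptr<CumulativeCompaction>> _cumulative_compactions;
    int64_t _cumulative_point = 0;
};
```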
Checklist (Required)
Further comments
If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did, what alternatives you considered, etc.