
[refactor](recursive-cte) Replace in-place PFC reset with full recreation between recursion rounds #60812

Open
BiteTheDDDDt wants to merge 3 commits into apache:master from BiteTheDDDDt:dev_0224

BiteTheDDDDt commented Feb 24, 2026

What problem does this PR solve?

This PR changes how recursive CTE (Common Table Expression) fragments are rerun in the pipeline execution engine. Instead of resetting a PipelineFragmentContext (PFC) in place between recursion rounds, the old PFC is fully destroyed and a new one is created from saved parameters. The goal is to make reruns safer, improve resource cleanup, and clarify the lifecycle of fragment contexts. The changes cover three areas: fragment rerun logic, resource and memory management, and runtime filter handling.

Recursive CTE fragment rerun and lifecycle management:

  • Added a new rerun protocol for recursive CTE fragments that splits a rerun into four explicit stages (wait_for_close, wait_for_destroy, recreate_and_submit, and final_close), so that each pipeline fragment context (PipelineFragmentContext) is safely destroyed and recreated. This prevents resource leaks and race conditions during recursive execution.
  • Introduced a RerunableFragmentInfo struct and a _rerunnable_params_map in FragmentMgr to track the parameters and callbacks needed to recreate fragments between recursion rounds. Entries in this map are cleaned up to avoid memory leaks.
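
The four stages and the saved-parameters map can be pictured with a small toy model. This is only a sketch under stated assumptions: ToyParams, ToyContext, and ToyFragmentMgr are hypothetical stand-ins, not the real Doris classes, and the blocking and threading that the real stages perform are reduced to comments.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the saved fragment params and the fragment context.
struct ToyParams {
    int fragment_id;
    std::string plan;
};

struct ToyContext {
    explicit ToyContext(const ToyParams& p) : params(p) {}
    ToyParams params;
};

// Stage names follow the PR description; this enum itself is illustrative.
enum class RerunStage {
    WaitForClose = 1,
    WaitForDestroy = 2,
    RecreateAndSubmit = 3,
    FinalClose = 4,
};

class ToyFragmentMgr {
public:
    // First submission: remember the params so the context can be rebuilt later.
    void submit(const ToyParams& p) {
        _saved.insert_or_assign(p.fragment_id, p);
        _live.insert_or_assign(p.fragment_id, std::make_shared<ToyContext>(p));
    }

    bool rerun(int id, RerunStage stage) {
        switch (stage) {
        case RerunStage::WaitForClose:
            // Real code would block until the fragment's tasks have closed.
            return _live.count(id) > 0;
        case RerunStage::WaitForDestroy:
            // Drop the old context; real code would also wait for external holders.
            _live.erase(id);
            return true;
        case RerunStage::RecreateAndSubmit: {
            auto it = _saved.find(id);
            if (it == _saved.end()) return false;
            // Build a brand-new context from the saved params.
            _live.insert_or_assign(id, std::make_shared<ToyContext>(it->second));
            ++_rounds[id];
            return true;
        }
        case RerunStage::FinalClose:
            // Last round: release the context and the saved params together.
            _live.erase(id);
            _saved.erase(id);
            return true;
        }
        return false;
    }

    bool is_live(int id) const { return _live.count(id) > 0; }
    std::size_t saved_count() const { return _saved.size(); }
    int rounds(int id) const {
        auto it = _rounds.find(id);
        return it == _rounds.end() ? 0 : it->second;
    }

private:
    std::map<int, ToyParams> _saved;
    std::map<int, std::shared_ptr<ToyContext>> _live;
    std::map<int, int> _rounds;
};
```

In this model, erasing the saved params in the final stage is what keeps the map from growing across queries, which mirrors the cleanup described above.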

Resource and memory management improvements:

  • Improved memory safety by clearing all saved rerunnable parameters when queries are removed, and by waiting for all fragment tasks and contexts to be fully destroyed before recreating them.
  • Updated the QueryContext::set_pipeline_context method to use insert_or_assign so that fragment contexts can be safely overwritten between recursion rounds.
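
The difference between insert and insert_or_assign on a standard map is what makes the overwrite possible. The sketch below assumes a std::map keyed by fragment id and holding shared pointers; the actual container and value type inside QueryContext may differ, and DemoContext is a hypothetical type.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical context type; only the label matters for this demonstration.
struct DemoContext {
    std::string label;
};

using ContextMap = std::map<int, std::shared_ptr<DemoContext>>;

// insert() leaves an existing entry untouched, so round 1 would be silently dropped.
std::string label_after_insert() {
    ContextMap m;
    m.insert({1, std::make_shared<DemoContext>(DemoContext{"round-0"})});
    m.insert({1, std::make_shared<DemoContext>(DemoContext{"round-1"})});
    return m.at(1)->label;
}

// insert_or_assign() (C++17) replaces the mapped value when the key already exists.
std::string label_after_insert_or_assign() {
    ContextMap m;
    m.insert_or_assign(1, std::make_shared<DemoContext>(DemoContext{"round-0"}));
    m.insert_or_assign(1, std::make_shared<DemoContext>(DemoContext{"round-1"}));
    return m.at(1)->label;
}
```

With plain insert, a recreated context for the same fragment id would never replace the stale entry, which is the situation the change avoids.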

Runtime filter and task context cleanup:

  • Replaced the previous reset_to_rerun logic with explicit deregister_runtime_filter methods in both PipelineFragmentContext and RuntimeState, so that runtime filters are removed during rerun and destruction.
  • Refactored the management of task execution context reference counting and destruction, introducing a RerunWaitContext with a sentinel mechanism to reliably wait for context destruction.
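
One way to wait for destruction with a sentinel is a shared_ptr whose custom deleter signals a future once the last copy is released. The following is a minimal sketch of that general technique, not the PR's RerunWaitContext: Sentinel, make_sentinel, and both helper functions are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical sentinel: external users copy `handle`; the owner waits on `destroyed`.
struct Sentinel {
    std::shared_ptr<int> handle;
    std::future<void> destroyed;
};

Sentinel make_sentinel() {
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> destroyed = done->get_future();
    // The custom deleter runs exactly once, when the last copy of `handle` goes away.
    std::shared_ptr<int> handle(new int(0), [done](int* p) {
        delete p;
        done->set_value();
    });
    return Sentinel{std::move(handle), std::move(destroyed)};
}

// Single-threaded check: the future stays pending while any copy is alive.
bool pending_until_last_release() {
    Sentinel s = make_sentinel();
    std::shared_ptr<int> external = s.handle;  // simulated external reference
    s.handle.reset();                          // owner drops its own reference
    bool pending = s.destroyed.wait_for(std::chrono::seconds(0)) ==
                   std::future_status::timeout;
    external.reset();                          // last reference released
    bool ready = s.destroyed.wait_for(std::chrono::seconds(0)) ==
                 std::future_status::ready;
    return pending && ready;
}

// Threaded check: the owner blocks until a worker thread lets go of its copy.
bool wait_for_worker_release() {
    Sentinel s = make_sentinel();
    std::thread worker([held = s.handle]() mutable {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        held.reset();
    });
    s.handle.reset();
    bool ready = s.destroyed.wait_for(std::chrono::seconds(5)) ==
                 std::future_status::ready;
    worker.join();
    return ready;
}
```

The appeal of this pattern is that holders never have to call an explicit unref: releasing their copy of the handle is enough to let the waiter proceed.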

Other improvements:

  • Enhanced debug output for pipeline fragment contexts, making it easier to trace fragment IDs and their states.
  • Cleaned up method signatures and documentation, for example by renaming parameters and adding comments that explain rerun and close semantics.

These changes collectively make recursive pipeline execution safer, more predictable, and easier to debug, especially in complex query scenarios involving recursive CTEs.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings February 24, 2026 13:32

Thearas commented Feb 24, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Contributor Author

run buildall

Copilot AI left a comment

Pull request overview

This PR refactors the recursive CTE (Common Table Expression) mechanism by replacing the 5-stage in-place reset protocol with a 4-stage full recreation protocol. Instead of selectively resetting PipelineFragmentContext (PFC) members between recursive iterations, the old PFC is now fully destroyed and a new one is constructed from saved TPipelineFragmentParams.

Changes:

  • Replaced 5-stage protocol (wait/release/rebuild/submit/close) with 4-stage protocol (wait_for_close/wait_for_destroy/recreate_and_submit/final_close)
  • Introduced sentinel mechanism using shared_ptr with custom deleter to safely wait for external threads before destroying PFC
  • Removed in-place reset methods (RuntimeState::reset_to_rerun(), PipelineFragmentContext::rebuild(), PipelineFragmentContext::set_to_rerun()) and ref/unref counting

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| gensrc/proto/internal_service.proto | Updated PRerunFragmentParams enum to reflect new 4-stage protocol (breaking protocol change) |
| be/src/vec/exec/scan/scanner_context.cpp | Removed manual ref/unref counting, now relies on sentinel mechanism |
| be/src/runtime/task_execution_context.h | Added sentinel mechanism infrastructure (RerunWaitContext, init_sentinel, wait_for_sentinel_destruction) |
| be/src/runtime/task_execution_context.cpp | Implemented sentinel mechanism with shared_ptr custom deleter |
| be/src/runtime/runtime_state.h | Removed reset_to_rerun method declaration |
| be/src/runtime/runtime_state.cpp | Removed reset_to_rerun implementation, moved global runtime filter cleanup to destructor |
| be/src/runtime/query_context.cpp | Changed insert to insert_or_assign to support PFC replacement during recreation |
| be/src/runtime/fragment_mgr.h | Added RerunableFragmentInfo struct and _rerunnable_params_map for saving fragment params |
| be/src/runtime/fragment_mgr.cpp | Refactored rerun_fragment to implement 4-stage protocol with full PFC recreation |
| be/src/pipeline/pipeline_fragment_context.h | Removed rebuild/set_to_rerun methods, made release_resource public |
| be/src/pipeline/pipeline_fragment_context.cpp | Moved _runtime_state cleanup into release_resource, added sentinel initialization in constructor |
| be/src/pipeline/exec/rec_cte_source_operator.h | Updated to use new 4-stage protocol opcodes |

Comment on lines 78 to 81
wait_for_close = 1; // wait for PFC close
wait_for_destroy = 2; // wait for external thread finished and destroy PFC
recreate_and_submit = 3; // recreate PFC from saved params + submit
final_close = 4; // close fragment (final round)

Copilot AI Feb 24, 2026

This is a breaking protocol change. The numeric enum values are reused with different meanings (wait = 1 is now wait_for_close = 1, with completely different semantics). As a result, BE nodes running the new code cannot safely exchange recursive CTE rerun requests with BE nodes running the old code. During a rolling upgrade, if the FE sends rerun_fragment requests to a mix of old and new BE nodes, the old BE nodes will misinterpret the opcodes and execute incorrect operations.

Consider either:

  1. Adding new enum values (6, 7, 8, 9) for the new protocol while keeping old values deprecated
  2. Adding version checking to ensure all nodes are upgraded before enabling this feature
  3. Documenting that this requires a coordinated upgrade of all BE nodes
Suggested change
- wait_for_close = 1; // wait for PFC close
- wait_for_destroy = 2; // wait for external thread finished and destroy PFC
- recreate_and_submit = 3; // recreate PFC from saved params + submit
- final_close = 4; // close fragment (final round)
+ // Legacy opcodes (old protocol). These numeric values are kept for
+ // backward compatibility and must not be reused with new semantics.
+ // New code should avoid using these and prefer the new opcodes below.
+ wait = 1;
+ wait_for_destroy_legacy = 2;
+ recreate_legacy = 3;
+ final_close_legacy = 4;
+ // New opcodes (new protocol). These use distinct numeric values to
+ // avoid collisions with the legacy protocol during rolling upgrades.
+ wait_for_close = 6; // wait for PFC close
+ wait_for_destroy = 7; // wait for external thread finished and destroy PFC
+ recreate_and_submit = 8; // recreate PFC from saved params + submit
+ final_close = 9; // close fragment (final round)
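
The collision this comment describes can be illustrated in a few lines of C++. This is a sketch under assumptions: the legacy stage names (wait, release, rebuild, submit, close) come from the review overview, but apart from wait = 1 their numeric values are assumed here, and decode_as_legacy is a hypothetical helper rather than Doris code.

```cpp
#include <cassert>
#include <string>

// Old 5-stage protocol. Only wait = 1 is confirmed by the review; 2..5 are assumed.
enum class LegacyOp { wait = 1, release = 2, rebuild = 3, submit = 4, close = 5 };

// New 4-stage protocol as numbered in the PR.
enum class NewOp {
    wait_for_close = 1,
    wait_for_destroy = 2,
    recreate_and_submit = 3,
    final_close = 4,
};

// How a node that only knows the legacy enum would read a value off the wire.
std::string decode_as_legacy(int wire_value) {
    switch (static_cast<LegacyOp>(wire_value)) {
    case LegacyOp::wait: return "wait";
    case LegacyOp::release: return "release";
    case LegacyOp::rebuild: return "rebuild";
    case LegacyOp::submit: return "submit";
    case LegacyOp::close: return "close";
    }
    return "unknown";  // value not defined in the legacy enum
}
```

Under these assumed numbers, a final_close (4) sent by a new node would be read as submit by an old node, whereas a value from the suggested 6 to 9 range would be rejected as unknown instead of being silently misinterpreted.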


@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.33% (1796/2264)
Line Coverage 64.79% (32004/49393)
Region Coverage 65.51% (15973/24383)
Branch Coverage 56.00% (8494/15168)

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.33% (1796/2264)
Line Coverage 64.84% (32025/49393)
Region Coverage 65.56% (15985/24383)
Branch Coverage 56.05% (8501/15168)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.33% (1796/2264)
Line Coverage 64.80% (32006/49393)
Region Coverage 65.48% (15967/24383)
Branch Coverage 55.99% (8493/15168)

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 10.67% (16/150) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.52% (19589/37300)
Line Coverage 36.11% (182560/505617)
Region Coverage 32.46% (141742/436661)
Branch Coverage 33.41% (61436/183896)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 14.67% (22/150) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 63.32% (23133/36533)
Line Coverage 46.59% (234870/504117)
Region Coverage 43.72% (192738/440829)
Branch Coverage 44.86% (82750/184482)

fix

update

update

Revert "update"

This reverts commit 11db789.

update

fix

update

update

update

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.33% (1796/2264)
Line Coverage 64.79% (32000/49393)
Region Coverage 65.50% (15972/24383)
Branch Coverage 55.97% (8490/15168)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 11.90% (15/126) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 63.29% (23090/36483)
Line Coverage 46.57% (234637/503863)
Region Coverage 43.73% (192711/440723)
Branch Coverage 44.83% (82680/184442)

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.33% (1796/2264)
Line Coverage 64.79% (32002/49393)
Region Coverage 65.47% (15964/24383)
Branch Coverage 55.95% (8487/15168)
