Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Run (logical) optimizers on subqueries #13066

Merged
merged 1 commit into from
Oct 23, 2024

Conversation

eejbyfeldt
Copy link
Contributor

Which issue does this PR close?

Closes #3770 and maybe #2480

Rationale for this change

This is a step towards being able to run TPC-DS q41 (#4763) which has an
expressions that needs simplification before our rules can decorrelate the
subquery.

What changes are included in this PR?

This patch makes it so that rules the configure an apply_order will
also include subqueries in their traversel.

Are these changes tested?

Existing tests.

Are there any user-facing changes?

Yes, subqueries will now also be optimized.

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Oct 22, 2024
This patch makes it so that rules the configure an `apply_order` will
also include subqueries in their traversel.

This is a step twoards being able to run TPC-DS q41 (apache#4763) which has
an expressions that needs simplification before we can decorrelate the
subquery.

This closes apache#3770 and maybe apache#2480
@eejbyfeldt eejbyfeldt force-pushed the i-3770-also-optimize-subqueries branch from bf8b752 to df84a84 Compare October 22, 2024 19:12
@@ -250,10 +250,6 @@ impl Optimizer {
Arc::new(DecorrelatePredicateSubquery::new()),
Arc::new(ScalarSubqueryToJoin::new()),
Arc::new(ExtractEquijoinPredicate::new()),
// simplify expressions does not simplify expressions in subqueries, so we
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

This may also make planning non trivially faster as SimplifyExpressions is quite expensive

@eejbyfeldt eejbyfeldt marked this pull request as ready for review October 23, 2024 07:03
Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 🚀

@@ -384,11 +380,9 @@ impl Optimizer {

let result = match rule.apply_order() {
// optimizer handles recursion
Some(apply_order) => new_plan.rewrite(&mut Rewriter::new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such a simple change :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we owe a significant debt to @peter-toth for his work on the tree node API to sort out how to handle subqueries

@Dandandan Dandandan merged commit 3aa9714 into apache:main Oct 23, 2024
24 checks passed
@Dandandan
Copy link
Contributor

Thanks @eejbyfeldt.
I went on and merged this, it seems a small, not controversial change.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dandandan and @eejbyfeldt

05)--------TableScan: t1
06)--------TableScan: t2
07)--TableScan: t0 projection=[t0_id, t0_name]
01)LeftSemi Join: t0.t0_name = __correlated_sq_2.t1_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@@ -384,11 +380,9 @@ impl Optimizer {

let result = match rule.apply_order() {
// optimizer handles recursion
Some(apply_order) => new_plan.rewrite(&mut Rewriter::new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we owe a significant debt to @peter-toth for his work on the tree node API to sort out how to handle subqueries

eejbyfeldt added a commit to eejbyfeldt/datafusion that referenced this pull request Oct 30, 2024
With apache#13066 we run optimizer rules on subquries by default which
resolved issue apache#5771
eejbyfeldt added a commit to eejbyfeldt/datafusion that referenced this pull request Oct 30, 2024
With apache#13066 we run optimizer rules on subquries by default which
resolved issue apache#5771
goldmedal pushed a commit that referenced this pull request Oct 30, 2024
With #13066 we run optimizer rules on subquries by default which
resolved issue #5771
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support rewriting subquery expressions in simplify_expressions
3 participants