Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug]: Fix wrong order by removal from plan #13497

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

akurmustafa
Copy link
Contributor

Which issue does this PR close?

Closes 13483.

Rationale for this change

In the 13483, final plan doesn't contain SortExec where it should. I have traced the reason and recognized that during constantness evaluation, when all children of an expression is contant expression is treated as constant. This is a valid assumption for BinaryExpr, or CastExpr. However, this shouldn't be the case for CaseExpr. This PR fixes this problem.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Nov 20, 2024
Comment on lines 1211 to 1213
// When expression contains branch even if all children are constant
// final result may not be constant
let is_branched = expr.as_any().is::<CaseExpr>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get it. from constantness perspective CaseExpr is nothing special
in the issue example we have

  CASE 
        WHEN name = 'name1' THEN 0.0
        WHEN name = 'name2' THEN 0.5
    END AS a

the name = 'name1' is one of the child of CaseExpr and is should not be considered constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After you comment, I thought about this. It turns out that my initial approach is indeed wrong. Datafusion has 2 kinds of constant expressions. One is across_partitions, other is per partition, indicate with across_partitions=false flag. In following plan

02)--ProjectionExec: expr=[CASE WHEN name@0 = name1 THEN 0 WHEN name@0 = name2 THEN 0.5 END as a]
03)----UnionExec
04)------ProjectionExec: expr=[name1 as name]
05)--------PlaceholderRowExec
06)------ProjectionExec: expr=[name2 as name]
07)--------PlaceholderRowExec

union has 2 children, each with constant name columns (since each source has single row, each column is constant). Union transforms these constant children as constant, but with across_partitions=false (which is correct). Each partition still receives constant value from their perspective (By the way Union doesn't merge any partition ). This is also correct for ProjectionExec. At ProjectionExec column name is constant, where across_partitions=false. However, in the current code, it is propagated as across_partitions=true. This behaviour is the root cause of the issue. I changed my implementation to fix this issue.

@alamb alamb mentioned this pull request Nov 20, 2024
&self,
expr: &Arc<dyn PhysicalExpr>,
) -> bool {
// As an example, assume that we know columns `a` and `b` are constant.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love this doc!
so the result is true if expression contains only constants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, when constantness is correct across partitions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Order by is ignored
3 participants