Expressions should also evaluate on statistics #992

Open
Tracked by #997
rdettai opened this issue Sep 12, 2021 · 3 comments · May be fixed by #13736 or gatesn/datafusion#1
Labels
enhancement New feature or request

Comments

@rdettai
Contributor

rdettai commented Sep 12, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Physical expressions should be aware of the rules on how to propagate statistics. For instance, for the row-wise max of two columns, the maximum statistic is the greater of the maxima of the two input columns. This would allow the ProjectionExec node to provide much better statistics.

Describe the solution you'd like
We should extend the PhysicalExpr trait with a method that handles statistics:

fn stats_eval(&self, stats: &Statistics) -> Result<Statistics>
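
To make the idea concrete, here is a minimal, self-contained sketch of what such a method might look like. It is not DataFusion code: `ColumnStats`, `StatsEval`, `Column` and `RowMax` are hypothetical stand-ins invented for illustration, values are restricted to `i64`, and nulls are ignored.

```rust
/// Hypothetical, simplified per-column statistics (not DataFusion's `Statistics`).
#[derive(Debug, Clone, Copy, PartialEq)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
}

/// Toy counterpart of the proposed `stats_eval` method.
trait StatsEval {
    /// `input` holds one entry per input column.
    fn stats_eval(&self, input: &[ColumnStats]) -> Result<ColumnStats, String>;
}

/// Reference to an input column by index (hypothetical).
struct Column(usize);

/// Row-wise maximum of two sub-expressions (hypothetical).
struct RowMax(Box<dyn StatsEval>, Box<dyn StatsEval>);

impl StatsEval for Column {
    fn stats_eval(&self, input: &[ColumnStats]) -> Result<ColumnStats, String> {
        // A column simply forwards the statistics of the input it refers to.
        input
            .get(self.0)
            .copied()
            .ok_or_else(|| format!("no column at index {}", self.0))
    }
}

impl StatsEval for RowMax {
    fn stats_eval(&self, input: &[ColumnStats]) -> Result<ColumnStats, String> {
        let l = self.0.stats_eval(input)?;
        let r = self.1.stats_eval(input)?;
        // A bound is only known when both inputs know it.
        let both = |a: Option<i64>, b: Option<i64>| a.zip(b).map(|(a, b)| a.max(b));
        Ok(ColumnStats {
            // Every output row is >= both input minima, so the larger
            // minimum is a valid (possibly loose) lower bound.
            min: both(l.min, r.min),
            // The output maximum is the greater of the two input maxima.
            max: both(l.max, r.max),
        })
    }
}
```

With inputs `{min: 0, max: 10}` and `{min: 5, max: 7}`, `RowMax(Column(0), Column(1))` yields `{min: 5, max: 10}`. Because each expression implements the rule itself, nested expressions compose without the caller knowing their concrete types.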

Describe alternatives you've considered
This could be implemented in the ProjectionExec with downcasting on various expression types.
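
For comparison, a rough sketch of the downcasting alternative follows. Again, every name here (`Expr`, `Column`, `RowMax`, `ColumnStats`, `projected_stats`) is hypothetical, and the sketch assumes the expression trait exposes an `as_any` hook for downcasting.

```rust
use std::any::Any;

/// Hypothetical, simplified per-column statistics.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
}

/// Hypothetical expression trait that knows nothing about statistics;
/// it only exposes `as_any` so callers can downcast.
trait Expr {
    fn as_any(&self) -> &dyn Any;
}

struct Column(usize);
struct RowMax(Box<dyn Expr>, Box<dyn Expr>);

impl Expr for Column {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Expr for RowMax {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// What the projection node would have to do: probe each known
/// expression type in turn and fall back to "unknown" otherwise.
fn projected_stats(expr: &dyn Expr, input: &[ColumnStats]) -> ColumnStats {
    let unknown = ColumnStats { min: None, max: None };
    if let Some(col) = expr.as_any().downcast_ref::<Column>() {
        input.get(col.0).copied().unwrap_or(unknown)
    } else if let Some(m) = expr.as_any().downcast_ref::<RowMax>() {
        let l = projected_stats(&*m.0, input);
        let r = projected_stats(&*m.1, input);
        // A bound is only known when both inputs know it.
        let both = |a: Option<i64>, b: Option<i64>| a.zip(b).map(|(a, b)| a.max(b));
        ColumnStats {
            min: both(l.min, r.min),
            max: both(l.max, r.max),
        }
    } else {
        // Expression types this function does not know about lose all statistics.
        unknown
    }
}
```

The trade-off is visible in the `else` branch: every new expression type needs another arm in this one function, and anything it does not recognise silently degrades to unknown statistics.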

Additional context
This is a follow-up to #962

@rdettai rdettai added the enhancement New feature or request label Sep 12, 2021
@Dandandan
Contributor

That's a great idea 👍

@rdettai rdettai changed the title Expressions should apply to statistics Expressions should also evaluate on statistics Sep 13, 2021
@alamb
Contributor

alamb commented Oct 11, 2021

With #1070, this feature could replace the existing partition pruning logic with a more general-purpose implementation. 👍

@alamb
Contributor

alamb commented Dec 15, 2024

I think PhysicalExpr::evaluate_bounds (https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html#method.evaluate_bounds) implements this partly (though it is in terms of bounds, not statistics) 🤔

3 participants