-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid pushdown of volatile functions to tablescan #13475
base: main
Are you sure you want to change the base?
Conversation
.into_iter() | ||
.zip(results) | ||
.map(|(expr, res)| { | ||
let filter_pushdown_type = if expr.is_volatile() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We could skip checking volatile if it’s already unsupported.
let zip =
filter_predicates
.into_iter()
.zip(results)
.map(|(pred, mut res)| {
if !matches!(res, TableProviderFilterPushDown::Unsupported)
&& pred.is_volatile()
{
// Do not push down predicate with volatile functions to scan
res = TableProviderFilterPushDown::Unsupported
}
(pred, res)
});
.into_iter() | ||
.zip(results) | ||
.map(|(expr, res)| { | ||
let filter_pushdown_type = if expr.is_volatile() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we check is_volatile
before calling scan.source.supports_filters_pushdown(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, we can ask a table source to tell if the filter is supported even for volatile expressions.
I thought the main difference is which call is more expensive. supports_filters_pushdown
for different sources often provides blanket all-supported or all-unsupported for all predicates. So we can save is_volatile
calls by filtering out Unsupported
values before. In contrast, if we check for volatility first and only then ask the source if the filters are supported, we can waste is_volatilty
calls. If the supports_filters_pushdown
is more costly, let's reverse it, of course.
I pushed a change to check the volatility after filtering out unsupported predicates. If we prefer the inverse order, I could refactor this optimiser quite a bit to:
- Get a subset of supported predicates
- Apply the
is_volatile
filter to it - Build
new_scan_filters
andnew_predicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is possible that any table source can support volatile function then this checking should happen in supports_filters_pushdown
.
Otherwise, we should check is_volatile
before supports_filters_pushdown
, so those table source don't need to care about whether the function is volatile or not.
Given that volatile has no deterministic result, I think it doesn't make sense to pushdown filter with those function, therefore I think we can check volatility beforehand
It would be nice to add comment to supports_filters_pushdown
mentioning that volatile function is not passed to the function
bef3666
to
0066c0b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Which issue does this PR close?
When using pushdown filters, the planner pushes the volatile
random()
filter to the table source, so it executes in scan (for example, in parquet) and in the query engine, which leads to weird results.Closes #13268.
Rationale for this change
It's impossible to evaluate volatile filters in different layers.
What changes are included in this PR?
Are these changes tested?
As proposed in the original issue, I tried
alltypes_tiny_pages_plain.parquet
sample file containing 7300 lines:Running a query
with
datafusion-cli
gives an answer of 726, which is pretty close to the expected 730.New plan
Before the change plan was
Are there any user-facing changes?
No breaking changes.