Why does PruningPredicate reference a row_count for each column?
#13836
Labels
enhancement
New feature or request
Is your feature request related to a problem or challenge?
Are there any scenarios where it makes sense for each column in a container to have a different row count? I think they should always be the same. Even if the columns are stored separately in Parquet, we should be able to pick any non-missing row count and have it be correct. If this is true, we can simplify the pruning predicate a little, which would make it (possibly insignificantly) faster to evaluate for everyone using DataFusion and, selfishly, would let me remove a couple of lines of hacky code in our codebase.
datafusion/datafusion/physical-optimizer/src/pruning.rs
Line 843 in 46101f3
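To make the question concrete, here is a minimal sketch of the difference between the two shapes of rewritten predicate. It is not DataFusion code: `rewrite_eq` and `row_count_columns` are hypothetical helpers that only build strings. The `{col}_null_count`, `{col}_min`, `{col}_max`, and `{col}_row_count` names follow the naming I understand the pruning rewrite to use, and the exact expression DataFusion generates is an assumption here and may differ between versions.

```rust
/// Hypothetical helper (not part of DataFusion): renders a pruning
/// expression for `col = literal` as a string.
/// If `shared_row_count` is true, a single `row_count` column is used
/// instead of a per-column `{col}_row_count`.
fn rewrite_eq(col: &str, literal: i64, shared_row_count: bool) -> String {
    let row_count = if shared_row_count {
        "row_count".to_string()
    } else {
        format!("{}_row_count", col)
    };
    // Assumed shape: the container can match unless every row is null
    // or the literal falls outside the [min, max] range.
    format!(
        "{c}_null_count != {rc} AND {c}_min <= {l} AND {l} <= {c}_max",
        c = col,
        rc = row_count,
        l = literal
    )
}

/// Hypothetical helper: lists the distinct row-count columns a statistics
/// provider would have to supply for a predicate over `cols`.
fn row_count_columns(cols: &[&str], shared_row_count: bool) -> Vec<String> {
    if shared_row_count {
        // One column regardless of how many columns the predicate touches.
        vec!["row_count".to_string()]
    } else {
        // One row-count column per referenced column.
        cols.iter().map(|c| format!("{}_row_count", c)).collect()
    }
}
```

With per-column counts, a predicate over `x` and `y` needs both `x_row_count` and `y_row_count`, even though the two should hold the same value for any given container. With a shared count, only `row_count` is needed.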
Describe the solution you'd like
PruningPredicate has the option to be configured to only reference a single column called row_count.
Describe alternatives you've considered
Do nothing.
Additional context
No response