Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why does PruningPredicate reference a row_count for each column? #13836

Open
adriangb opened this issue Dec 19, 2024 · 3 comments
Open

Why does PruningPredicate reference a row_count for each column? #13836

adriangb opened this issue Dec 19, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@adriangb
Copy link
Contributor

Is your feature request related to a problem or challenge?

Are there any scenarios where it makes sense for each column in a container to have a different row count? I think they should always be the same. Even if they are stored separately in Parquet we should be able to pick any non-missing row count and have it be correct. If this is true we can simplify the pruning predicate a little bit which would make it (possibly insignificantly) faster to evaluate for everyone using DataFusion but selfishly would allow me to remove a couple lines of hacky code in our codebase.

StatisticsType::RowCount => "row_count",

Describe the solution you'd like

PruningPredicate has the option to be configured to only reference a single column called row_count.

Describe alternatives you've considered

Do nothing.

Additional context

No response

@adriangb adriangb added the enhancement New feature or request label Dec 19, 2024
@adriangb
Copy link
Contributor Author

One option would be to make RequiredColumns a trait to define how stats columns are named?

@alamb
Copy link
Contributor

alamb commented Dec 22, 2024

I agree I can't think of any reason for each column to have a separate row count

@adriangb
Copy link
Contributor Author

Even if the stats have a row count per column, surely we can pick one, rename it to row_count and discard the rest? Would you be open to that change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants