feat: support filter pushdown for datafusion #203

Merged
merged 14 commits into from
Dec 8, 2024

jonathanc-n
Contributor

@jonathanc-n jonathanc-n commented Nov 29, 2024

Description

Added conversion from DataFusion Expr to PartitionFilter so that filters can be passed in. For now, DataFusion passes down all filters via supports_filters_pushdown and applies its own filtering after the partition filters.
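
To illustrate the idea of the conversion, here is a minimal, self-contained sketch. It does not use the real DataFusion or hudi-rs types: `Expr`, `Operator`, `PartitionFilter`, `expr_to_filter`, `exprs_to_filters`, and the `binary` helper are all hypothetical stand-ins, and the PR's actual code may be structured quite differently. The sketch assumes that only `column <op> literal` expressions are convertible and that anything else is skipped, which should be safe as long as the engine still evaluates every filter afterwards.

```rust
// Hypothetical, simplified stand-ins for illustration only; these are NOT
// the DataFusion `Expr` or hudi-rs `PartitionFilter` definitions.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    Eq,
    NotEq,
    Lt,
    LtEq,
    Gt,
    GtEq,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Binary {
        left: Box<Expr>,
        op: Operator,
        right: Box<Expr>,
    },
    // Stands in for any expression shape the conversion does not support.
    Not(Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
struct PartitionFilter {
    field: String,
    op: Operator,
    value: String,
}

/// Convert a single `column <op> literal` expression into a filter.
/// Any other shape yields `None` and is left for the engine to evaluate.
fn expr_to_filter(expr: &Expr) -> Option<PartitionFilter> {
    match expr {
        Expr::Binary { left, op, right } => match (left.as_ref(), right.as_ref()) {
            (Expr::Column(field), Expr::Literal(value)) => Some(PartitionFilter {
                field: field.clone(),
                op: *op,
                value: value.clone(),
            }),
            _ => None,
        },
        _ => None,
    }
}

/// Convert every supported expression and silently skip the rest.
fn exprs_to_filters(exprs: &[Expr]) -> Vec<PartitionFilter> {
    exprs.iter().filter_map(expr_to_filter).collect()
}

/// Helper for building a `column <op> literal` expression.
fn binary(column: &str, op: Operator, literal: &str) -> Expr {
    Expr::Binary {
        left: Box::new(Expr::Column(column.to_string())),
        op,
        right: Box::new(Expr::Literal(literal.to_string())),
    }
}

fn main() {
    let exprs = vec![
        binary("region", Operator::Eq, "us"),
        // Unsupported shape in this sketch, so it produces no filter.
        Expr::Not(Box::new(binary("region", Operator::Eq, "eu"))),
    ];
    let filters = exprs_to_filters(&exprs);
    println!("{:?}", filters);
}
```

Returning `Option` per expression keeps the conversion total: an unsupported expression only costs partition pruning in this sketch, not correctness, because it is simply not turned into a filter.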

Closes #160.

Steps forward

I noticed the PR was getting a bit big to be reviewed all at once, so here are some things that will be worked on afterwards:

  • Support more operators
  • Support more DataFusion expressions
  • Enhance the Python implementation for filter conversion
  • Make it easier to create PartitionFilters for testing, etc.

How are the changes test-covered

Added unit tests for all added functionality


codecov bot commented Nov 29, 2024

Codecov Report

Attention: Patch coverage is 89.09091% with 12 lines in your changes missing coverage. Please review.

Project coverage is 91.34%. Comparing base (b2c60b1) to head (21db5b8).
Report is 1 commit behind head on main.

Files with missing lines              Patch %   Lines
python/src/internal.rs                0.00%     7 Missing ⚠️
crates/datafusion/src/util/expr.rs    83.87%    5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #203      +/-   ##
==========================================
- Coverage   91.79%   91.34%   -0.46%     
==========================================
  Files          21       24       +3     
  Lines         987     1074      +87     
==========================================
+ Hits          906      981      +75     
- Misses         81       93      +12     

crates/core/src/exprs/mod.rs (outdated, resolved)
crates/core/src/table/fs_view.rs (outdated, resolved)
crates/core/src/table/mod.rs (outdated, resolved)
python/src/internal.rs (outdated, resolved)
crates/datafusion/src/utils/exprs_to_filter.rs (outdated, resolved)
crates/datafusion/src/lib.rs (outdated, resolved)
crates/core/src/exprs/filter.rs (outdated, resolved)
crates/datafusion/src/utils/exprs_to_filter.rs (outdated, resolved)
crates/datafusion/src/utils/exprs_to_filter.rs (outdated, resolved)
python/src/internal.rs (resolved)
@jonathanc-n
Contributor Author

fixed conflicts

@jonathanc-n
Contributor Author

@xushiyan Should we add some docs to see how to run codecov locally? It takes a lot of time out of the review process

@xushiyan
Member

xushiyan commented Dec 7, 2024

> @xushiyan Should we add some docs to see how to run codecov locally? It takes a lot of time out of the review process

Yup, having a command in the Makefile to generate an HTML report locally would be beneficial. Codecov aggregates both Rust and Python coverage; for local mode, I'm not sure how easy that is to configure, so we may start with separate reports.

@xushiyan
Member

xushiyan commented Dec 7, 2024

I'm going to take a pass today, fix things wherever necessary, and then merge it.

@xushiyan xushiyan added the python Related to Python codebase label Dec 8, 2024
@xushiyan xushiyan merged commit eb0f520 into apache:main Dec 8, 2024
8 of 9 checks passed
@xushiyan xushiyan mentioned this pull request Jan 30, 2025
16 tasks
Labels
feature python Related to Python codebase rust Related to Rust codebase
Development

Successfully merging this pull request may close these issues.

Integrate with datafusion to support filters pushdown from SQL
2 participants