fix: use projected_table_schema for projection in DeltaSchemaAdapter #3068
ACTION NEEDED: delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
1840bb7 to 35fdee8
For reference, the corresponding code in DataFusion 42 is https://github.com/apache/datafusion/blob/a43ce8bd67e8d649e0e4f5260f8a5e6e10d62dbc/datafusion/core/src/datasource/physical_plan/parquet/opener.rs#L80-L82
When upgrading from 42 to 43 the meaning of
The changes to the SchemaAdapter API were introduced in apache/datafusion#12135
I will have this merged once main goes green again
35fdee8 to 43f2b8c
After upgrading from deltalake 0.20.1 to 0.22.3, Parquet column projection appears to be broken when using DeltaTable::scan. Instead of scanning only a single column, all columns appear to be fetched from storage.
Inspection with a debugger reveals that the adapted_projections are wrong here: https://github.com/apache/datafusion/blob/88f58bf929167c5c5e2250ad87caa88d4dff11e5/datafusion/core/src/datasource/physical_plan/parquet/opener.rs#L153-L159
The adapted_projections are obtained in https://github.com/delta-io/delta-rs/blob/5b2f46b06e0eb508f932a8b39feb11b568a78a32/crates/core/src/delta_datafusion/schema_adapter.rs#L46-L60
Changing line 49 to use the projected_table_schema seems to solve the problem.
Signed-off-by: Jonas Irgens Kylling <[email protected]>
43f2b8c to 7a89c3b
Head branch was pushed to by a user without write access
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3068      +/-   ##
==========================================
+ Coverage   72.40%   72.42%   +0.02%
==========================================
  Files         128      128
  Lines       41007    41012       +5
  Branches    41007    41012       +5
==========================================
+ Hits        29690    29703      +13
+ Misses       9425     9424       -1
+ Partials     1892     1885       -7
☔ View full report in Codecov by Sentry.
Description
After upgrading from deltalake 0.20.1 to 0.22.3, Parquet column projection appears to be broken when using DeltaTable::scan. Instead of scanning only a single column, all columns appear to be fetched from storage.
Inspection with a debugger reveals that the adapted_projections are wrong here:
https://github.com/apache/datafusion/blob/88f58bf929167c5c5e2250ad87caa88d4dff11e5/datafusion/core/src/datasource/physical_plan/parquet/opener.rs#L153-L159
The adapted_projections are obtained in
delta-rs/crates/core/src/delta_datafusion/schema_adapter.rs
Lines 46 to 60 in 5b2f46b
Opening this to see if we have test coverage.
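To illustrate why the choice of schema matters, here is a minimal sketch in plain Rust. It does not use the DataFusion or delta-rs APIs: `columns_to_read` is a hypothetical helper, and schemas are modeled as lists of column names. It only shows the general idea that mapping file columns against the full table schema selects every column, while mapping against the projected schema selects only the requested ones.

```rust
/// Hypothetical helper, not part of DataFusion or delta-rs.
/// Returns the indices of the file columns that appear in `target_schema`,
/// i.e. the columns that would have to be read from the file.
fn columns_to_read(target_schema: &[&str], file_schema: &[&str]) -> Vec<usize> {
    let mut indices = Vec::new();
    for (idx, file_col) in file_schema.iter().enumerate() {
        // Keep a file column only if the target schema asks for it.
        if target_schema.contains(file_col) {
            indices.push(idx);
        }
    }
    indices
}
```

Under these assumptions, with a file holding `id`, `name`, and `value` and a query that projects only `value`, passing the full table schema as `target_schema` yields all three indices, whereas passing the projected schema yields just the index of `value`. That matches the symptom described above, where all columns were fetched instead of one.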