Environment
Delta-rs version: 0.22 (tested multiple versions up to the latest)
PyArrow version: 18.0.0
OS: macOS 14.4 (23E214)
Storage: Delta table on S3
Bug
What happened:
When I apply a filter expression (lighting == "day") using pyarrow.dataset, no rows are returned. If I skip the filter at this stage and instead filter the resulting pandas DataFrame (results[results["lighting"] == "day"]), matching rows are returned. This confirms that data matching the condition exists in the dataset.
What you expected to happen:
The filter method should correctly return rows where lighting == "day" when applied directly on the pyarrow.dataset.
I haven't been able to replicate the error using mock data. However, I discovered that applying the filter with match_substring does return the expected values.
I've already verified that the data does not contain leading or trailing spaces or unexpected characters in the lighting column.
Any ideas on what else I can check?
Can you check the statistics for that column in all of your add actions? Something may be going wrong there.
How to reproduce it:
Given a delta table as such
More Details: