Snowflake translation error: dropped filter with anti_join
#1474
Comments
Does |
That fixes it!
I really appreciate the quick update here, and am hopeful this will make it into the upcoming release!
Adding that (1) the behavior isn't backend-specific and (2) the impact is that

library(dplyr, warn.conflicts = FALSE)

table <- tibble::tibble(a = c("x", "y"), b = c(1, 2))
con <- DBI::dbConnect(RSQLite::SQLite())
copy_to(con, table, "table")
tbl_lazy <- tbl(con, "table")

# only `y` meets condition:
filtered <- tbl_lazy %>%
  group_by(a) %>%
  summarize(include = all(b > 1)) %>%
  filter(include) %>%
  select(a)

# correct: only `y` returned
tbl_lazy %>% inner_join(filtered, by = "a")
#> # Source:   SQL [1 x 2]
#> # Database: sqlite 3.45.0 []
#>   a         b
#>   <chr> <dbl>
#> 1 y         2

# incorrect: both returned
tbl_lazy %>% semi_join(filtered, by = "a")
#> # Source:   SQL [2 x 2]
#> # Database: sqlite 3.45.0 []
#>   a         b
#>   <chr> <dbl>
#> 1 x         1
#> 2 y         2

Created on 2024-03-15 with reprex v2.1.0

Confirmed that the PR resolves the issue:
@hadley @mgirlich would it be possible to include #1475 in the next release?
Greetings, dbplyr friends. I'd like to report an issue we're experiencing with Snowflake translations, related to filter() criteria dropped from lazy tables when semi_join()ed. This seems specifically limited to filter() applied to columns which result from summarize(), that are not selected for inclusion in the result.

Here's a small example, reproducing the behavior:

If we inspect the query produced for sim_2_transformed, it looks correct:

However, the join drops the HAVING criterion leading to an incorrect result; join_result is

Notably, if I remove select(a) from the definition of sim_2_transformed, the HAVING clause is included as expected:

Thanks for the continuing support of Snowflake backends; I'm happy to help with any testing that might be valuable.
Release version of dbplyr, 2.4.0. cc @fh-mthomson
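The query inspection described in the report can be approximated without a Snowflake connection. The following is only a minimal sketch, assuming dbplyr's lazy_frame() together with simulate_snowflake(); the toy data and the names sim, filtered, and joined are hypothetical stand-ins borrowed from the SQLite reprex in the comments, not the original sim_2_transformed definition. The rendered SQL depends on the installed dbplyr version, so no output is shown here.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# Hypothetical toy table on a simulated Snowflake connection:
# nothing is executed, dbplyr only generates SQL.
sim <- lazy_frame(a = c("x", "y"), b = c(1, 2), con = simulate_snowflake())

# Filter on a summarized column that is then dropped by select().
filtered <- sim %>%
  group_by(a) %>%
  summarize(include = all(b > 1)) %>%
  filter(include) %>%
  select(a)

joined <- sim %>% semi_join(filtered, by = "a")

# Render both queries as plain strings so they can be compared.
filtered_sql <- as.character(sql_render(filtered))
joined_sql <- as.character(sql_render(joined))

# Check by eye whether the condition on `include` that appears in the
# first query also appears inside the second one.
cat(filtered_sql, "\n\n", joined_sql, "\n")
```

If the condition shows up in the first query but is missing from the subquery of the second, that would match the behavior described in this issue.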