keep filtered aggregates with subsequent select in semi_join
ejneer committed Mar 9, 2024
1 parent 0739c75 commit 26c9294
Showing 4 changed files with 41 additions and 0 deletions.
3 changes: 3 additions & 0 deletions NEWS.md
@@ -1,5 +1,8 @@
# dbplyr (development version)

* `semi_join()` will no longer inline away an aggregate filter (i.e. `HAVING`
clause) that was followed by a `select()` (@ejneer, #1474)

* Refined the `select()` inlining criteria to keep computed columns used to
`arrange()` subqueries that are eliminated by a subsequent select (@ejneer,
#1437).
3 changes: 3 additions & 0 deletions R/lazy-select-query.R
@@ -123,6 +123,9 @@ is_lazy_select_query_simple <- function(x,
  if (!is_empty(x$limit)) {
    return(FALSE)
  }
  if (!is_empty(x$having)) {
    return(FALSE)
  }

  TRUE
}
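
The effect of the added `having` check can be sketched in base R. This is a simplified model, not dbplyr's implementation: `new_query_sketch()` and `is_simple_sketch()` are hypothetical stand-ins for the lazy query object and `is_lazy_select_query_simple()`, and only the `limit` and `having` checks visible in this hunk are modeled.

```r
# Hypothetical, simplified model of a lazy select query: a plain list whose
# fields mirror the `limit` and `having` slots checked in the hunk above.
new_query_sketch <- function(limit = NULL, having = list()) {
  list(limit = limit, having = having)
}

# Sketch of the simplicity check: a query counts as "simple" (and so may be
# inlined into its parent) only when it carries no clause that would change
# the result set.
is_simple_sketch <- function(x) {
  if (length(x$limit) > 0) {
    return(FALSE)
  }
  # The check added by this commit: a HAVING clause filters groups,
  # so dropping the subquery would drop the filter as well.
  if (length(x$having) > 0) {
    return(FALSE)
  }
  TRUE
}

plain <- new_query_sketch()
filtered <- new_query_sketch(having = list(quote(n() == 1)))

is_simple_sketch(plain)    # no limit, no having
is_simple_sketch(filtered) # carries an aggregate filter
```

In this model the query with an aggregate filter is no longer treated as simple, which is what keeps it as a subquery in the `semi_join()` translation.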
18 changes: 18 additions & 0 deletions tests/testthat/_snaps/verb-joins.md
@@ -200,6 +200,24 @@
(`df_RHS`.`b` = 2)
)

# filtered aggregates with subsequent select are not inlined away in semi_join (#1474)

Code
out
Output
<SQL>
SELECT `df`.*
FROM `df`
WHERE EXISTS (
SELECT 1 FROM (
SELECT `x`
FROM `df`
GROUP BY `x`
HAVING (COUNT(*) = 1.0)
) AS `RHS`
WHERE (`df`.`x` = `RHS`.`x`)
)
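
Why the `HAVING` clause has to survive can be illustrated with a small base R sketch. It does not reproduce dbplyr's output before the fix; it only models, on assumed toy data, how the semi join result changes if the aggregate filter is lost. All object names here are hypothetical.

```r
# Toy data: x = 1 appears twice, x = 2 appears once.
df <- data.frame(x = c(1, 1, 2), y = c("a", "b", "c"))

# Right-hand side with the aggregate filter kept:
# GROUP BY x HAVING COUNT(*) = 1
counts <- table(df$x)
rhs_filtered <- as.numeric(names(counts)[counts == 1])

# Right-hand side if the aggregate filter were lost: every x in df.
rhs_unfiltered <- unique(df$x)

# Semi join: keep rows of df whose x has a match on the right-hand side.
with_having <- df[df$x %in% rhs_filtered, , drop = FALSE]
without_having <- df[df$x %in% rhs_unfiltered, , drop = FALSE]

nrow(with_having)
nrow(without_having)
```

With the filter in place only the row whose `x` occurs once is kept; without it, every row of `df` matches itself and the semi join filters nothing.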

# multiple joins create a single query

Code
17 changes: 17 additions & 0 deletions tests/testthat/test-verb-joins.R
@@ -616,6 +616,23 @@ test_that("filter() before semi join is not when y has other operations", {
  expect_null(lq$where)
})

test_that("filtered aggregates with subsequent select are not inlined away in semi_join (#1474)", {
  lf <- lazy_frame(x = 1, y = 2, z = 3)
  lf2 <- lazy_frame(x = 1, a = 2, b = 3)

  out <- semi_join(
    lf,
    lf2 %>%
      dplyr::summarize(n = n(), .by = "x") %>%
      filter(n == 1) %>%
      select(x)
  )
  lq <- out$lazy_query

  expect_equal(lq$y$having, list(quo(n() == 1)), ignore_formula_env = TRUE)
  expect_snapshot(out)
})

test_that("multiple joins create a single query", {
  lf <- lazy_frame(x = 1, a = 1, .name = "df1")
  lf2 <- lazy_frame(x = 1, b = 2, .name = "df2")