filter(.by = ) leads to incorrect results when NAs are present #474

Open
markfairbanks opened this issue Jul 9, 2024 · 1 comment
Labels
bug an unexpected problem or unintended behavior

Comments

@markfairbanks
Collaborator

library(dtplyr)
library(dplyr)
df <- tibble(x = c(1, 2, NA), y = c("a", "a", "b"))

lazy_dt(df) %>%
  filter(x != 2, .by = y)
#> Source: local data table [2 x 2]
#> Call:   `_DT1`[`_DT1`[, .I[x != 2], by = .(y)]$V1]
#> 
#>       x y    
#>   <dbl> <chr>
#> 1     1 a    
#> 2    NA <NA> 
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
df %>%
  filter(x != 2, .by = y)
#> # A tibble: 1 × 2
#>       x y    
#>   <dbl> <chr>
#> 1     1 a
@markfairbanks markfairbanks added the bug an unexpected problem or unintended behavior label Jul 9, 2024
@markfairbanks
Collaborator Author

Looks like this is happening because we use the .I trick when filtering by group. When the filter condition evaluates to NA for a row, .I[condition] returns NA for that row instead of dropping it, and slicing a data.table with NA row indices produces all-NA rows.

library(dtplyr)
library(dplyr)

# data.table() is not exported by dtplyr or dplyr, so call it from its own namespace
example_df <- data.table::data.table(id = c(1, 1, 1, 1), value = c(NA, NA, 0, 1))

example_df %>%
  lazy_dt() %>%
  filter(value == 0, .by = id)
#> Source: local data table [3 x 2]
#> Call:   `_DT1`[`_DT1`[, .I[value == 0], by = .(id)]$V1]
#> 
#>      id value
#>   <dbl> <dbl>
#> 1    NA    NA
#> 2    NA    NA
#> 3     1     0
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

example_df[c(NA, NA, 3)]
#>       id value
#>    <num> <num>
#> 1:    NA    NA
#> 2:    NA    NA
#> 3:     1     0
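
One possible way around this, sketched in plain data.table rather than as dtplyr's actual translation, is to wrap the condition in which() before indexing .I. which() returns only the positions where the condition is TRUE, so NA results never reach the row index. The variable names idx_na, idx_ok, and res are made up for illustration, and whether dtplyr should generate this form is an assumption, not something decided in this thread.

```r
library(data.table)

example_df <- data.table(id = c(1, 1, 1, 1), value = c(NA, NA, 0, 1))

# Same shape as the generated call: NA in the condition becomes NA in the index
idx_na <- example_df[, .I[value == 0], by = .(id)]$V1

# Hypothetical alternative: which() keeps only positions where the condition is TRUE
idx_ok <- example_df[, .I[which(value == 0)], by = .(id)]$V1

# Slicing with the NA-free index returns only the matching row
res <- example_df[idx_ok]
print(res)
```

Here idx_na still contains the two NA entries that produce the all-NA rows, while idx_ok contains only the index of the row where value == 0, which matches what dplyr's filter() returns for the same data.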
