Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

select() should warn on factor arguments #109

Closed
ArtemSokolov opened this issue Aug 16, 2019 · 1 comment · Fixed by #117
Closed

select() should warn on factor arguments #109

ArtemSokolov opened this issue Aug 16, 2019 · 1 comment · Fixed by #117
Labels
bug an unexpected problem or unintended behavior dsl

Comments

@ArtemSokolov
Copy link

The guide on tidy evaluation suggests forwarding ... to dplyr functions to handle multiple arguments in an NSE setting. In general, this works extremely well and allows custom functions to take advantage of the dplyr flexibility. For example,

from_mtcars <- function(...) {mtcars %>% select(...)}
from_mtcars( vs, am )              # Automatic quoting of arguments
from_mtcars( starts_with("d") )    # Select helper support

Unfortunately, almost invariably somebody passes a factor to the custom function, which leads to unexpected behavior. This usually happens when the user is working with traditional data frames and forgets to do stringsAsFactors=FALSE:

df <- data.frame( v=c("vs","am") )
from_mtcars( df$v )
#                     cyl  mpg
# Mazda RX4             6 21.0
# Mazda RX4 Wag         6 21.0
# ...

The behavior is due to factors getting converted to integer indices. This creates hard-to-catch bugs in the end user code, because the conversion is silent and the user usually expects the input to behave as strings. The author of the custom function can add a guard against this:

from_mtcars2 <- function(...)
{
    v <- purrr::map_if( list(...), is.factor, as.character )
    mtcars %>% select(!!!v)
}

Unfortunately, this breaks NSE support:

# Works as expected with factors
from_mtcars2( factor(c("vs","am")) )
#                     vs am
# Mazda RX4            0  1
# Mazda RX4 Wag        0  1
# ...

# But not with unevaluated expressions
from_mtcars2( mpg )
# Error in map_lgl(.x, .p, ...) : object 'mpg' not found

Since select() already has low-level access to variables, I was hoping that it could automatically check for the presence of factor arguments. If nothing else, I think that throwing a warning upon encountering one can help alert end users that they are about to encounter unexpected behavior. Potential place where this could be done is around

ind_list <- map_if(ind_list, is_character, match_var, table = .vars)

where character arguments get mapped to integer positions.

@lionel- lionel- added bug an unexpected problem or unintended behavior dsl labels Sep 9, 2019
lionel- added a commit to lionel-/tidyselect that referenced this issue Sep 11, 2019
@lionel-
Copy link
Member

lionel- commented Sep 11, 2019

Note that passing index vectors outside of one_of() and without unquoting will be deprecated, in order to make the selection DSL less ambiguous. See #76

We'll add support for factors and other S3 vectors in one_of(), and when unquoted:

fct <- factor(c("disp", "drat"))
mtcars %>% select(!!fct)
mtcars %>% select(one_of(fct))

lionel- added a commit to lionel-/tidyselect that referenced this issue Sep 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior dsl
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants