The guide on tidy evaluation suggests forwarding ... to dplyr functions to handle multiple arguments in an NSE setting. In general, this works extremely well and allows custom functions to take advantage of the dplyr flexibility. For example,
library(dplyr)
from_mtcars <- function(...) { mtcars %>% select(...) }
from_mtcars( vs, am )            # Automatic quoting of arguments
from_mtcars( starts_with("d") )  # Select helper support
Unfortunately, almost invariably somebody passes a factor to the custom function, which leads to unexpected behavior. This usually happens when the user is working with traditional data frames and forgets to set stringsAsFactors=FALSE.
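A minimal base-R sketch of what a factor carries, not the example from this report: the variable names (`cols`, `codes`, `by_code`, `by_name`, `df`) are made up for illustration, and only base functions are used. It shows that the integer codes behind a factor point at different columns of mtcars than its labels do.

```r
# Levels are sorted alphabetically, so "am" is level 1 and "vs" is level 2.
cols <- factor(c("vs", "am"))

codes   <- as.integer(cols)       # integer codes stored inside the factor
by_code <- names(mtcars)[codes]   # columns found when the codes are used as positions
by_name <- as.character(cols)     # column names the user actually meant

# Strings read into a data frame with stringsAsFactors = TRUE become factors.
df <- data.frame(col = c("vs", "am"), stringsAsFactors = TRUE)
is_fct <- is.factor(df$col)
```

Here `by_code` and `by_name` differ, which is the kind of silent mismatch described in this report when the codes end up being used as column positions.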
The behavior is due to factors getting converted to integer indices. This creates hard-to-catch bugs in end-user code, because the conversion is silent and the user usually expects the input to behave as strings. The author of the custom function can add a guard against this, as from_mtcars2 does in the calls below. Unfortunately, this breaks NSE support:
# Works as expected with factors
from_mtcars2( factor(c("vs","am")) )
#               vs am
# Mazda RX4      0  1
# Mazda RX4 Wag  0  1
# ...

# But not with unevaluated expressions
from_mtcars2( mpg )
# Error in map_lgl(.x, .p, ...) : object 'mpg' not found
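One way a wrapper might convert factors without forcing every argument is to capture the arguments as quosures and evaluate each one tentatively. The sketch below is an assumption-laden illustration, not the guard used in this report: `from_mtcars_guarded` is a hypothetical name, and it relies on `rlang::enquos()`, `rlang::eval_tidy()` and `!!!` splicing into `select()`. Arguments that fail to evaluate outside the data (bare column names, select helpers) are passed through untouched.

```r
library(dplyr)
library(rlang)

# Hypothetical wrapper: converts factor arguments to character,
# leaves everything else as an unevaluated quosure.
from_mtcars_guarded <- function(...) {
  args <- enquos(...)
  args <- lapply(args, function(q) {
    # Try to evaluate the argument in the caller's environment only.
    # Bare column names and select helpers are expected to fail here.
    val <- tryCatch(eval_tidy(q), error = function(e) NULL)
    if (is.factor(val)) as.character(val) else q
  })
  mtcars %>% select(!!!args)
}

res_factor <- from_mtcars_guarded(factor(c("vs", "am")))  # factor input
res_bare   <- from_mtcars_guarded(mpg)                    # unevaluated column name
```

The trade-off is that the tentative evaluation can pick up an unrelated factor that happens to share a name with a column in the calling environment, so this is a workaround rather than a substitute for a check inside select() itself.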
Since select() already has low-level access to variables, I was hoping that it could automatically check for the presence of factor arguments. If nothing else, I think that throwing a warning upon encountering one would help alert end users that they are about to encounter unexpected behavior. A potential place where this could be done is around tidyselect/R/vars-select.R, line 144 in 4de95ee, where character arguments get mapped to integer positions.
Note that passing index vectors outside of one_of() and without unquoting will be deprecated, in order to make the selection DSL less ambiguous. See #76
We'll add support for factors and other S3 vectors in one_of(), and when unquoted: