-
Notifications
You must be signed in to change notification settings - Fork 54
/
Copy pathboolean-strategies.qmd
135 lines (100 loc) · 6.96 KB
/
boolean-strategies.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
# Prefer a enum, even if only two choices {#sec-boolean-strategies}
```{r}
#| include = FALSE
source("common.R")
```
## What's the pattern?
If your function implements two strategies, it's tempting to distinguish between them using an argument that takes either `TRUE` or `FALSE`.
However, I recommend that you use an enumeration unless:
- You're **really sure** there won't ever be another strategy. If you do discover a third (or fourth, or fifth, or ...) strategy, you'll need to change the interface of your function.
- It's very clear what both `TRUE` and `FALSE` options mean just from the name of the argument. Generally the `TRUE` value tends to be easier to understand because `something = TRUE` tells you what will happen, but `something = FALSE` only tells you what won't happen.
## What are some examples?
There are quite a few examples of the problem in tidyverse, because this is a pattern that we only discovered relatively recently:
- By default, `stringr::str_subset(string, pattern)` returns the elements of `string` that match the `pattern`. You can use `negate = TRUE` to instead return the elements that don't match the pattern, but I now wonder if would be more clear as `return = c("matches", "non-matches")`.
- `httr2::multi_req_perform()` allows you to perform a bunch of HTTP requests in parallel. It has an argument called `cancel_on_error` that can take `TRUE` or `FALSE`. It's fairly clear what `cancel_on_error = TRUE` means; but it's not so obvious what `cancel_on_error = FALSE` does. Additionally, it seems likely that I'll come up with other possible error handling strategies in the future, and even though I don't know what they are now, it would be better to plan for the future with an argument specification like `error = c("cancel", "continue")`.
`cut()` has an argument called `right` which is used to pick between right-closed left-open intervals (`TRUE)` and right-open left-closed arguments.
I think it's hard to remember which is which and a clearer specification might be `open_side = c("right", "left")` or maybe `bounds = c("[)", "(]")`.
Another interesting case in base R is `sort()` which has two arguments that take a single logical value: `decreasing` and `na.last`:
- The `decreasing` argument is used to pick between sorting in ascending or descending order.
It's easy to understand what `decreasing = TRUE` does, but slightly less clear what `decreasing = FALSE`, the default, means because it feels like a double negative:
```{r}
#| results: false
x <- sample(10)
sort(x, decreasing = TRUE)
sort(x, decreasing = FALSE)
```
Compare this with `vctrs::vec_sort()`, which uses an enum:
```{r}
#| results: false
vctrs::vec_sort(x, direction = "desc")
vctrs::vec_sort(x, direction = "asc")
```
I think this is a mild improvement because the two options are spelled out explicitly.
- The `na.last` argument is used to control the location of missing values in the result.
It takes three possible values: `TRUE` (put `NA`s at the end), `FALSE` (put `NA`s at the beginning), or `NA` (drop `NA`s from the result).
This is an interesting way to support three strategies, but as we'll see later I think this would be more clear the argument specification was `na = c("drop", "first", "last")`.
## How do you remediate past mistakes?
There are two possible ways to switch to using a strategy instead of `TRUE`/`FALSE` depending on whether the old argument name makes sense with the new argument values.
The sections below show what you'll need to do if you need a new argument (most cases) or if you're lucky enough to be able to reuse the existing argument.
### Create a new argument
Imagine we wanted to remediate the `na.last` argument to `sort()`.
Currently:
- `na.last = TRUE` means put `NA`s last.
- `na.last = FALSE` means put `NA`s first.
- `na.list = NA` means to drop them.
I think we could make this function more clear by changing the argument name to `na` and accepting one of three values: `last`, `first`, or `drop`.
Changing an argument name is equivalent removing the old name and adding the new name.
This way of thinking about the change makes it easier to see how you do it in a backward compatible way: you need to deprecate the old argument in favour of the new one.
```{r}
sort <- function(x,
na.last = lifecycle::deprecated(),
na = c("drop", "first", "last")) {
if (lifecycle::is_present(na.last)) {
lifecycle::deprecate_warn("1.0.0", "sort(na.last)", "sort(na)")
if (!is.logical(na.last) || length(na.last) != 1) {
cli::cli_abort("{.arg na.last} must be a single TRUE, FALSE, or NA.")
}
if (isTRUE(na.last)) {
na <- "last"
} else if (isFALSE(na.last)) {
na <- "first"
} else {
na <- "drop"
}
} else {
na <- arg_match(na)
}
...
}
```
::: callout-note
Note that because `na` is a prefix of `na.last` and `sort()` puts `na.last` before `…,`not after it (see @sec-dots-after-required), this introduces a very subtle behaviour change.
Previously, `sort(x, n = TRUE)` would have worked and been equivalent to `sort(x, na.last = TRUE)`.
But it will now fail because `n` is a prefix of two arguments (`na` and `na.last)`.
This is unlikely to affect much code, but is worth being aware of.
It would also be nice to make the default value `"last"` to match `order()`, especially since it's very unusual for a function to silently remove missing values.
However, that's likely to affect a lot of existing code, making it unlikely to be worthwhile.
:::
### Re-use an existing name
Originally `haven::write_sav(compress)` could either be `TRUE` (compress the file) or `FALSE` (don't compress it).
But then SPSS version 21.0 introduced a new way of compressing files leading to three possible options: compress with the new way (zsav), compress with the old way (bytes), or don't compress.
In this case we got lucky because we can continue to use the same argument name: `compress = c("byte", "zsav", "none")`.
We allowed existing code by special casing the behaviour of `TRUE` and `FALSE`:
```{r}
write_sav <- function(data, path, compress = c("byte", "zsav", "none"), adjust_tz = TRUE) {
if (isTRUE(compress)) {
compress <- "zsav"
} else if (isFALSE(compress)) {
compress <- "none"
} else {
compress <- arg_match(compress)
}
...
}
```
You could choose to deprecate `TRUE` and `FALSE`, but here we chose to the keep them since it's only a small amount of extra code in haven, and it means that existing users don't need to think about it.
See `?haven::read_sav` for how we communicated the change in the docs.
In a future version of haven we might change the order of the enum so that the `zsav` compression method becomes the default.
This generally yields smaller files but can't be read by older versions of SPSS.
Now that v21 is over 5 years old[^boolean-strategies-1], it's reasonable to make the smaller format the default.
[^boolean-strategies-1]: Five years is the general threshold for support across the tidyverse.