With the introduction of `.by`, we no longer sort group keys automatically. There are a whole host of good reasons for this, as outlined in #5664 (comment), and I am mostly confident this is the right long-term default for dplyr.
However, I am empathetic to the fact that users do often like to see their summary results sorted in ascending order. Right now, our recommendation is:
```r
df %>%
  summarise(..., .by = c(a, b, c)) %>%
  arrange(a, b, c) # could also come before `summarise()`
```
This is nice because you get the full power of `arrange()`, including `desc()` and `.locale`.
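A small illustration of the ordering difference, assuming dplyr >= 1.1.0 (where `.by` and `arrange(.locale = )` exist). The data frame `df` and its columns `g` and `x` are made up for the example: `.by` returns groups in order of first appearance, `group_by()` returns them sorted, and a trailing `arrange()` recovers the sorted order or any other order, such as `desc()`.

```r
library(dplyr)

# Hypothetical toy data: group "b" appears before group "a"
df <- tibble(g = c("b", "a", "b", "a"), x = 1:4)

# .by keeps groups in order of first appearance: "b", then "a"
by_result <- df %>%
  summarise(total = sum(x), .by = g)

# group_by() sorts the group keys: "a", then "b"
grouped_result <- df %>%
  group_by(g) %>%
  summarise(total = sum(x))

# The current recommendation: sort explicitly afterwards
sorted_result <- by_result %>%
  arrange(g)

# arrange() also supports descending order (and an explicit .locale)
desc_result <- by_result %>%
  arrange(desc(g))
```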
I think we should consider a `.sort` argument like:
```r
df %>%
  summarise(..., .by = c(a, b, c), .sort = TRUE)
```
- `.sort = FALSE` would be the default, for the reasons mentioned above.
- We'd document this as the 100% backwards-compatible way to transition from `group_by()` to `.by` (even though most of the time the ordering isn't important).
- You must accept that you get ascending order and the C locale. That makes it compatible with `group_by()`. If you need anything fancier, call `arrange()`.
- I do like that you won't have to repeat the group names.
- Obviously, `.sort = TRUE` would error on unorderable types like clock's year-month-weekday.
- This would probably only be an argument for the `.data.frame` method, as opposed to the generic, because dbplyr probably won't want to enforce a sort order? Uncertain.
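Until something like `.sort` exists, the proposed semantics can be approximated in user code. The sketch below defines a hypothetical helper, `summarise_sorted()`, which is not part of dplyr; it assumes dplyr >= 1.1.0 plus its dependencies rlang and tidyselect. It resolves `.by` to column names once, summarises, then sorts ascending by those columns in the C locale, so the group names aren't repeated and the row order should match `group_by()` + `summarise()`.

```r
library(dplyr)

# Hypothetical helper approximating the proposed `.sort = TRUE`.
# NOT a dplyr function: a user-level sketch only.
summarise_sorted <- function(.data, ..., .by) {
  # Resolve the tidy-selection in `.by` to column names, so the caller
  # doesn't have to repeat them in a separate arrange() call
  by_cols <- names(tidyselect::eval_select(rlang::enquo(.by), .data))

  out <- summarise(.data, ..., .by = all_of(by_cols))

  # Ascending order in the C locale, matching group_by()'s key ordering;
  # like the proposal, this errors if a key column can't be ordered
  arrange(out, !!!rlang::syms(by_cols), .locale = "C")
}

# Usage: in the C locale, "B" sorts before "a" and "b"
df <- tibble(g = c("b", "a", "B", "a"), x = 1:4)
summarise_sorted(df, total = sum(x), .by = g)
```

Sorting after summarising (rather than before) means only one row per group is sorted, which is usually much cheaper than sorting the input.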
Basically, this keeps a group-by + summarise operation theoretically pure (because it shouldn't require orderable keys), while still giving users a convenient way to opt in to sorted results.
There are 3 functions that would get this argument:
- `summarise()`
- `reframe()`
- `slice_sample()` (goes with #6662, "`slice()` and `slice_head/tail/min/max()` should act like a `filter()` not a `reframe()`")
The following would not get `.sort` because they aren't about row ordering:
- `filter()`
- `mutate()`
- `slice()` and `slice_min/max/head/tail()` (after the behavior change proposed in #6662 lands)
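To illustrate why `filter()` and `mutate()` don't need `.sort`: with `.by`, they return rows in the same order as the input, so there are no group-ordered results to sort. A minimal sketch, assuming dplyr >= 1.1.0, with a made-up data frame `df`:

```r
library(dplyr)

# Hypothetical toy data with interleaved, unsorted groups
df <- tibble(g = c("b", "a", "a", "b"), x = c(3, 1, 5, 1))

# filter() keeps the largest x per group; rows stay in input order
# ("b" before "a"), not sorted by group key
kept <- df %>%
  filter(x == max(x), .by = g)

# mutate() adds a per-group mean without reordering any rows
with_mean <- df %>%
  mutate(mean_x = mean(x), .by = g)
```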