-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve consistency of names handling in prototype functions #1020
base: main
Are you sure you want to change the base?
Conversation
Now a requirement of `vec_ptype2()`
## Needed because partial classes inherit from `vctrs_sclr` which | ||
## can't be renamed. And `vec_ptype2()` etc zap the names. | ||
#' @export | ||
`names<-.vctrs_partial` <- function(x, value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't feel quite right to me — I don't think we should be systematically stripping names, and then providing names<-
methods that ignore the stripping. Why not just strip names on atomic vectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The partial objects are not vectors, which makes them weird in vctrs. This is just a stopgap. I don't think the hacks around partial vectors should make us pause.
As for record vectors, they should support names<-
at some point, like other vectors. I would really prefer a general solution than ignoring S3 vectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More generally, maybe there really exists some vectors that do not support names. It doesn't seem ill-founded to require of these vectors to implement set-to-null as a no-op in their names<-
method.
UNPROTECT(1); | ||
return out; | ||
} | ||
|
||
static SEXP col_ptype(SEXP x) { | ||
return vec_ptype(x, args_empty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the columns of the data frame be named? Or should names be stripped from them too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think they should be unnamed for consistency with vec-assign.
case vctrs_type_s3: return s3_type(x, x_arg); | ||
case vctrs_type_scalar: stop_scalar_type(x, x_arg); | ||
} | ||
never_reached("vec_ptype"); | ||
} | ||
|
||
// [[ include("vctrs.h") ]] | ||
SEXP vec_ptype_unnamed(SEXP x, struct vctrs_arg* x_arg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does feel a bit awkward that we have to use vec_ptype_unnamed()
in vec_ptype2()
to get the unname behavior, but vec_ptype()
retains names. I am also leaning towards letting vec_ptype()
unname, which would remove the need for the specialized vec_ptype_unnamed()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like vec_ptype()
always returning unnamed vectors would simplify the notion of a prototype in vctrs.
@@ -75,6 +85,24 @@ static SEXP s3_type(SEXP x, struct vctrs_arg* x_arg) { | |||
return vec_slice(x, R_NilValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be interesting to consider a version of vec_slice()
that doesn't preserve names so we don't have to do as much work to unname after we slice. This feels somewhat similar to having a version of vec_c()
that doesn't retain names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we did this, we'd have to reconsider how names are tracked in the lubridate PR. When we proxy, the names are added as a new column. This would get sliced and then restored even if we turned on the option to not preserve names.
https://github.com/tidyverse/lubridate/pull/871/files#diff-29798887c998d8e58008ea67fdf141d3R32
On the other hand, POSIXlt stores names directly on the $year
field. So when we proxy we get a column that is named. If we turn off name preservation that should strip these names (So it seems like the POSIXlt approach is "smarter" than by lubridate approach and might be something to keep in mind going forward)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like if we did this then we wouldn't even need a vec_set_names()
call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may still need the set-names call on S3 objects (or at least on S3 data frames) because the names might be encoded in a field, as in POSIXlt vectors. Or maybe we should enforce names as a special vctrs::rcrd_names
field in data frame proxies for performance and simplicity. Then vec-slice could rely on either the names attribute or the record field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the names might be encoded in a field, as in POSIXlt vectors
I think I'm arguing above that a vec_slice(x, R_NilValue, preserve_names = false)
would already work correctly in this case.
The proxy for POSIXlt is a data frame with a $year
named year column. That column vector would be sliced with vec_slice(year, preserve_names = false)
so names wouldn't be preserved
There might be other cases where this doesn't work, but it might be useful to say that a proxy
should always expose the names
of a vector in such a way that slicing can remove them like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe POSIXlt is not a good example. I was thinking the difference between named and unnamed record vectors of size zero would be the presence of a names field. This is equivalent to the presence or absence of a names attributes for normal vectors of size zero.
Related issue: #623. |
I am uncomfortable with this change. I don't think it's needed for dplyr 1.0.0, so I'd prefer to put off any discussion until we have more time. |
Interestingly we have this in x <- vec_slice(x, integer())
ptype <- vec_slice(ptype, integer())
# FIXME: Remove row names for matrices and arrays, and handle empty
# but existing dimnames
x <- vec_set_names(x, NULL)
ptype <- vec_set_names(ptype, NULL) Worth noting that rcrd vectors are currently not doing the right thing with this implementation, since vec-set-names currently goes through |
Extracted from r-lib#1020
Extracted from r-lib#1020
Extracted from r-lib#1020
Consistently return unnamed vectors from
vec_ptype2()
. Record vectors gain anames<-
method to support this. It currently only allows setting names toNULL
, but should be extended in the future by storing names in a special field.Consistently preserve names in
vec_ptype()
, including row names. I still wonder ifvec_ptype()
should unname though.In the same spirit of unnaming the results of ptype2 methods, we also empty them. This simplifies the implementation of methods which can now just return
x
ory
.