Improve consistency of names handling in prototype functions #1020

lionel- · 2020-04-17T17:36:34Z

Consistently return unnamed vectors from vec_ptype2(). Record vectors gain a names<- method to support this. It currently only allows setting names to NULL, but should be extended in the future by storing names in a special field.
Consistently preserve names in vec_ptype(), including row names. I still wonder if vec_ptype() should unname though.
In the same spirit of unnaming the results of ptype2 methods, we also empty them. This simplifies the implementation of methods, which can now just return `x` or `y`.
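A base-R sketch of what "unname and empty" means for a ptype2 method result. In base R, an empty slice of a named vector still carries a zero-length names attribute (it prints as `named numeric(0)`), so emptying alone is not enough. `empty_unnamed()` and `ptype2_dbl_dbl()` are hypothetical helpers for illustration, not vctrs functions.

```r
# Hypothetical helper (not part of vctrs): turn a ptype2 method result
# into an empty, unnamed prototype.
empty_unnamed <- function(x) {
  out <- x[0]          # empty slice; an empty slice of a named vector keeps names
  names(out) <- NULL   # drop the zero-length names attribute as well
  out
}

# Hypothetical double/double method: because the caller empties and
# unnames the result, the method can simply hand back one of its inputs.
ptype2_dbl_dbl <- function(x, y) {
  empty_unnamed(x)
}

p <- ptype2_dbl_dbl(c(a = 1.5, b = 2), c(z = 3))
```

With this design, the emptying and unnaming live in one place instead of being repeated in every method.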


hadley · 2020-04-19T14:03:35Z

R/partial.R

+## Needed because partial classes inherit from `vctrs_sclr` which
+## can't be renamed. And `vec_ptype2()` etc zap the names.
+#' @export
+`names<-.vctrs_partial` <- function(x, value) {


This doesn't feel quite right to me. I don't think we should be systematically stripping names and then providing `names<-` methods that ignore the stripping. Why not just strip names on atomic vectors?

The partial objects are not vectors, which makes them weird in vctrs. This is just a stopgap. I don't think the hacks around partial vectors should make us pause.

As for record vectors, they should support `names<-` at some point, like other vectors. I would really prefer a general solution to ignoring S3 vectors.

More generally, maybe there really exist some vectors that do not support names. It doesn't seem ill-founded to require these vectors to implement setting names to `NULL` as a no-op in their `names<-` method.
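A minimal sketch of that requirement, assuming a hypothetical class `my_unnameable` standing in for an object that can't carry names (such as a partial type). Setting names to `NULL` is accepted as a no-op, so generic code that unnames prototypes keeps working, while any other value is an error. The internal list names are implementation details and are left alone.

```r
# Hypothetical `names<-` method for an object that can't carry names.
`names<-.my_unnameable` <- function(x, value) {
  if (is.null(value)) {
    # Unnaming is always allowed and changes nothing.
    return(x)
  }
  stop("Objects of class `my_unnameable` can't have names.", call. = FALSE)
}

obj <- structure(list(partial = TRUE), class = "my_unnameable")
names(obj) <- NULL   # dispatches to the method above: a no-op
```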

DavisVaughan · 2020-04-20T12:10:52Z

src/type.c

+  UNPROTECT(1);
+  return out;
+}
+
 static SEXP col_ptype(SEXP x) {
  return vec_ptype(x, args_empty);


Should the columns of the data frame be named? Or should names be stripped from them too?

Good point. I think they should be unnamed for consistency with vec-assign.
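A base-R sketch of a data frame prototype whose columns are emptied and unnamed, mirroring what unnaming in `col_ptype()` would produce. `df_ptype_unnamed()` is a hypothetical helper, not the C implementation in `src/type.c`. The data frame is built with `structure()` so that the `x` column really carries names.

```r
# Hypothetical helper: empty every column, strip its names, and rebuild
# a zero-row data frame with the same class and column names.
df_ptype_unnamed <- function(df) {
  cols <- lapply(df, function(col) {
    out <- col[0]
    names(out) <- NULL
    out
  })
  structure(cols, class = class(df), row.names = integer())
}

df <- structure(
  list(x = c(a = 1L, b = 2L), y = c("u", "v")),
  class = "data.frame",
  row.names = c(NA, -2L)
)
p <- df_ptype_unnamed(df)
```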

DavisVaughan · 2020-04-20T12:13:11Z

src/type.c

  case vctrs_type_s3:          return s3_type(x, x_arg);
  case vctrs_type_scalar:      stop_scalar_type(x, x_arg);
  }
  never_reached("vec_ptype");
 }

+// [[ include("vctrs.h") ]]
+SEXP vec_ptype_unnamed(SEXP x, struct vctrs_arg* x_arg) {


It does feel a bit awkward that we have to use vec_ptype_unnamed() in vec_ptype2() to get the unnaming behavior, while vec_ptype() retains names. I am also leaning towards letting vec_ptype() unname, which would remove the need for the specialized vec_ptype_unnamed().

I feel like vec_ptype() always returning unnamed vectors would simplify the notion of a prototype in vctrs.

DavisVaughan · 2020-04-20T12:14:30Z

src/type.c

@@ -75,6 +85,24 @@ static SEXP s3_type(SEXP x, struct vctrs_arg* x_arg) {
  return vec_slice(x, R_NilValue);


It would be interesting to consider a version of vec_slice() that doesn't preserve names, so we don't have to do as much work to unname after we slice. This feels somewhat similar to having a version of vec_c() that doesn't retain names.
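The semantics of such a slice can be sketched in base R. `slice_unnamed()` is hypothetical and only handles atomic vectors. Dropping names before indexing means the names vector is never subset and no separate unnaming pass is needed afterwards. Note that `unname()` itself copies `x` in R, so any real performance saving would have to come from an option inside the C slicing routine.

```r
# Hypothetical name-dropping slice for atomic vectors.
slice_unnamed <- function(x, i) {
  unname(x)[i]
}

x <- c(a = 1, b = 2, c = 3)
tail_part <- slice_unnamed(x, 2:3)
empty_part <- slice_unnamed(x, integer())
```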

If we did this, we'd have to reconsider how names are tracked in the lubridate PR. When we proxy, the names are added as a new column. This would get sliced and then restored even if we turned on the option to not preserve names.
https://github.com/tidyverse/lubridate/pull/871/files#diff-29798887c998d8e58008ea67fdf141d3R32

On the other hand, POSIXlt stores names directly on the $year field, so when we proxy we get a named column. If we turn off name preservation, that should strip these names. (So it seems like the POSIXlt approach is "smarter" than the lubridate approach, and might be something to keep in mind going forward.)

It seems like if we did this, we wouldn't even need a vec_set_names() call.

We may still need the set-names call on S3 objects (or at least on S3 data frames) because the names might be encoded in a field, as in POSIXlt vectors. Or maybe we should enforce names as a special vctrs::rcrd_names field in data frame proxies for performance and simplicity. Then vec-slice could rely on either the names attribute or the record field.

because the names might be encoded in a field, as in POSIXlt vectors

I think I'm arguing above that a vec_slice(x, R_NilValue, preserve_names = false) would already work correctly in this case.

The proxy for POSIXlt is a data frame whose `year` column carries the names. That column vector would be sliced with vec_slice(year, preserve_names = false), so names wouldn't be preserved.

There might be other cases where this doesn't work, but it might be useful to say that a proxy should always expose the names of a vector in such a way that slicing can remove them like this.
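The difference between the two proxy layouts discussed here can be sketched with plain lists standing in for data frame proxies. Layout A keeps names on a field (POSIXlt-style); layout B stores them as an extra field (as in the lubridate PR). `slice_fields()` and its `preserve_names` argument are hypothetical, and the field values are illustrative only.

```r
# Layout A (POSIXlt-style): names live on one of the fields.
proxy_a <- list(year = c(a = 120L, b = 121L), mon = c(0L, 5L))
# Layout B (lubridate PR-style): names are stored as a separate field.
proxy_b <- list(year = c(120L, 121L), mon = c(0L, 5L), names = c("a", "b"))

# Hypothetical field-wise slice: each field is sliced like an ordinary
# vector, optionally dropping that field's names attribute.
slice_fields <- function(proxy, i, preserve_names = TRUE) {
  lapply(proxy, function(field) {
    out <- field[i]
    if (!preserve_names) names(out) <- NULL
    out
  })
}

a <- slice_fields(proxy_a, 1, preserve_names = FALSE)
b <- slice_fields(proxy_b, 1, preserve_names = FALSE)
```

In layout A, turning off name preservation removes the names. In layout B, the names survive as ordinary data in `b$names` and would be restored afterwards, which is the problem described for the lubridate approach.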

Maybe POSIXlt is not a good example. I was thinking the difference between named and unnamed record vectors of size zero would be the presence of a names field. This is equivalent to the presence or absence of a names attribute for normal vectors of size zero.

lionel- · 2020-04-20T12:24:53Z

Related issue: #623. vec_cast() should preserve the names of the input.

hadley · 2020-04-20T12:25:43Z

I am uncomfortable with this change. I don't think it's needed for dplyr 1.0.0, so I'd prefer to put off any discussion until we have more time.

lionel- · 2020-04-20T14:07:45Z

Interestingly, we have this in vec_is(), which also goes in the direction of removing names from prototypes:

  x <- vec_slice(x, integer())
  ptype <- vec_slice(ptype, integer())

  # FIXME: Remove row names for matrices and arrays, and handle empty
  # but existing dimnames
  x <- vec_set_names(x, NULL)
  ptype <- vec_set_names(ptype, NULL)
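One possible base-R approach to the FIXME: strip row names from a matrix or array while leaving the other dimnames alone, and drop dimnames entirely when nothing non-`NULL` remains. `unname_rows()` is a hypothetical helper, and treating `list(NULL, NULL)` as equivalent to no dimnames is an assumption about the desired behavior.

```r
# Hypothetical helper: remove row names, keep the remaining dimnames.
unname_rows <- function(x) {
  dn <- dimnames(x)
  if (is.null(dn)) {
    return(x)
  }
  dn[1] <- list(NULL)
  # Empty-but-existing dimnames collapse to no dimnames at all.
  if (all(vapply(dn, is.null, logical(1)))) {
    dn <- NULL
  }
  dimnames(x) <- dn
  x
}

m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
p <- unname_rows(m[0, , drop = FALSE])
```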

Worth noting that rcrd vectors currently don't do the right thing with this implementation, since vec_set_names() goes through names<-.vctrs_vctr, which overwrites the field names.
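A base-R illustration of the problem and of the "names in a special field" direction from the PR description. The class `my_rcrd`, the `.names` field, and both methods are hypothetical stand-ins, not the vctrs rcrd implementation. For a record stored as a list of fields, the default `names<-` acts on the field names.

```r
r <- structure(list(x = 1:2, y = c("a", "b")), class = "my_rcrd")

# Without a method, unnaming clobbers the field names.
broken <- r
names(broken) <- NULL

# Hypothetical methods that keep vector names in a separate `.names`
# field, so the field names are never touched.
`names<-.my_rcrd` <- function(x, value) {
  fields <- unclass(x)
  if (!is.null(value) && length(value) != length(fields[[1]])) {
    stop("`value` must have one name per element.", call. = FALSE)
  }
  fields[[".names"]] <- value  # assigning NULL drops the field
  structure(fields, class = class(x))
}
names.my_rcrd <- function(x) unclass(x)[[".names"]]

named <- r
names(named) <- c("p", "q")
```

Under this layout, a named and an unnamed record of size zero differ only by the presence of the `.names` field.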


lionel- added 5 commits April 17, 2020 18:42

Unname common prototypes

579ae5f

Allow setting names of partial and rcrd vectors to NULL

2126fde

Now a requirement of `vec_ptype2()`

Remove row names of common prototypes

fdfd2a0

Fix names handling of S3 classes in vec_ptype() and vec_ptype2()

66d299a

Empty the results of vec_ptype2() methods

3b869ce

lionel- requested review from hadley and DavisVaughan April 17, 2020 17:36

hadley reviewed Apr 19, 2020


DavisVaughan reviewed Apr 20, 2020


lionel- added a commit to lionel-/vctrs that referenced this pull request Apr 27, 2020

Preserve type of row names in vec_ptype()

09d151b

Extracted from r-lib#1020

lionel- mentioned this pull request Apr 27, 2020

Preserve type of row names in vec_ptype() #1050

Merged

lionel- added a commit to lionel-/vctrs that referenced this pull request Apr 27, 2020

Preserve type of row names in vec_ptype()

0cf9531

Extracted from r-lib#1020

lionel- added a commit to lionel-/vctrs that referenced this pull request Apr 27, 2020

Preserve type of row names in vec_ptype()

20392e8

Extracted from r-lib#1020

lionel- added this to the 0.4.0 milestone May 2, 2020

lionel- force-pushed the master branch from 76f8215 to 10cb693 Compare May 7, 2020 16:55

lionel- force-pushed the master branch from 82aa9c2 to 9f35b52 Compare May 29, 2020 11:32

lionel- force-pushed the master branch from 1742a31 to ee5e377 Compare August 28, 2020 15:21

lionel- mentioned this pull request Mar 10, 2021

Rework vec_ptype() into a generic for S3 types #1322

Merged

lionel- force-pushed the main branch from f14d533 to 652f0c5 Compare October 21, 2022 19:57
