If I read in a .sav file, split it into two subsets and then use bind_rows() to put them back together again, all of the SPSS attributes for character variables are dropped. This does not seem to be an issue for variables that get imported as labelled vectors since the labelled class is compatible with vctrs.
Here is a reprex of the behavior:
library(haven)
library(tidyverse)
path <- tempfile(fileext = ".sav")
df <- data.frame(group = c(1, 1, 2, 2),
                 stringvar = c("a", "b", "c", "d"))
# These are attribute values from an example in a real dataset
attributes(df$stringvar) <- list(label = "Some variable label",
                                 format.spss = "A255",
                                 display_width = 50)
write_sav(df, path)
orig <- read_sav(path)
attributes(orig$stringvar)
#> $label
#> [1] "Some variable label"
#>
#> $format.spss
#> [1] "A255"
#>
#> $display_width
#> [1] 50

group1 <- filter(orig, group == 1)
group2 <- filter(orig, group == 2)
result <- bind_rows(group1, group2)
attributes(result$stringvar)
#> NULL
I believe what's happening is that SPSS string variables are imported as plain character vectors instead of labelled vectors because they do not have value labels. But they do have variable labels, which are important to preserve.
It seems like it would make sense to import them as labelled variables but with the attribute val_labels = NULL so that the variable label and other attributes are preserved.
I'm not sure if there are other issues that I'm not thinking of, but it'd be hugely helpful, especially when doing data processing or cleaning on datasets that will be saved back out as .sav files.
This is an issue with vctrs, which is used behind the scenes in the dplyr bind_rows() function to combine vectors and does not preserve attributes for unclassed vectors by default. There's an issue open at r-lib/vctrs#1783 that will potentially address this.
As it stands the only way to preserve attributes with vctrs would be to use classed objects for all vectors read in via haven (as you've suggested). This would work but would be a significant change to haven's behaviour for a marginal benefit, so it's not something we'd be likely to pursue.
It's a bit fiddly, but a viable alternative is to manually save the attributes after reading and reapply on the way out.
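For anyone landing here, a minimal sketch of that save-and-reapply workaround might look like the following. The helpers `save_col_attrs()` and `restore_col_attrs()` are hypothetical names made up for illustration; they are not part of haven, dplyr, or vctrs. The sketch assumes that only columns which come back with no attributes at all should be restored, so classed columns such as labelled vectors are left alone. The data frame is built by hand to keep the example self-contained; in practice `orig` would come from `read_sav(path)`.

```r
library(dplyr)

# Hypothetical helper: capture the attributes of every column as a named list.
save_col_attrs <- function(df) {
  lapply(df, attributes)
}

# Hypothetical helper: reapply saved attributes, but only to columns that
# currently have no attributes (i.e. ones that were stripped along the way).
restore_col_attrs <- function(df, saved) {
  for (nm in intersect(names(df), names(saved))) {
    if (is.null(attributes(df[[nm]])) && !is.null(saved[[nm]])) {
      attributes(df[[nm]]) <- saved[[nm]]
    }
  }
  df
}

# Stand-in for the result of read_sav(): a character column carrying
# SPSS-style attributes.
orig <- data.frame(group = c(1, 1, 2, 2),
                   stringvar = c("a", "b", "c", "d"),
                   stringsAsFactors = FALSE)
attributes(orig$stringvar) <- list(label = "Some variable label",
                                   format.spss = "A255",
                                   display_width = 50)

# Save the attributes before processing.
saved <- save_col_attrs(orig)

# Split and recombine, as in the reprex above.
result <- bind_rows(filter(orig, group == 1), filter(orig, group == 2))

# Reapply the attributes before writing back out with write_sav().
result <- restore_col_attrs(result, saved)
attributes(result$stringvar)
```

Matching by column name means the restore step still works if columns are reordered during processing, but renamed columns would need their saved entries renamed as well.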