String variables imported from .sav files lose their attributes with bind_rows #717

Closed
awmercer opened this issue Mar 1, 2023 · 1 comment


awmercer commented Mar 1, 2023

If I read in a .sav file, split it into two subsets, and then use bind_rows() to put them back together, all of the SPSS attributes on character variables are dropped. This does not seem to be a problem for variables imported as labelled vectors, since the labelled class is compatible with vctrs.

Here is a reprex of the behavior:

```r
library(haven)
library(tidyverse)

path <- tempfile(fileext = ".sav")

df <- data.frame(group = c(1, 1, 2, 2), 
                 stringvar = c("a", "b", "c", "d"))

# These are attribute values from an example in a real dataset
attributes(df$stringvar) <- list(label = "Some variable label", 
                               format.spss = "A255", 
                               display_width = 50)

write_sav(df, path)

orig <- read_sav(path)
attributes(orig$stringvar)
#> $label
#> [1] "Some variable label"
#> 
#> $format.spss
#> [1] "A255"
#> 
#> $display_width
#> [1] 50

group1 <- filter(orig, group == 1) 
group2 <- filter(orig, group == 2)

result <- bind_rows(group1, group2)

attributes(result$stringvar)
#> NULL
```

I believe what's happening is that SPSS string variables without value labels are imported as plain character vectors rather than as labelled vectors. They still have variable labels, though, and those are important to preserve.

It seems like it would make sense to import them as labelled vectors with no value labels (labels = NULL), so that the variable label and other attributes are preserved.

I'm not sure whether this would cause other issues I haven't considered, but it would be hugely helpful, especially when processing or cleaning datasets that will be saved back out as .sav files.

gorcha (Member) commented Mar 3, 2023

Hi @awmercer,

This is an issue with vctrs, which dplyr's bind_rows() uses behind the scenes to combine vectors, and which does not preserve attributes of unclassed vectors by default. There's an open issue, r-lib/vctrs#1783, that may address this.
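To check which columns of a data frame are exposed to this, a small base R function can list the columns that carry attributes (such as label or format.spss) but have no class, which per the explanation above are the ones whose attributes get stripped when combining. This is a minimal sketch; unclassed_with_attrs() is a hypothetical helper, not part of haven, dplyr, or vctrs.

```r
# Hypothetical helper (not from haven/dplyr/vctrs): return the names of
# columns that have attributes but no class attribute. These are the
# unclassed vectors whose attributes are not preserved when combining.
unclassed_with_attrs <- function(df) {
  keep <- vapply(df, function(col) {
    !is.null(attributes(col)) && is.null(oldClass(col))
  }, logical(1))
  names(df)[keep]
}

# Example built like the reprex: only "stringvar" is flagged, because
# "group" has no attributes and a factor column has a class.
df <- data.frame(group = c(1, 1, 2, 2),
                 stringvar = c("a", "b", "c", "d"),
                 f = factor(c("x", "y", "x", "y")))
attributes(df$stringvar) <- list(label = "Some variable label",
                                 format.spss = "A255",
                                 display_width = 50)
unclassed_with_attrs(df)
```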

As it stands, the only way to preserve attributes with vctrs would be to use classed objects for all vectors read in via haven (as you've suggested). This would work, but it would be a significant change to haven's behaviour for a marginal benefit, so it's not something we're likely to pursue.

It's a bit fiddly, but a viable alternative is to save the attributes manually after reading and reapply them before writing the data back out.
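A minimal base R sketch of that workaround, assuming columns keep the same names between reading and writing. save_col_attrs() and restore_col_attrs() are hypothetical helpers, not functions from haven or dplyr. The restore step only fills in attributes that are missing, so anything the combining step did keep (for example the class and value labels on labelled vectors) is left untouched.

```r
# Hypothetical helper: capture every column's attributes right after reading.
save_col_attrs <- function(df) {
  lapply(df, attributes)
}

# Hypothetical helper: copy saved attributes back onto same-named columns.
restore_col_attrs <- function(df, saved) {
  for (nm in intersect(names(df), names(saved))) {
    attrs <- saved[[nm]]
    # Skip structural attributes whose values depend on the original length.
    for (a in setdiff(names(attrs), c("names", "dim", "dimnames"))) {
      # Only fill in attributes that are missing; never overwrite survivors.
      if (is.null(attr(df[[nm]], a, exact = TRUE))) {
        attr(df[[nm]], a) <- attrs[[a]]
      }
    }
  }
  df
}

# Example usage with the objects from the reprex (requires haven and dplyr):
# saved  <- save_col_attrs(orig)
# result <- restore_col_attrs(bind_rows(group1, group2), saved)
# write_sav(result, path)
```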

gorcha closed this as completed Mar 3, 2023
gorcha closed this as not planned Mar 3, 2023