Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In read_sheet, can't import blank cells as empty string #174

Closed
dulearnaux opened this issue Jun 19, 2020 · 5 comments
Closed

In read_sheet, can't import blank cells as empty string #174

dulearnaux opened this issue Jun 19, 2020 · 5 comments
Labels
types 🏭 Cell and column types

Comments

@dulearnaux
Copy link

Specifying na explicitly to not convert the default value of "" to NA, still converts "" to NA's once read into R.

I have created a public spread sheet( 1dyCsWgVu72w3yYA6BdmFDqMTrf2OBxawwD_ZBPN7JF8) containing:
https://docs.google.com/spreadsheets/d/1dyCsWgVu72w3yYA6BdmFDqMTrf2OBxawwD_ZBPN7JF8

col1 col2 col3
1 a NA
2 b missing
3 c
4 d hello
5 e w
gsheet_id <-"1dyCsWgVu72w3yYA6BdmFDqMTrf2OBxawwD_ZBPN7JF8"
googlesheets4::read_sheet(gsheet_id, na = c("NA", "missing"))

#> Reading from "googlesheets4_bug"
#> Range "Sheet1"
#> # A tibble: 5 x 3
#>    col1 col2  col3 
#>   <dbl> <chr> <chr>
#> 1     1 a     NA   
#> 2     2 b     NA   
#> 3     3 c     NA   
#> 4     4 d     hello
#> 5     5 e     w    

Was expecting similar behaviour to readr, where "" is retained.

csv_txt <- "col1,col2,col3
1,a,NA
2,b,missing
3,c,
4,d,hello
5,e,w"

readr::read_delim(csv_txt, na = c("missing", "NA"), delim = ",")
#> # A tibble: 5 x 3
#>    col1 col2  col3   
#>   <int> <chr> <chr>  
#> 1     1 a      NA    
#> 2     2 b      NA    
#> 3     3 c     ""     
#> 4     4 d     "hello"
#> 5     5 e     "w"    
@jennybc
Copy link
Member

jennybc commented Mar 8, 2021

I'll have to think about this.

In the meantime, the way to get readr behaviour is to actually use readr! Which the range_speedread() function does.

library(googlesheets4)

gs4_deauth()

gsheet_id <-"1dyCsWgVu72w3yYA6BdmFDqMTrf2OBxawwD_ZBPN7JF8"

range_speedread(gsheet_id, na = c("NA", "missing"))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   col1 = col_double(),
#>   col2 = col_character(),
#>   col3 = col_character()
#> )
#> # A tibble: 5 x 3
#>    col1 col2  col3   
#>   <dbl> <chr> <chr>  
#> 1     1 a      <NA>  
#> 2     2 b      <NA>  
#> 3     3 c     ""     
#> 4     4 d     "hello"
#> 5     5 e     "w"

Created on 2021-03-08 by the reprex package (v1.0.0.9002)

https://googlesheets4.tidyverse.org/reference/range_speedread.html

@jennybc jennybc added the types 🏭 Cell and column types label Mar 8, 2021
@jennybc jennybc changed the title In read_sheet, can't import blank cells as zero length character: "" In read_sheet, can't import blank cells as empty string Mar 8, 2021
@dulearnaux
Copy link
Author

Thanks. That workaround works.

@jennybc
Copy link
Member

jennybc commented Jul 11, 2021

This sort of came up in #221. Where I also show how to use tidyr::replace_na() to replace NA with "".

@jennybc
Copy link
Member

jennybc commented Jul 11, 2021

Here's how this should probably work:

readr::read_csv("x,y\na,b\nc,")
#> # A tibble: 2 x 2
#>   x     y    
#>   <chr> <chr>
#> 1 a     b    
#> 2 c     <NA>
readr::read_csv("x,y\na,b\nc,", na = character())
#> # A tibble: 2 x 2
#>   x     y    
#>   <chr> <chr>
#> 1 a     "b"  
#> 2 c     ""

Created on 2021-07-11 by the reprex package (v2.0.0.9000)

@jennybc
Copy link
Member

jennybc commented Jul 15, 2021

After:

library(googlesheets4)

gsheet_id <-"1dyCsWgVu72w3yYA6BdmFDqMTrf2OBxawwD_ZBPN7JF8"
gs4_deauth()

range_speedread(gsheet_id, na = c("NA", "missing"))
#> ✓ Reading from "googlesheets4_bug".
#> ℹ Export URL:
#>   <https://docs.google.com/spreadsheets/d/1dyCsWgVu72w3yYA6BdmFDqMTrf2OBxawwD_ZBPN7JF8/export?format=csv>
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   col1 = col_double(),
#>   col2 = col_character(),
#>   col3 = col_character()
#> )
#> # A tibble: 5 x 3
#>    col1 col2  col3   
#>   <dbl> <chr> <chr>  
#> 1     1 a      <NA>  
#> 2     2 b      <NA>  
#> 3     3 c     ""     
#> 4     4 d     "hello"
#> 5     5 e     "w"

read_sheet(gsheet_id, na = c("NA", "missing"))
#> ✓ Reading from "googlesheets4_bug".
#> ✓ Range 'Sheet1'.
#> # A tibble: 5 x 3
#>    col1 col2  col3   
#>   <dbl> <chr> <chr>  
#> 1     1 a      <NA>  
#> 2     2 b      <NA>  
#> 3     3 c     ""     
#> 4     4 d     "hello"
#> 5     5 e     "w"

Created on 2021-07-14 by the reprex package (v2.0.0.9000)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
types 🏭 Cell and column types
Projects
None yet
Development

No branches or pull requests

2 participants