Sheets un-even columns range_speedread incorrect #309

Closed
HugoGit39 opened this issue Dec 28, 2023 · 2 comments

HugoGit39 commented Dec 28, 2023

Hi

I have a large Google Sheet with columns of uneven length.

When I use range_speedread(), it doesn't read the last columns correctly. Why?

See this example:

Google Sheet:

https://docs.google.com/spreadsheets/d/1t2JYOCsCvK05Layi3loXFWwCBGXjXNUDMUAi9xudiHg/

library(googlesheets4)

test <- range_speedread(as_id("1t2JYOCsCvK05Layi3loXFWwCBGXjXNUDMUAi9xudiHg"), show_col_types = FALSE)

jennybc commented Jan 15, 2024

I'm not entirely sure what you mean by the last columns not being read correctly.

But I think you're just noticing the trickiness of column type guessing in the presence of lots of missing data?

The docs for range_speedread() outline various gotchas of this function and point out that, ultimately, readr::read_csv() gets used.

You can read about readr's column type guessing here:

https://readr.tidyverse.org/articles/column-types.html
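
To see what readr actually guessed for each column, a small offline sketch can help. The CSV text below is made up for illustration (it is not the sheet from this issue), and it is short enough that every row takes part in the guess:

```r
library(readr)

# Made-up CSV: x1 has a single late value, x2 is entirely empty.
csv_text <- "x0,x1,x2\n0.3,,\n0.300,,\n0.295,10.2,\n"

df <- read_csv(I(csv_text), show_col_types = FALSE)

# Column classes after guessing: a sparse numeric column is still numeric,
# while a column with no values at all falls back to logical.
col_classes <- vapply(df, function(col) class(col)[1], character(1))
print(col_classes)

# The full column specification readr settled on.
print(spec(df))

# Any values that could not be parsed under the guessed types.
print(problems(df))
```

Since range_speedread() hands the parsing to readr::read_csv(), the same checks should apply to its result.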

But one solution for this dataset is just to instruct readr to use all the rows to guess column type, instead of the first 1000.

library(googlesheets4)
gs4_deauth()

test2 <- range_speedread(
  "1t2JYOCsCvK05Layi3loXFWwCBGXjXNUDMUAi9xudiHg",
  guess_max = Inf
)
#> ✔ Reading from "Test".
#> ℹ Export URL:
#>   <https://docs.google.com/spreadsheets/d/1t2JYOCsCvK05Layi3loXFWwCBGXjXNUDMUAi9xudiHg/export?format=csv>
#> Rows: 4741 Columns: 7
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (7): X0, X1, X2, X3, X4, X5, X6
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
test2
#> # A tibble: 4,741 × 7
#>       X0    X1    X2    X3    X4    X5    X6
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 0.3      NA    NA    NA    NA    NA    NA
#>  2 0.300    NA    NA    NA    NA    NA    NA
#>  3 0.295    NA    NA    NA    NA    NA    NA
#>  4 0.299    NA    NA    NA    NA    NA    NA
#>  5 0.299    NA    NA    NA    NA    NA    NA
#>  6 0.298    NA    NA    NA    NA    NA    NA
#>  7 0.32     NA    NA    NA    NA    NA    NA
#>  8 0.323    NA    NA    NA    NA    NA    NA
#>  9 0.323    NA    NA    NA    NA    NA    NA
#> 10 0.327    NA    NA    NA    NA    NA    NA
#> # ℹ 4,731 more rows
tail(test2)
#> # A tibble: 6 × 7
#>       X0    X1    X2    X3    X4      X5    X6
#>    <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>
#> 1 42272.  NA    NA    NA   NA         NA  NA  
#> 2 43605.  NA    NA    NA   NA         NA  NA  
#> 3 43870.  NA    NA    NA   NA         NA  NA  
#> 4 44010.  10.2  10.8   7.3  7.54 9193097  42.9
#> 5 43769.  NA    NA    NA   NA         NA  NA  
#> 6 43098.  NA    NA    NA   NA         NA  NA

Created on 2024-01-15 with reprex v2.1.0.9000
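
As an alternative to guess_max = Inf, the column types can be declared up front so that no guessing happens at all. This is only a sketch: the CSV text is a made-up stand-in for the exported sheet, and it assumes every column is meant to be numeric.

```r
library(readr)

# Hypothetical stand-in for the exported sheet: X1 is empty except in the last row.
csv_text <- paste0(
  "X0,X1\n",
  paste0(rep("0.3,\n", 5), collapse = ""),
  "44010.1,10.2\n"
)

# Option 1: guess from every row, as in the reprex above.
guessed <- read_csv(I(csv_text), guess_max = Inf, show_col_types = FALSE)

# Option 2: skip guessing and declare every column as double.
declared <- read_csv(
  I(csv_text),
  col_types = cols(.default = col_double())
)

# Both approaches give a double column that keeps the late value.
print(tail(declared, 2))
```

range_speedread() passes extra arguments on to readr::read_csv(), so col_types can be supplied there in the same way as guess_max.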

jennybc closed this as completed Jan 15, 2024

HugoGit39 commented

Thank you!
