-
Notifications
You must be signed in to change notification settings - Fork 190
Open
Description
Bigrquery seems to assume that the table schema won't change, which can cause issues. The reprex demonstrates that if you create a table in BigQuery and then add a column, you can't access the new column using dplyr syntax. I'm told that this is due to bigrquery getting the schema from a cache.
We have a couple of work arounds, but neither are ideal:
- If you disconnect the connection, wait up to a minute or so and then reconnect, it can see the new column.
- Redefining
bq_table_meta(https://github.com/r-dbi/bigrquery/blob/main/R/bq-table.R) as below works but has been inconsistent when I have tested it. Sometimes just waiting the minute or so works, or sometimes I have had to runcars_new_col_qu2 = tbl(con, cars_table)in the reprex twice to get it to work.
bq_table_meta <- function (x, fields = NULL) {
x <- as_bq_table(x)
url <- bq_path(x$project, x$dataset, x$table)
bq_get(url, useQueryCache = FALSE, query = list(fields = fields))
}Reprex (uses the standard bigrquery bq_table_meta, not the custom one above):
library(DBI)
library(bigrquery)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(dbplyr)
#>
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#>
#> ident, sql
library(glue)
# Replace xxxx with your values
project = "xxxx"
dataset = "xxxx"
test_id = round(as.numeric(Sys.time()))
cars_table = glue("cars_test_{test_id}")
con = dbConnect(
bigquery(),
project = project,
dataset = dataset,
billing = project
)
# Create table in database
cars_df = as.data.frame(mtcars)
dbWriteTable(con, cars_table, cars_df)
# Verify it is there
cars_qu = tbl(con, cars_table)
colnames(cars_qu)
#> [1] "am" "carb" "vs" "qsec" "wt" "drat" "disp" "hp" "cyl" "gear"
#> [11] "mpg"
# Make a new column and write to the table
cars_new_col_qu = cars_qu |>
mutate(new_col = cyl / mpg)
cars_new_col_qu_string = sql_render(cars_new_col_qu)
dbExecute(con, glue("create or replace table {cars_table} as {cars_new_col_qu_string}"))
#> [1] 0
# Query the table with the new column (this is the same query as cars_qu)
cars_new_col_qu2 = tbl(con, cars_table)
# Doesn't show new_col
colnames(cars_new_col_qu2)
#> [1] "am" "carb" "vs" "qsec" "wt" "drat" "disp" "hp" "cyl" "gear"
#> [11] "mpg"
# Gives an error because new_col isn't in the schema
cars_new_col_qu3 = cars_new_col_qu2 |>
select(mpg, cyl, new_col)
#> Error in `select()`:
#> ! Can't subset columns that don't exist.
#> ✖ Column `new_col` doesn't exist.
# But it does exist which we can see by querying the table directly
cars_table_schema = dbColumnInfo(dbSendQuery(con, glue("select * from `{project}.{dataset}.{cars_table}` limit 0")))
print(cars_table_schema)
#> name type
#> 1 am INTEGER
#> 2 carb INTEGER
#> 3 vs INTEGER
#> 4 qsec FLOAT
#> 5 wt FLOAT
#> 6 drat FLOAT
#> 7 disp FLOAT
#> 8 hp INTEGER
#> 9 cyl INTEGER
#> 10 gear INTEGER
#> 11 mpg FLOAT
#> 12 new_col FLOAT
# Clean up
dbExecute(con, glue("drop table {cars_table}"))
#> [1] 0Metadata
Metadata
Assignees
Labels
No labels