
Implement write table methods #19

Open
palmaresk8 opened this issue Jun 20, 2018 · 1 comment
palmaresk8 commented Jun 20, 2018

It would be nice if the sergeant package could create new tables, as I can with other database drivers. Something like this:

library(sergeant)
library(dplyr)

conn <- dbConnect(Drill())

# Load the original data
original_data <- tbl(conn, "`dfs.downloads`.`original_data`")

# Do some operation
edited_data <- original_data %>%
  filter(name == "John")

# Write the result to a table without loading it into R
dbWriteTable(conn, name = "`dfs.downloads`.`edited_data`", value = edited_data)
hrbrmstr (Owner) commented

Agreed, but even the ODBC interface has issues with this, e.g. just try doing:

library(DBI)
library(odbc)
library(tidyverse)

DBI::dbConnect(
  odbc::odbc(), 
  driver = "/Library/mapr/drill/lib/libdrillodbc_sbu.dylib",
  ConnectionType = "Zookeeper",
  AuthenticationType = "No Authentication",
  ZKClusterID = "drillbits1",
  ZkQuorum = "localhost:2181",
  AdvancedProperties = "CastAnyToVarchar=true;HandshakeTimeout=30;QueryTimeout=180;TimestampTZDisplayTimezone=utc;
  ExcludedSchemas=sys,INFORMATION_SCHEMA;NumberOfPrefetchBuffers=5;"
) -> drill_con

dbCreateTable(drill_con, "dfs.tmp.mtcars", mtcars)

One reason for the ODBC interface error is the syntax Drill needs.

One path I initially went down was to save off a data frame as JSON and then use CREATE TABLE x AS SELECT cols FROM thejsonfile, but that would assume Drill is running on the local host. Generalizing that (e.g. having an option to automatically scp the JSON to a remote Drill server, or an option to point the CTAS call at a specific path, assuming it was, say, an NFS share) would require quite a bit of condition handling and hasn't made it back to the priority list.
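For reference, the JSON-then-CTAS idea above can be sketched roughly as follows. This is a minimal illustration, not sergeant's API: `build_ctas()` is a hypothetical helper, and the paths assume Drill runs on the local host with `dfs` able to read `/tmp`.

```r
# Build a CTAS statement that reads a JSON dump back through dfs.
# build_ctas() and all paths here are hypothetical, for illustration only.
build_ctas <- function(target_table, json_path) {
  sprintf(
    "CREATE TABLE %s AS SELECT * FROM dfs.`%s`",
    target_table, json_path
  )
}

sql <- build_ctas("dfs.tmp.`mtcars_copy`", "/tmp/mtcars.json")

# With a live local Drill, one would first write the data frame out as
# newline-delimited JSON (e.g. jsonlite::stream_out()) and then run the
# statement, e.g.:
# drill_query(drill_connection("localhost"), sql)
```

The remote-server case is exactly where this breaks down: the JSON file has to land somewhere the Drill cluster can read, which is the condition handling described above.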

I also went down a path of using CTAS with an added VALUES() clause, but the REST interface really does not like it when you do that with a ton of columns and values.
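That CTAS + VALUES() approach can be sketched like this. Again a hypothetical helper, not sergeant's API; the quoting is naive (no escaping of embedded quotes, no NA handling), and Drill's exact VALUES/derived-table syntax may need adjusting. It also makes the size problem visible: every cell becomes literal SQL text, so the statement grows with the data.

```r
# Hypothetical sketch: turn a data frame into a single CTAS statement
# with an inline VALUES() clause. Illustration only; quoting is naive.
ctas_values <- function(target_table, df) {
  quote_val <- function(x) {
    # Quote strings/factors; pass numbers through as-is
    if (is.character(x) || is.factor(x)) sprintf("'%s'", as.character(x))
    else as.character(x)
  }
  rows <- vapply(seq_len(nrow(df)), function(i) {
    vals <- vapply(seq_along(df), function(j) quote_val(df[[j]][i]),
                   character(1))
    sprintf("(%s)", paste(vals, collapse = ", "))
  }, character(1))
  sprintf(
    "CREATE TABLE %s AS SELECT * FROM (VALUES %s) AS t(%s)",
    target_table,
    paste(rows, collapse = ", "),
    paste(names(df), collapse = ", ")
  )
}

sql <- ctas_values("dfs.tmp.`m`", data.frame(a = 1:2, b = c("x", "y")))
```

For something like mtcars this already produces hundreds of literals in one statement, which is roughly where the REST interface starts to choke.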

I'll keep the issue open, though, in the event I get some time or others find it and have other ideas on how to make it work.
