Document EnableQueryResultDownload for databricks() #856
Comments
Are there any potential downsides to setting this by default? |
I couldn't find much about the setting, given that it's not even in Databricks' own documentation. I would be hesitant to override Databricks' default behavior though, since it also applies to |
It looks like I'd be tempted to try and catch + rethrow this error with a more informative message, telling users to set that parameter. There's not much in this error message that feels specific to this issue, but it does seem like |
I think throwing a more informative error and suggesting this setting would be great. You're right that with some good error message copying and googling, a relatively savvy user could figure it out, but I figure it wouldn't hurt to save folks that step. I'm thinking just from a user perspective, the expectation is that the success of a query made via DBI/odbc won't depend on the number of rows returned, other than the possibility of running out of memory in R. The default for It's also weird that while Cloud Fetch is mentioned with respect to Databricks ODBC in the Azure documentation:
There is no mention of the parameter that needs to be set in the connection string. |
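To make the catch-and-rethrow idea concrete, here is a rough sketch in plain R. It is not how odbc implements anything: `with_download_hint()` is a hypothetical helper, and because the actual error text isn't reproduced here, it appends the hint to any error rather than matching a specific message.

```r
# Hypothetical wrapper (not part of odbc or DBI): evaluate a query expression
# and, if it fails, re-signal the error with a hint about the setting appended.
with_download_hint <- function(expr) {
  tryCatch(
    expr,  # lazily evaluated here, so errors are caught by the handler below
    error = function(cnd) {
      stop(
        conditionMessage(cnd),
        "\nHint: if this query returns many rows, try setting ",
        "EnableQueryResultDownload = \"0\" in the connection.",
        call. = FALSE
      )
    }
  )
}

# Usage, assuming `con` is an open Databricks connection and `big_table`
# is a placeholder table name:
# with_download_hint(DBI::dbGetQuery(con, "SELECT * FROM big_table"))
```

A real implementation would presumably match on something specific in the driver's error before adding the hint, so unrelated failures aren't mislabeled.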
We've pinged our internal Databricks contacts, so we'll wait to hear back from them before we implement anything. |
@MCMaurer we don't recommend setting this by default, as leaving it enabled is a significant performance improvement for larger result sets. We'd love to help get to the bottom of the underlying cause for the data not appearing, which, from the error, is likely networking related.
If you have a dedicated Databricks contact already, please reach out and submit a ticket; otherwise we can try to arrange a way to get to the bottom of it. |
Thanks all- just got contacted by our organization's Databricks rep. I'll update here if there's anything pertinent that comes up in our conversation. |
Without setting EnableQueryResultDownload='0' in the Databricks connection string, queries returning more than ~40k rows will return no data and a relatively cryptic error message:
However, all it takes to solve this is adding EnableQueryResultDownload='0' to the connection string or DBI::dbConnect() call. Then all rows will be returned.
This isn't found anywhere within Databricks' documentation, and figuring it out involves a fair bit of Stack Overflow sleuthing. It's also very different behavior compared to many other DBI connections, which could lead to extra confusion. I think it would be helpful to include this tip in the documentation for odbc::databricks().
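For anyone landing here, a minimal sketch of the workaround in a DBI call. The `httpPath` value is a placeholder, and this assumes authentication is already configured in the environment and that extra named arguments to `dbConnect()` are forwarded to the driver as connection-string attributes. Note the comment above from Databricks that turning this off has a performance cost for large result sets.

```r
library(DBI)

# Placeholder HTTP path: substitute the value for your own SQL warehouse.
con <- DBI::dbConnect(
  odbc::databricks(),
  httpPath = "<your-http-path>",
  # Workaround from this issue: disable query result download so that
  # large result sets (> ~40k rows) come back instead of erroring.
  EnableQueryResultDownload = "0"
)
```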