snowflake: runtime driver config checks #857
base: main
@atheriel can you take a look at this too please?
A first pass through! I'm really happy we're aiming to resolve this issue for Snowflake users, too.
I said this in PM as well, but I wanted to reiterate it here in case other folks agree (feel free to ignore and I won't bug you any longer😜): I think the same issue should lead to the same outcome across drivers, and I'd argue that we ought to just edit the file as we do with Databricks. Especially given that this PR introduces odbc.no_config_override (on board!), I'm fine with the more invasive approach, since it allows this connection to "just work" for users. No longer needing to manage different behavior for different DBMSs would, I think, greatly simplify this PR as well.
R/utils.R (outdated)
# performs.
# 3. If action == "modify" then we attempt to modify the config in-situ.
# 4. Otherwise we throw a warning asking the user to revise.
configure_simba <- function(locate_config_callback = (function() character()), |
Suggested change
- configure_simba <- function(locate_config_callback = (function() character()),
+ configure_simba <- function(locate_config_callback = character(),
Poking at this for a bit, it's unclear to me why this is a function rather than a character string. In this function, we just call the passed-in function, which is inlined as returning a character everywhere it's currently used, and then use that character result as usual. Maybe this function could just take the config file as a string, and we'd pass e.g. snowflake_simba_config(args$driver) rather than function() snowflake_simba_config(args$driver)?
Hey Simon:
I wrote it as a callback because in many cases (not macOS, or no_config_override is set) it may not get executed at all. I can move those checks outside of the method, but it felt like they belong within it (and they would need to be duplicated for each call).
Let me know if you think it's worth trying to chase this down further / refactor more.
Gotcha! Since those callbacks take some time (and could even raise some unanticipated error), you opt to wait to evaluate the code until you're sure you need its result.
We're actually lucky here in that R evaluates function arguments lazily. So, if you pass a function call as an argument to f(), do some checks, and then reference the argument only after the checks, the function call will never actually be evaluated if f() exits (returns or errors) before that reference. e.g.:
fn_a <- function() {
cli::cli_abort("Stoppp!")
}
fn_b <- function(x) {
if (TRUE) {
cli::cli_abort("{.fun fn_a} hasn't evaluated yet!")
}
x
}
fn_b(x = fn_a())
#> Error in `fn_b()`:
#> ! `fn_a()` hasn't evaluated yet!
Created on 2024-11-12 with reprex v2.1.1
...the implication here being that we can pass snowflake_simba_config(args$driver) directly as the function argument, and it won't actually run unless the macOS and no_config_override checks pass.
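Applied to this PR, a minimal sketch might look like the following. It assumes that setting the odbc.no_config_override option to TRUE opts out of the check and that only macOS is targeted. configure_simba_sketch() and locate_config() are hypothetical stand-ins for configure_simba() and snowflake_simba_config(), not the PR's code.

```r
# Hypothetical stand-in for snowflake_simba_config(): it errors if evaluated,
# which makes it obvious whether the argument was ever forced.
locate_config <- function() {
  stop("locate_config() was evaluated")
}

# Hypothetical sketch: `simba_config` is a plain argument (a promise), not a
# callback. It is only evaluated if both early-return checks pass.
configure_simba_sketch <- function(simba_config,
                                   sysname = Sys.info()[["sysname"]]) {
  # Assumption: setting this option to TRUE opts out of the check entirely.
  if (isTRUE(getOption("odbc.no_config_override"))) {
    return(invisible("skipped: override disabled"))
  }
  # The check only targets macOS drivers for now.
  if (!identical(sysname, "Darwin")) {
    return(invisible("skipped: not macOS"))
  }
  # First (and only) place the promise is forced.
  invisible(paste("checking", simba_config))
}
```

Because simba_config is an ordinary argument, a caller can write configure_simba_sketch(snowflake_simba_config(args$driver)) without a function() wrapper. R creates a promise for it, and a promise is evaluated at most once, so there is no extra cost if the value is used several times after the checks.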
I think efforts to make drivers "just work" are worthwhile for sure, so in that sense I'm supportive. However, it's a little unclear to me from the code whether this only applies to macOS users or if it's designed to help Linux users, too.
If the former, I think the code should be more explicit about that. If the latter, I think the Linux paths are missing from spark_simba_config() and snowflake_simba_config().
Thanks Simon, appreciate you taking a look. Apologies for misinterpreting your earlier note on whether to modify in place or just warn. I changed I did leave the
Hey @atheriel, appreciate the feedback. It's the former, at least for now. Most of these issues come from the fact that vendors seem to be distributing macOS drivers that, out of the box, are configured to work with iODBC rather than unixODBC. Our package can't switch between At some point, I wouldn't mind doing some investigative work to determine what it would take to adjust our posture relative to the driver managers more dynamically.
Let me know if you had something else in mind.
lines[matching_lines_loc] <- replacement
}
lines
return(list("new_lines" = lines, "modified" = !found_ok))
Suggested change
- return(list("new_lines" = lines, "modified" = !found_ok))
+ return(list(new_lines = lines, modified = !found_ok))
matching_lines <- lines[matching_lines_loc]
found_ok = length(matching_lines) != 0 &&
Suggested change
- found_ok = length(matching_lines) != 0 &&
+ found_ok <- length(matching_lines) != 0 &&
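For reference, here is a minimal self-contained sketch of the line-rewriting helper with both suggestions applied (<- for assignment, unquoted list names). replace_ini_field() and its matching rule are hypothetical reconstructions from the fragments above. The tail of the found_ok condition isn't shown in the diff, so the "value already correct" check here is an assumption.

```r
# Hypothetical sketch of an in-place ini fix-up: returns the (possibly
# edited) lines together with a flag saying whether anything changed.
replace_ini_field <- function(lines, field, value) {
  replacement <- paste0(field, "=", value)
  # Lines that set `field`, allowing whitespace around the key and `=`.
  pattern <- paste0("^\\s*", field, "\\s*=")
  matching_lines_loc <- grep(pattern, lines)
  matching_lines <- lines[matching_lines_loc]
  # Assumption: the field is "ok" only if it is present and every
  # occurrence already holds the desired value.
  found_ok <- length(matching_lines) != 0 &&
    all(trimws(sub(pattern, "", matching_lines)) == value)
  if (!found_ok) {
    if (length(matching_lines_loc) == 0) {
      # Field missing entirely: append it (assumes a single-section file).
      lines <- c(lines, replacement)
    } else {
      lines[matching_lines_loc] <- replacement
    }
  }
  list(new_lines = lines, modified = !found_ok)
}
```

Returning the flag alongside the lines lets the caller decide whether to write the file back or only warn, which keeps the "modify" and "warn" actions in one code path.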
cli::cli_warn(c(
i = "Detected potentially unsafe driver settings.
Please consider revising the {.arg ODBCInstLib} field in
{simba_config} and setting its value to {unixodbc_install}"
Suggested change
- {simba_config} and setting its value to {unixodbc_install}"
+ {.file {simba_config}} and setting its value to {unixodbc_install}."
To make the filename a clickable link in RStudio/Positron!
# driver argument could be an outright path, or a name
# of a driver specified in odbcinst.ini Try to discern
driver_spec <- subset(odbcListDrivers(), name == driver)
if (nrow(driver_spec) ) {
Suggested change
- if (nrow(driver_spec) ) {
+ if (nrow(driver_spec)) {
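One way to make the path-or-name handling explicit is sketched below under several assumptions. resolve_driver_path() is hypothetical. The sketch also assumes that odbcListDrivers() returns name/attribute/value columns, with the library path stored under the "Driver" attribute. Supplying the driver table as a default argument reuses the lazy-evaluation point made earlier in this thread: odbcListDrivers() only runs when driver isn't already a file path.

```r
# Hypothetical sketch: `driver` may be a path to the driver library or the
# name of an entry in odbcinst.ini. The `drivers` default is a promise, so
# odbc::odbcListDrivers() is only called if `driver` is not an existing file.
resolve_driver_path <- function(driver, drivers = odbc::odbcListDrivers()) {
  if (file.exists(driver)) {
    return(driver)
  }
  # Assumption: one row per (name, attribute, value), with the library path
  # stored under the "Driver" attribute.
  hits <- drivers[drivers$name == driver & drivers$attribute == "Driver", ,
                  drop = FALSE]
  if (nrow(hits) == 0) {
    return(NA_character_)
  }
  hits$value[[1]]
}
```

Returning NA_character_ for an unknown name, instead of erroring, lets the caller skip the config check quietly when it can't locate the driver.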
Warning:
i Detected potentially unsafe driver settings. Please consider revising the `ODBCInstLib` field in simba.sparkodbc.ini and setting its value to libodbcinst.dylib
Warning:
i Detected potentially unsafe driver settings. Please consider revising the `DriverManagerEncoding` field in simba.sparkodbc.ini and setting its value to 'UTF-16'
Seems like this warning is duplicated in practice?
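If the duplication comes from the check running more than once per session, or from emitting one warning per field, one option is to collect the offending fields and warn once per config file. Below is a hypothetical base-R sketch. warn_simba_fields() and the .warned_configs cache are illustrative names, not part of odbc. In the package itself, cli::cli_warn() would keep the {.file} formatting.

```r
# Session-level cache of config files we have already warned about.
.warned_configs <- new.env(parent = emptyenv())

# Hypothetical sketch: emit a single warning listing every problematic
# field, and only the first time a given config file is seen.
warn_simba_fields <- function(simba_config, fields) {
  if (length(fields) == 0 || isTRUE(.warned_configs[[simba_config]])) {
    return(invisible(FALSE))
  }
  assign(simba_config, TRUE, envir = .warned_configs)
  warning(
    "Detected potentially unsafe driver settings in ", simba_config,
    ". Please consider revising: ", paste(fields, collapse = ", "), ".",
    call. = FALSE
  )
  invisible(TRUE)
}
```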
Looking good! Leaving review as Comment one more time just to get eyes on the callback revision before sending this one in. Thanks for making this happen!
Hi @simonpcouch:
Following up on a conversation we had a few weeks ago: I noticed that the simba snowflake drivers on macOS suffer from issues similar to the ones you ran into with databricks. In particular, and this is one that bit me as well, the simba configuration file (.ini) needs to specify DriverManagerEncoding=UTF-16. The OEM SNOWFLAKE driver, for example, looks to be configured for iODBC and has a different value that doesn't work for us.
In this PR I expanded some of the checks you had written for databricks to snowflake as well, with the following changes:
- The snowflake code path throws a warning; the databricks default behavior is unchanged.
- An option to disable the config check altogether, for both.
The last bullet in particular is, I think, why the diff ended up being (much) larger than I originally anticipated.
Rather than duplicating methods, I tried to make some of the methods "simba" generic (I tried to keep those in utils.R), while others that were specific to one of the two back ends I moved to driver-*.R. We are close to perhaps needing to make some of those specific methods S3 or S4, but at this point, IMO, that seems like a bit of overkill.
Cheers.
TODO: