Rebase to main package #3

Merged: 41 commits, Jan 26, 2024
Commits (41)
5a3f83e
Use urllib3 for thrift transport + reuse http connections (#131)
Jun 7, 2023
9ef50e8
Default socket timeout to 15 min (#137)
mattdeekay Jun 7, 2023
dfabbdd
Bump version to 2.6.0 (#139)
Jun 7, 2023
3d359bc
Fix: some thrift RPCs failed with BadStatusLine (#141)
Jun 8, 2023
5379803
Bump version to 2.6.1 (#142)
Jun 8, 2023
8698039
[ES-706907] Retry GetOperationStatus for http errors (#145)
Jun 14, 2023
bbe539e
Bump version to 2.6.2 (#147)
Jun 14, 2023
54e3769
[PECO-626] Support OAuth flow for Databricks Azure (#86)
jackyhu-db Jun 20, 2023
7fcfa7b
Use a separate logger for unsafe thrift responses (#153)
Jun 23, 2023
fecfa88
Improve e2e test development ergonomics (#155)
Jun 23, 2023
8d70f6c
Don't raise exception when closing a stale Thrift session (#159)
Jun 26, 2023
c351b57
Bump to version 2.7.0 (#161)
Jun 26, 2023
64be9bc
Cloud Fetch download handler (#127)
mattdeekay Jun 27, 2023
01b7a8d
Cloud Fetch download manager (#146)
mattdeekay Jul 3, 2023
5a34a4a
Cloud fetch queue and integration (#151)
mattdeekay Jul 5, 2023
759401c
Cloud Fetch e2e tests (#154)
mattdeekay Jul 7, 2023
0e5c244
Update changelog for cloudfetch (#172)
mattdeekay Jul 10, 2023
f45280d
Improve sqlalchemy backward compatibility with 1.3.24 (#173)
Jul 11, 2023
7382631
OAuth: don't override auth headers with contents of .netrc file (#122)
Jul 12, 2023
1965df5
Fix proxy connection pool creation (#158)
sebbegg Jul 12, 2023
d7f76e4
Relax pandas dependency constraint to allow ^2.0.0 (#164)
itsdani Jul 12, 2023
207dd7c
Use hex string version of operation ID instead of bytes (#170)
Jul 12, 2023
22e5aaa
SQLAlchemy: fix has_table so it honours schema= argument (#174)
Jul 12, 2023
1eef432
Fix socket timeout test (#144)
mattdeekay Jul 12, 2023
ec58144
Disable non_native_boolean_check_constraint (#120)
bkyryliuk Jul 12, 2023
728d33a
Remove unused import for SQLAlchemy 2 compatibility (#128)
WilliamGentry Jul 12, 2023
6a1d3b5
Bump version to 2.8.0 (#178)
Jul 21, 2023
b894605
Fix typo in python README quick start example (#186)
dbarrundia-tiger Aug 9, 2023
00a3928
Configure autospec for mocked Client objects (#188)
Aug 9, 2023
019acd8
Use urllib3 for retries (#182)
Aug 9, 2023
af1aae7
Bump version to 2.9.0 (#189)
Aug 10, 2023
0d99fc7
Explicitly add urllib3 dependency (#191)
jacobus-herman Aug 10, 2023
7aaa014
Bump to 2.9.1 (#195)
Aug 11, 2023
d28a692
Make backwards compatible with urllib3~=1.0 (#197)
Aug 16, 2023
871294e
Convenience improvements to v3 retry logic (#199)
Aug 17, 2023
54a6102
Bump version to 2.9.2 (#201)
Aug 18, 2023
a072574
Github Actions Fix: poetry install fails for python 3.7 tests (#208)
Aug 24, 2023
a918f13
Make backwards compatible with urllib3~=1.0 [Follow up #197] (#206)
Aug 24, 2023
a737ef3
Bump version to 2.9.3 (#209)
Aug 24, 2023
fddc9f9
Add timeout hack to mitigate timeouts
capitancambio Jun 2, 2023
377e158
Merge branch 'main' into rebase-to-main-package
matt-fleming Jan 26, 2024
163 changes: 0 additions & 163 deletions .github/workflows/code-quality-checks.yml

This file was deleted.

5 changes: 4 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -204,4 +204,7 @@ dist/
build/

# vs code stuff
.vscode
.vscode

# don't commit authentication info to source control
test.env
55 changes: 54 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,59 @@
# Release History

## 2.5.x (Unreleased)
## 2.9.4 (Unreleased)

## 2.9.3 (2023-08-24)

- Fix: Connections failed when urllib3~=1.0.0 is installed (#206)

## 2.9.2 (2023-08-17)

- Other: Add `examples/v3_retries_query_execute.py` (#199)
- Other: suppress log message when `_enable_v3_retries` is not `True` (#199)
- Other: make this connector backwards compatible with `urllib3>=1.0.0` (#197)

## 2.9.1 (2023-08-11)

- Other: Explicitly pin urllib3 to ^2.0.0 (#191)

## 2.9.0 (2023-08-10)

- Replace retry handling with DatabricksRetryPolicy. This is disabled by default. To enable, set `enable_v3_retries=True` when creating `databricks.sql.client` (#182)
- Other: Fix typo in README quick start example (#186)
- Other: Add autospec to Client mocks and tidy up `make_request` (#188)

## 2.8.0 (2023-07-21)

- Add support for Cloud Fetch. Disabled by default. Set `use_cloud_fetch=True` when building `databricks.sql.client` to enable it (#146, #151, #154)
- SQLAlchemy has_table function now honours schema= argument and adds catalog= argument (#174)
- SQLAlchemy: set `non_native_boolean_check_constraint` to False as it's not supported by Databricks (#120)
- Fix: Revised SQLAlchemy dialect and examples for compatibility with SQLAlchemy==1.3.x (#173)
- Fix: oauth would fail if expired credentials appeared in ~/.netrc (#122)
- Fix: Python HTTP proxies were broken after switch to urllib3 (#158)
- Other: remove unused import in SQLAlchemy dialect
- Other: Relax pandas dependency constraint to allow ^2.0.0 (#164)
- Other: Connector now logs operation handle guids as hexadecimal instead of bytes (#170)
- Other: test_socket_timeout_user_defined e2e test was broken (#144)

## 2.7.0 (2023-06-26)

- Fix: connector raised exception when calling close() on a closed Thrift session
- Improve e2e test development ergonomics
- Redact logged thrift responses by default
- Add support for OAuth on Databricks Azure

## 2.6.2 (2023-06-14)

- Fix: Retry GetOperationStatus requests for http errors

## 2.6.1 (2023-06-08)

- Fix: http.client would raise a BadStatusLine exception in some cases

## 2.6.0 (2023-06-07)

- Add support for HTTP 1.1 connections (connection pools)
- Add a default socket timeout for thrift RPCs

## 2.5.2 (2023-05-08)

11 changes: 11 additions & 0 deletions CONTRIBUTING.md
@@ -109,6 +109,17 @@ export http_path=""
export access_token=""
```

Or you can write these into a file called `test.env` in the root of the repository:

```
host="****.cloud.databricks.com"
http_path="/sql/1.0/warehouses/***"
access_token="dapi***"
staging_ingestion_user="***@example.com"
```

To see logging output from pytest while running tests, set `log_cli = "true"` under `tool.pytest.ini_options` in `pyproject.toml`. You can also set `log_cli_level` to any of the default Python log levels: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`.
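For reference, the `pyproject.toml` fragment described above would look something like this (the level shown is illustrative):

```toml
[tool.pytest.ini_options]
log_cli = "true"
log_cli_level = "DEBUG"
```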

There are several e2e test suites available:
- `PySQLCoreTestSuite`
- `PySQLLargeQueriesSuite`
2 changes: 1 addition & 1 deletion README.md
@@ -39,7 +39,7 @@ from databricks import sql

host = os.getenv("DATABRICKS_HOST")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_ACCESS_TOKEN")
access_token = os.getenv("DATABRICKS_TOKEN")

connection = sql.connect(
server_hostname=host,
3 changes: 2 additions & 1 deletion examples/README.md
@@ -38,4 +38,5 @@ To run all of these examples you can clone the entire repository to your disk. O
this example the string `ExamplePartnerTag` will be added to the user agent on every request.
- **`staging_ingestion.py`** shows how the connector handles Databricks' experimental staging ingestion commands `GET`, `PUT`, and `REMOVE`.
- **`sqlalchemy.py`** shows a basic example of connecting to Databricks with [SQLAlchemy](https://www.sqlalchemy.org/).
- **`custom_cred_provider.py`** shows how to pass a custom credential provider to bypass connector authentication. Please install databricks-sdk prior to running this example.
- **`custom_cred_provider.py`** shows how to pass a custom credential provider to bypass connector authentication. Please install databricks-sdk prior to running this example.
- **`v3_retries_query_execute.py`** shows how to enable v3 retries in connector version 2.9.x including how to enable retries for non-default retry cases.
35 changes: 28 additions & 7 deletions examples/sqlalchemy.py
@@ -42,9 +42,15 @@
"""

import os
from sqlalchemy.orm import declarative_base, Session
import sqlalchemy
from sqlalchemy.orm import Session
from sqlalchemy import Column, String, Integer, BOOLEAN, create_engine, select

try:
from sqlalchemy.orm import declarative_base
except ImportError:
from sqlalchemy.ext.declarative import declarative_base

host = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
@@ -59,10 +65,20 @@
"_user_agent_entry": "PySQL Example Script",
}

engine = create_engine(
f"databricks://token:{access_token}@{host}?http_path={http_path}&catalog={catalog}&schema={schema}",
connect_args=extra_connect_args,
)
if sqlalchemy.__version__.startswith("1.3"):
# SQLAlchemy 1.3.x fails to parse the http_path, catalog, and schema from our connection string
# Pass these in as connect_args instead

conn_string = f"databricks://token:{access_token}@{host}"
connect_args = dict(catalog=catalog, schema=schema, http_path=http_path)
all_connect_args = {**extra_connect_args, **connect_args}
engine = create_engine(conn_string, connect_args=all_connect_args)
else:
engine = create_engine(
f"databricks://token:{access_token}@{host}?http_path={http_path}&catalog={catalog}&schema={schema}",
connect_args=extra_connect_args,
)

session = Session(bind=engine)
base = declarative_base(bind=engine)

@@ -86,9 +102,14 @@ class SampleObject(base):

session.commit()

stmt = select(SampleObject).where(SampleObject.name.in_(["Bim Adewunmi", "Miki Meek"]))
# SQLAlchemy 1.3 has slightly different methods
if sqlalchemy.__version__.startswith("1.3"):
stmt = select([SampleObject]).where(SampleObject.name.in_(["Bim Adewunmi", "Miki Meek"]))
output = [i for i in session.execute(stmt)]
else:
stmt = select(SampleObject).where(SampleObject.name.in_(["Bim Adewunmi", "Miki Meek"]))
output = [i for i in session.scalars(stmt)]

output = [i for i in session.scalars(stmt)]
assert len(output) == 2

base.metadata.drop_all()
35 changes: 35 additions & 0 deletions examples/v3_retries_query_execute.py
@@ -0,0 +1,35 @@
from databricks import sql
import os

# Users of connector versions >= 2.9.0 and <= 3.0.0 can use the v3 retry behaviour by setting _enable_v3_retries=True
# This flag will be deprecated in databricks-sql-connector~=3.0.0 as it will become the default.
#
# The new retry behaviour is defined in src/databricks/sql/auth/retry.py
#
# The new retry behaviour allows users to force the connector to automatically retry requests that fail with codes
# that are not retried by default (in most cases only codes 429 and 503 are retried by default). Additional HTTP
# codes to retry are specified as a list passed to `_retry_dangerous_codes`.
#
# Note that, as implied in the name, doing this is *dangerous* and should not be configured in all usages.
# With the default behaviour, ExecuteStatement Thrift commands are only retried for codes 429 and 503 because
# we can be certain at run-time that the statement never reached Databricks compute. These codes are returned by
# the SQL gateway / load balancer. So there is no risk that retrying the request would result in a doubled
# (or tripled etc) command execution. These codes are always accompanied by a Retry-After header, which we honour.
#
# However, if your use-case emits idempotent queries such as SELECT statements, it can be helpful to retry
# for 502 (Bad Gateway) codes etc. In these cases, there is a possibility that the initial command _did_ reach
# Databricks compute and retrying it could result in additional executions. Retrying under these conditions uses
# an exponential back-off since a Retry-After header is not present.

with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
http_path = os.getenv("DATABRICKS_HTTP_PATH"),
access_token = os.getenv("DATABRICKS_TOKEN"),
_enable_v3_retries = True,
_retry_dangerous_codes=[502,400]) as connection:

with connection.cursor() as cursor:
cursor.execute("SELECT * FROM default.diamonds LIMIT 2")
result = cursor.fetchall()

for row in result:
print(row)
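The exponential back-off mentioned in the comments above (used when no Retry-After header is present) can be illustrated with a small standalone sketch. This shows the general technique only; it is not the connector's actual `retry.py` implementation, and the parameter names are illustrative:

```python
def backoff_delays(num_attempts, base=1.0, factor=2.0, max_delay=60.0):
    """Yield a capped exponential back-off delay (in seconds) per retry attempt."""
    for attempt in range(num_attempts):
        # Delay doubles each attempt (with factor=2.0) until it hits max_delay.
        yield min(base * (factor ** attempt), max_delay)

# With the defaults, five attempts wait 1s, 2s, 4s, 8s, 16s before retrying.
print(list(backoff_delays(5)))
```

In practice each delay would be passed to a sleep call between request attempts, often with random jitter added so that many clients retrying at once do not synchronize.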