Test #1

evb123 · 2024-01-26T11:00:02Z

Test

## Summary

Support OAuth flow for Databricks Azure

## Background

Some OAuth endpoints (e.g. the OpenID configuration) and scopes differ between Databricks on Azure and Databricks on AWS. The current code only supports the OAuth flow for Databricks on AWS.

## What changes are proposed in this pull request?

- Change `OAuthManager` to decouple Databricks AWS specific configuration from the OAuth flow
- Add `sql/auth/endpoint.py`, which implements cloud specific OAuth endpoint configuration
- Change `DatabricksOAuthProvider` to work with the OAuth configurations of the different Databricks clouds (AWS, Azure)
- Add the corresponding unit tests
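The decoupling described above can be pictured with a small sketch. It is not the code in `sql/auth/endpoint.py`; every class name, scope string, URL, and the hostname rule below are placeholders invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class CloudOAuthEndpoints(ABC):
    """Hypothetical interface: everything cloud specific sits behind it."""

    @abstractmethod
    def scopes(self) -> List[str]:
        ...

    @abstractmethod
    def openid_config_url(self, hostname: str) -> str:
        ...


class AwsEndpoints(CloudOAuthEndpoints):
    def scopes(self) -> List[str]:
        return ["sql", "offline_access"]  # placeholder scopes

    def openid_config_url(self, hostname: str) -> str:
        # Placeholder URL layout, not taken from the pull request.
        return f"https://{hostname}/oidc/.well-known/openid-configuration"


class AzureEndpoints(CloudOAuthEndpoints):
    def scopes(self) -> List[str]:
        return ["<azure-resource-id>/user_impersonation", "offline_access"]  # placeholder

    def openid_config_url(self, hostname: str) -> str:
        # Placeholder: the identity provider is assumed to live outside the workspace host.
        return "https://login.example.invalid/.well-known/openid-configuration"


def endpoints_for(hostname: str) -> CloudOAuthEndpoints:
    # Assumed rule for this sketch only: recognise Azure by host suffix.
    if hostname.endswith(".azuredatabricks.net"):
        return AzureEndpoints()
    return AwsEndpoints()
```

With this shape, the OAuth flow itself only asks the selected object for scopes and URLs and never branches on the cloud.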

* Cloud Fetch download handler
* Issue fix: final result link compressed data has multiple LZ4 end-of-frame markers
* Addressing PR comments
  - Linting
  - Type annotations
  - Use response.ok
  - Log exception
  - Remove semaphore and only use threading.event
  - reset() flags method
  - Fix tests after removing semaphore
  - Link expiry logic should be in secs
  - Decompress data static function
  - link_expiry_buffer and static public methods
  - Docstrings and comments
* Changing logger.debug to remove url
* _reset() comment to docstring
* link_expiry_buffer -> link_expiry_buffer_secs

--------- Signed-off-by: Matthew Kim <[email protected]>
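The link expiry items above suggest a check of roughly the following shape. This is a minimal sketch, not the handler's actual code: the function name and the `now_secs` parameter are hypothetical, and only `link_expiry_buffer_secs` comes from the commit message.

```python
import time
from typing import Optional


def is_link_expired(
    expiry_time_secs: float,
    link_expiry_buffer_secs: float = 0.0,
    now_secs: Optional[float] = None,
) -> bool:
    """Treat a link as expired once we are inside the buffer before its expiry.

    All values are in seconds, so the comparison never mixes units.
    """
    current = time.time() if now_secs is None else now_secs
    return expiry_time_secs - link_expiry_buffer_secs <= current
```

Passing `now_secs` explicitly keeps the check deterministic in tests; in normal use it would be left as `None`.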

* Cloud Fetch download manager
* Bug fix: submit handler.run
* Type annotations
* Namedtuple -> dataclass
* Shutdown thread pool and clear handlers
* Docstrings and comments
* handler.run is the correct call
* Link expiry buffer in secs
* Adding type annotations for download_handlers and downloadable_result_settings
* Move DownloadableResultSettings to downloader.py to avoid circular import
* Black linting
* Timeout is never None

--------- Signed-off-by: Matthew Kim <[email protected]>

* Cloud fetch queue and integration
* Enable cloudfetch with direct results
* Typing and style changes
* Client-settable max_download_threads
* Docstrings and comments
* Increase default buffer size bytes to 104857600
* Move max_download_threads to kwargs of ThriftBackend, fix unit tests
* Fix tests: staticmethod make_arrow_table mock not callable
* cancel_futures in shutdown() only available in python >=3.9.0
* Black linting
* Fix typing errors

--------- Signed-off-by: Matthew Kim <[email protected]>
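The `cancel_futures` note above refers to a keyword that `Executor.shutdown()` only accepts from Python 3.9 onwards. One way to stay compatible with older interpreters is a version guard, sketched below; the helper name `shutdown_pool` is hypothetical and the commit may have solved it differently.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def shutdown_pool(pool: ThreadPoolExecutor) -> None:
    """Shut down without waiting, cancelling queued work where supported."""
    if sys.version_info >= (3, 9):
        # cancel_futures was added to Executor.shutdown() in Python 3.9.
        pool.shutdown(wait=False, cancel_futures=True)
    else:
        # Older interpreters: pending futures cannot be cancelled in one call.
        pool.shutdown(wait=False)
```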

* Cloud Fetch e2e tests
* Test case works for e2-dogfood shared unity catalog
* Moving test to LargeQueriesSuite and setting catalog to hive_metastore
* Align default value of buffer_size_bytes in driver tests
* Adding comment to specify what's needed to run successfully

--------- Signed-off-by: Matthew Kim <[email protected]>

* move py.typed to correct places

  https://peps.python.org/pep-0561/ says 'For namespace packages (see PEP 420), the py.typed file should be in the submodules of the namespace, to avoid conflicts and for clarity.'. Previously, when I added the py.typed file to this project, #382, I was unaware this was a namespace package (although, curiously, it seems I had done it right initially and then changed to the wrong way). As PEP 561 warns us, this does create conflicts; other libraries in the databricks namespace package (such as, in my case, databricks-vectorsearch) are then treated as though they are typed, which they are not. This commit moves the py.typed file to the correct places, the submodule folders, fixing that problem.
* change target of mypy to src/databricks instead of src. I think this might fix the CI code-quality checks failure, but unfortunately I can't replicate that failure locally and the error message is unhelpful
* Possible workaround for bad error message 'error: --install-types failed (no mypy cache directory)'; see python/mypy#10768 (comment)
* fix invalid yaml syntax
* Best fix (#3)

  Fixes the problem by cding and supplying a flag to mypy (that mypy needs this flag is seemingly fixed/changed in later versions of mypy; but that's another pr altogether...). Also fixes a type error that was somehow in the arguments of the program (?!) (I guess this is because you guys are still using implicit optional)
* return the old result_links default (#5)

  Return the old result_links default, make the type optional, & I'm pretty sure the original problem is that add_file_links can't take a None, so these statements should be in the body of the if-statement that ensures it is not None
* Update src/databricks/sql/utils.py

  "self.download_manager is unconditionally used later, so must be created. Looks this part of code is totally not covered with tests 🤔"

--------- Signed-off-by: wyattscarpenter <[email protected]> Co-authored-by: Levko Kravets <[email protected]>

* Upgrade mypy

  This commit removes the flag (and cd step) from f53aa37 which we added to get mypy to treat namespaces correctly. This was apparently a bug in mypy, or behavior they decided to change. To get the new behavior, we must upgrade mypy. (This also allows us to remove a couple of `# type: ignore` comments that are no longer needed.)

  This commit changes the version of mypy and runs `poetry lock`. It also conforms the whitespace of files in this project to the expectations of various tools and standards (namely: removing trailing whitespace as expected by git and enforcing the existence of one and only one newline at the end of a file as expected by unix and github.) It also uses https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade the codebase due to a change in mypy behavior. For a similar reason, it also fixes a few new type (or otherwise) errors:

  - "Return type 'Retry' of 'new' incompatible with return type 'DatabricksRetryPolicy' in supertype 'Retry'"
  - databricks/sql/auth/retry.py:225: error: object has no attribute update [attr-defined]
  - /test_param_escaper.py:31: DeprecationWarning: invalid escape sequence \) [as it happens, I think it was also wrong for the string not to be raw, because I'm pretty sure it wants all of its backslashed single-quotes to appear literally with the backslashes, which wasn't happening until now]
  - ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject [this is like a numpy version thing, which I fixed by being stricter about numpy version]
* Incorporate suggestion.

  I decided the most expedient way of dealing with this type error was just adding the type ignore comment back in, but with a `[attr-defined]` specifier this time. I mean, otherwise I would have to restructure the code or figure out the proper types for a TypedDict for the dict and I don't think that's worth it at the moment.

--------- Signed-off-by: wyattscarpenter <[email protected]>
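For readers unfamiliar with the implicit-Optional change mentioned in the mypy upgrade above, here is a small illustration of the kind of rewrite the `no_implicit_optional` tool performs. The function `describe_limit` is made up for this example and does not exist in the connector.

```python
from typing import Optional

# Before (implicit Optional): newer mypy versions reject this by default,
# because the annotation says `int` while the default is `None`.
#
#     def describe_limit(limit: int = None) -> str: ...


# After (explicit Optional): the annotation now admits None.
def describe_limit(limit: Optional[int] = None) -> str:
    return "no limit" if limit is None else f"limit={limit}"
```

Runtime behaviour is identical in both forms; only the annotation changes, which is why the upgrade can be automated.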

…#405)

* [PECO-1751] Refactor CloudFetch downloader: handle files sequentially; utilize Futures
* Retry failed CloudFetch downloads
* Update tests

--------- Signed-off-by: Levko Kravets <[email protected]>

* Disable SSL verification for CloudFetch links
* Use existing `_tls_no_verify` option in CloudFetch downloader
* Update tests

--------- Signed-off-by: Levko Kravets <[email protected]>

* Support pandas 2.2.2

  See release note numpy 2.2.2: https://pandas.pydata.org/docs/dev/whatsnew/v2.2.0.html#to-numpy-for-numpy-nullable-and-arrow-types-converts-to-suitable-numpy-dtype
* Allow pandas 2.2.2 in pyproject.toml
* Update poetry.lock, poetry lock --no-update
* Code style

--------- Signed-off-by: Levko Kravets <[email protected]> Co-authored-by: Levko Kravets <[email protected]>

* [PECO-1857] Use SSL options with HTTPS connection pool
* Some cleanup
* Resolve circular dependencies
* Update existing tests
* Fix MyPy issues
* Fix `_tls_no_verify` handling
* Add tests

--------- Signed-off-by: Levko Kravets <[email protected]>

…odejs drivers (#467)

* Added the exponential backoff code
* Added the exponential backoff algorithm and refactored the code
* Added jitter and added unit tests
* Reformatted
* Fixed the test_retry_exponential_backoff integration test
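The commit message does not spell out the formula, so the following is only a generic sketch of exponential backoff with additive jitter. The function name, parameters, and default values are assumptions chosen for illustration, not the connector's settings.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base_secs: float = 1.0,
    max_secs: float = 60.0,
    jitter_secs: float = 0.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Delay before retry number `attempt` (0-based), capped at max_secs."""
    source = rng if rng is not None else random
    # Double the delay on every attempt, but never exceed the cap.
    exponential = min(max_secs, base_secs * (2 ** attempt))
    # Add a small random offset so that many clients do not retry in lockstep.
    return min(max_secs, exponential + source.uniform(0, jitter_secs))
```

Injecting `rng` makes the jitter reproducible in unit tests, while production callers can rely on the module-level generator.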

…#463)

* Built the basic flow for the async pipeline - testing is remaining
* Implemented the flow for the get_execution_result, but the problem of invalid operation handle still persists
* Missed adding some files in previous commit
* Working prototype of execute_async, get_query_state and get_execution_result
* Added integration tests for execute_async
* add docs for functions
* Refactored the async code
* Fixed java doc
* Reformatted

* Raised error when incorrect Row offset is returned
* Changed error type
* grammar fix
* Added unit tests and modified the code
* Updated error message
* Updated the non-retrying to only inline case
* Updated fix
* Changed the flow
* Minor update
* Updated the retryable condition
* Minor test fix
* Added extra space

* Modified the gitignore file to not have .idea file
* [PECO-1803] Splitting the PySql connector into the core and the non core part (#417)
* Implemented ColumnQueue to test the fetchall without pyarrow Removed token removed token
* order of fields in row corrected
* Changed the folder structure and tested the basic setup to work
* Refactored the code to make connector to work
* Basic Setup of connector, core and sqlalchemy is working
* Basic integration of core, connect and sqlalchemy is working
* Setup working dynamic change from ColumnQueue to ArrowQueue
* Refactored the test code and moved to respective folders
* Added the unit test for column_queue Fixed __version__ Fix
* venv_main added to git ignore
* Added code for merging columnar table
* Merging code for columnar
* Fixed the retry_close session test issue with logging
* Fixed the databricks_sqlalchemy tests and introduced pytest.ini for the sqla_testing
* Added pyarrow_test mark on pytest
* Fixed databricks.sqlalchemy to databricks_sqlalchemy imports
* Added poetry.lock
* Added dist folder
* Changed the pyproject.toml
* Minor Fix
* Added the pyarrow skip tag on unit tests and tested their working
* Fixed the Decimal and timestamp conversion issue in non arrow pipeline
* Removed not required files and reformatted
* Fixed test_retry error
* Changed the folder structure to src / databricks
* Removed the columnar non arrow flow to another PR
* Moved the README to the root
* removed columnQueue instance
* Removed databricks_sqlalchemy dependency in core
* Changed the pysql_supports_arrow predicate, introduced changes in the pyproject.toml
* Ran the black formatter with the original version
* Extra .py removed from all the __init__.py files names
* Undo formatting check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* BIG UPDATE
* Refactor code
* Refactor
* Fixed versioning
* Minor refactoring
* Minor refactoring
* Changed the folder structure such that sqlalchemy has no reference here
* Fixed README.md and CONTRIBUTING.md
* Added manual publish
* On push trigger added
* Manually setting the publish step
* Changed versioning in pyproject.toml
* Bumped up the version to 4.0.0.b3 and also changed the structure to have pyarrow as optional
* Removed the sqlalchemy tests from integration.yml file
* [PECO-1803] Print warning message if pyarrow is not installed (#468)

  Print warning message if pyarrow is not installed

  Signed-off-by: Jacky Hu <[email protected]>
* [PECO-1803] Remove sqlalchemy and update README.md (#469)

  Remove sqlalchemy and update README.md

  Signed-off-by: Jacky Hu <[email protected]>
* Removed all sqlalchemy related stuff
* generated the lock file
* Fixed failing tests
* removed poetry.lock
* Updated the lock file
* Fixed poetry numpy 2.2.2 issue
* Workflow fixes

--------- Signed-off-by: Jacky Hu <[email protected]> Co-authored-by: Jacky Hu <[email protected]>

Jesse and others added 30 commits June 7, 2023 14:02

Use urllib3 for thrift transport + reuse http connections (#131)

5a3f83e

Signed-off-by: Jesse Whitehouse <[email protected]>

Default socket timeout to 15 min (#137)

9ef50e8

Signed-off-by: Matthew Kim <[email protected]>

Bump version to 2.6.0 (#139)

dfabbdd

Signed-off-by: Jesse Whitehouse <[email protected]>

Fix: some thrift RPCs failed with BadStatusLine (#141)

3d359bc

--------- Signed-off-by: Jesse Whitehouse <[email protected]>

Bump version to 2.6.1 (#142)

5379803

Signed-off-by: Jesse Whitehouse <[email protected]>

[ES-706907] Retry GetOperationStatus for http errors (#145)

8698039

Signed-off-by: Jesse Whitehouse <[email protected]>

Bump version to 2.6.2 (#147)

bbe539e

Signed-off-by: Jesse Whitehouse <[email protected]>

Use a separate logger for unsafe thrift responses (#153)

7fcfa7b

--------- Signed-off-by: Jesse Whitehouse <[email protected]>

Improve e2e test development ergonomics (#155)

fecfa88

--------- Signed-off-by: Jesse Whitehouse <[email protected]>

Don't raise exception when closing a stale Thrift session (#159)

8d70f6c

Signed-off-by: Jesse Whitehouse <[email protected]>

Bump to version 2.7.0 (#161)

c351b57

Signed-off-by: Jesse Whitehouse <[email protected]>

Update changelog for cloudfetch (#172)

0e5c244

Signed-off-by: Matthew Kim <[email protected]>

Improve sqlalchemy backward compatibility with 1.3.24 (#173)

f45280d

Signed-off-by: Jesse Whitehouse <[email protected]>

OAuth: don't override auth headers with contents of .netrc file (#122)

7382631

Signed-off-by: Jesse Whitehouse <[email protected]>

Fix proxy connection pool creation (#158)

1965df5

Signed-off-by: Sebastian Eckweiler <[email protected]> Signed-off-by: Jesse Whitehouse <[email protected]> Co-authored-by: Sebastian Eckweiler <[email protected]> Co-authored-by: Jesse Whitehouse <[email protected]>

Relax pandas dependency constraint to allow ^2.0.0 (#164)

d7f76e4

Signed-off-by: Daniel Segesdi <[email protected]> Signed-off-by: Jesse Whitehouse <[email protected]> Co-authored-by: Jesse Whitehouse <[email protected]>

Use hex string version of operation ID instead of bytes (#170)

207dd7c

--------- Signed-off-by: Jesse Whitehouse <[email protected]>

SQLAlchemy: fix has_table so it honours schema= argument (#174)

22e5aaa

--------- Signed-off-by: Jesse Whitehouse <[email protected]>

Fix socket timeout test (#144)

1eef432

Signed-off-by: Jesse Whitehouse <[email protected]> Co-authored-by: Jesse Whitehouse <[email protected]>

Disable non_native_boolean_check_constraint (#120)

ec58144

--------- Signed-off-by: Bogdan Kyryliuk <[email protected]> Signed-off-by: Jesse Whitehouse <[email protected]> Co-authored-by: Jesse Whitehouse <[email protected]>

Remove unused import for SQLAlchemy 2 compatibility (#128)

728d33a

Signed-off-by: William Gentry <[email protected]> Signed-off-by: Jesse Whitehouse <[email protected]> Co-authored-by: Jesse Whitehouse <[email protected]>

Bump version to 2.8.0 (#178)

6a1d3b5

Signed-off-by: Jesse Whitehouse <[email protected]>

Fix typo in python README quick start example (#186)

b894605

--------- Co-authored-by: Jesse <[email protected]>

Configure autospec for mocked Client objects (#188)

00a3928

Resolves #187 Signed-off-by: Jesse Whitehouse <[email protected]>

Use urllib3 for retries (#182)

019acd8

Behaviour is gated behind `enable_v3_retries` config. This will be removed and become the default behaviour in a subsequent release. Signed-off-by: Jesse Whitehouse <[email protected]>

wyattscarpenter and others added 30 commits July 2, 2024 14:09

Do not retry failing requests with status code 401 (#408)

5d869d0

- Raises NonRecoverableNetworkError when request results in 401 status code Signed-off-by: Tor Hødnebø <[email protected]> Signed-off-by: Tor Hødnebø <[email protected]>
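The behaviour described in this commit can be sketched as a retry decision that fails fast on 401. The exception class below is a local stand-in with the same name as the one in the commit message, and both `should_retry` and the set of retryable codes are assumptions for illustration.

```python
class NonRecoverableNetworkError(Exception):
    """Local stand-in for the error named in the commit message."""


# Assumption for this sketch: which status codes are worth retrying.
RETRYABLE_STATUS_CODES = {429, 503}


def should_retry(status_code: int) -> bool:
    """Return True if the request may be retried; fail fast on 401."""
    if status_code == 401:
        # Bad or expired credentials will not fix themselves on a retry.
        raise NonRecoverableNetworkError("401 Unauthorized: not retrying")
    return status_code in RETRYABLE_STATUS_CODES
```

Raising instead of returning `False` lets the caller tell an authentication failure apart from an ordinary non-retryable response.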

[PECO-1715] Remove username/password (BasicAuth) auth option (#409)

9f9e96d

Signed-off-by: Jacky Hu <[email protected]>

Fix CloudFetch retry policy to be compatible with all urllib3 versi…

134b21d

…ons we support (#412) Signed-off-by: Levko Kravets <[email protected]>

Prepare release 3.3.0 (#415)

b438c38

* Prepare release 3.3.0
* Remove @arikfr from CODEOWNERS

--------- Signed-off-by: Levko Kravets <[email protected]>

[PECO-1801] Make OAuth as the default authenticator if no authenticat…

2d2b3c1

…ion setting is provided (#419) * [PECO-1801] Make OAuth as the default authenticator if no authentication setting is provided Signed-off-by: Jacky Hu <[email protected]>

Prepare release v3.4.0 (#430)

d31063c

Prepare release 3.4.0 Signed-off-by: Levko Kravets <[email protected]>

[PECO-1926] Create a non pyarrow flow to handle small results for the…

a151df2

… column set (#440)

* Implemented the columnar flow for non arrow users
* Minor fixes
* Introduced the Column Table structure
* Added test for the new column table
* Minor fix
* Removed unnecessary files

[PECO-1961] On non-retryable error, ensure PySQL includes useful info…

08f14a0

…rmation in error (#447) * added error info on non-retryable error

Reformatted all the files using black (#448)

97c815e

Reformatted the files using black

Prepare release v3.5.0 (#457)

55105fe

Prepare release 3.5.0 Signed-off-by: Jacky Hu <[email protected]>

[PECO-2051] Add custom auth headers into cloud fetch request (#460)

d3cb62c

Signed-off-by: Jacky Hu <[email protected]>

Prepare release 3.6.0 (#461)

ecdddba

Signed-off-by: Jacky Hu <[email protected]>

Fix for check_types github action failing (#472)

980af88

Fixed the failing check_types action

Remove upper caps on dependencies (#452)

d690516

* Remove upper caps on numpy and pyarrow versions

Updated the doc to specify native parameters in PUT operation is not …

680b3b6

…supported from >=3.x connector (#477) Added doc update

Bumped up to version 3.7.0 (#482)

f9d6ef1

* bumped up version
* Updated to version 3.7.0
* Grammar fix
* Minor fix

Removed CI CD for python3.8 (#490)

b6433fc

* Removed python3.8 support
* Minor fix

Added CI CD upto python 3.12 (#491)

1d3d8d7

Support for Py till 3.12

Merging changes from v3.7.1 release (#488)

cfdcab7

* Increased the number of retry attempts allowed (#486)

  Updated the number of attempts allowed
* bump version to 3.7.1 (#487)

  bumped up version
* Refactor
* Minor change

Bumped up to version 4.0.0 (#493)

3d0db70

bumped up the version
