apache-airflow-providers-snowflake: 5.3.1 #38712
-
Apache Airflow Provider(s)snowflake Versions of Apache Airflow ProvidersUnmentioned version lock: Upgrading to latest Apache Airflow version2.8.4 Operating SystemN/A DeploymentOfficial Apache Airflow Helm Chart Deployment detailsNo response What happened
What you think should happen insteadNo response How to reproduceinstall:
run:
Anything elseNo response Are you willing to submit PR?
Code of Conduct
|
Beta Was this translation helpful? Give feedback.
Replies: 11 comments
-
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval. |
Beta Was this translation helpful? Give feedback.
-
Would love someone takes a look and fixes those incompatibilities. |
Beta Was this translation helpful? Give feedback.
-
Sorry, I mis-spoke when opening the issue. It looks like However, snowflake-connector-python==3.5.0 removes the When installing these 3 packages,
To rephrase, I believe |
Beta Was this translation helpful? Give feedback.
-
What happen if you use latest version of package: https://pypi.org/project/snowflake-connector-python/#history ? |
Beta Was this translation helpful? Give feedback.
-
Yes. I think I also would like to the bottom of this. I believe (after a quick look) snowlfake connector dropped pyarrow because it's well, optional, and it's very likely that what you are doing and the query you are running is triggering this error becasue of some implicit conversion of timestamp. I saw Snowflake suggesting FROM_TIMESTAMP and other ways of getting timestamp that does not require So I think 1st you should take a look at the queries you use and see if by modifying your queries you could get rid of the pyarrow requirement - following what snowflake did. I'd wait for your report on attempting to do so if latest version does not work for you already. And even if there are cases where pyarrow might be useful, then at MOST this would call for an optional feature of the provider, not requirement. For example we could have |
Beta Was this translation helpful? Give feedback.
-
Maarked it as |
Beta Was this translation helpful? Give feedback.
-
To reproduce the error, I have the following PyPi packages installed:
And ran the following code: from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook
import pandas as pd
from snowflake.connector.pandas_tools import pd_writer
hook = SnowflakeHook(
# Add SF connection details here...
)
# Create table
temp_sf = {
"database": "test",
"schema": "scratch",
"table": "temp_table",
}
table_name = f"{temp_sf['database']}.{temp_sf['schema']}.{temp_sf['table']}"
create_table_query = f"""
CREATE OR REPLACE TABLE {table_name}
(COL1 INT, COL2 TIMESTAMP_NTZ(9)) AS
SELECT 1, '2021-01-01T01:00:00.000000000'::timestamp_ntz
"""
results = hook.get_pandas_df(create_table_query)
# Append new data
new_data = pd.DataFrame({
"COL1": [4, 5, 6],
"COL2": [
'2021-01-04T04:00:00.000000000',
'2021-01-05T05:00:00.000000000',
'2021-01-06T06:00:00.000000000',
],
})
engine = hook.get_sqlalchemy_engine()
with engine.connect() as conn:
# Regardless of whether pyarrow is installed, this will append data to Snowflake table
# However, if pyarrow isn't installed, then COL2 will have invalid timestamps
new_data.to_sql(
name=table_name,
con=conn,
if_exists="append",
index=False,
method=pd_writer,
)
# If pyarrow is installed, this will return the correct data
# If pyarrow isn't installed, this will error: Timestamp '(seconds_since_epoch=1712074200000000000)' is not recognized
results = hook.get_pandas_df(f"SELECT * FROM {table_name}")
print(results) Results: But when Open Questions: |
Beta Was this translation helpful? Give feedback.
-
That's likely what you should check. following Snowflake documentation and experimenting with snowflake connector. I suggest to open isssue in the connector repository - there is likely a reason they dropped pyarrow, probably there is a way to do the same without Pyarrow. |
Beta Was this translation helpful? Give feedback.
-
30 seconds googling from my side: This is what is called under-the-hood: You seem to be using an implicit conversion of Pandas that is happening in this case. But those are all guesses - I have no experience with either of the technologies, just googled it. But for me it looks like it's not snowflake provider that depends on |
Beta Was this translation helpful? Give feedback.
-
Converting to a discussion - that does not seem like provider's dependency missing, simply user implicitly depending on Pandas' optional feature of using Pyarrow for data conversion. |
Beta Was this translation helpful? Give feedback.
-
For anyone stumbling across this thread, if you cast the
|
Beta Was this translation helpful? Give feedback.
That's likely what you should check. following Snowflake documentation and experimenting with snowflake connector. I suggest to open isssue in the connector repository - there is likely a reason they dropped pyarrow, probably there is a way to do the same without Pyarrow.