There was a bug where awswrangler (AWS Data Wrangler) did not convert Parquet dates correctly. This was fixed in AWS Data Wrangler 2.12.0 (see the PR that fixed it for context), so we need to point pydbtools at this release or later.
To check that it functions correctly, run the workaround that I posted in the PR:
You can use any SQL query that returns a mojap_end_datetime column, because its default value is 2999-01-01 00:00:00 for latest records. The query should give back the correct timestamps (2999-01-01 00:00:00).
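The check described above can be sketched as a small helper. This is only an illustration: the function name `unexpected_end_datetimes` is hypothetical, and it assumes the query is restricted to latest records and that the column values arrive as Python `datetime` objects (for example via `df["mojap_end_datetime"].tolist()`).

```python
from datetime import datetime

# Sentinel end datetime used for latest records, as stated in the issue.
EXPECTED_END = datetime(2999, 1, 1, 0, 0, 0)


def unexpected_end_datetimes(values, expected=EXPECTED_END):
    """Return every value that differs from the expected sentinel datetime.

    An empty list means all timestamps were converted correctly.
    """
    return [value for value in values if value != expected]
```

An empty result suggests the conversion worked; any returned values are the ones worth inspecting by hand.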
If it works, you will need to update the awswrangler dependency in pyproject.toml to the relevant release.
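Before or after changing the dependency, it can help to confirm which awswrangler version is actually installed. The following is a minimal sketch, not part of pydbtools: the helper names are hypothetical, and the version parsing is deliberately simple (it only looks at the leading digits of the first three dot-separated parts).

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Minimum release named in this issue.
MIN_AWSWRANGLER = (2, 12, 0)


def parse_version(text):
    """Turn a string such as '2.12.0' into a tuple such as (2, 12, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_AWSWRANGLER):
    """Return True if the installed version string is at least the minimum."""
    return parse_version(installed) >= minimum


def installed_awswrangler_ok():
    """Check the awswrangler distribution in the current environment."""
    try:
        return meets_minimum(version("awswrangler"))
    except PackageNotFoundError:
        return False
```

Calling `installed_awswrangler_ok()` in the environment where pydbtools runs returns False if awswrangler is missing or older than 2.12.0.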
isichei changed the title from "pydbtools needs to be updated to latest awswrangler" to "CCDE-293 pydbtools needs to be updated to latest awswrangler" on Nov 10, 2021
I don't have access to the xhibit_v1 database at the moment, but this fails on the following query:
import awswrangler as wr

# Read release dates without CTAS, asking pyarrow to return timestamps as objects
releases = wr.athena.read_sql_query(
    """
    SELECT DISTINCT C.prison, C.offender_id, S.date_of_release
    FROM nomis_ao.core AS C
    LEFT JOIN nomis_ao.sentences AS S
    ON S.extract_datetime = C.extract_datetime AND S.record_number = C.record_number
    WHERE S.date_of_release IS NOT NULL
    """,
    database="nomis_ao",
    ctas_approach=False,
    pyarrow_additional_kwargs={
        "coerce_int96_timestamp_unit": "ms",
        "timestamp_as_object": True,
    },
)
This gives the error AttributeError: Can only use .dt accessor with datetimelike values.
For reference, filtering on the year does work:
releases = wr.athena.read_sql_query(
    """
    SELECT DISTINCT C.prison, C.offender_id, S.date_of_release
    FROM nomis_ao.core AS C
    LEFT JOIN nomis_ao.sentences AS S
    ON S.extract_datetime = C.extract_datetime AND S.record_number = C.record_number
    WHERE S.date_of_release IS NOT NULL AND YEAR(S.date_of_release) < 2100
    """,
    database="nomis_ao",
    ctas_approach=False,
    pyarrow_additional_kwargs={
        "coerce_int96_timestamp_unit": "ms",
        "timestamp_as_object": True,
    },
)
It seems not to like years after 2500. These values look like typos made by prison staff.
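One possible explanation, not confirmed in this thread, is the range of nanosecond-resolution timestamps: a signed 64-bit count of nanoseconds since 1970-01-01 runs out in the year 2262, so later dates cannot be represented at that resolution. The sketch below works this out with the standard library only; the names `NS_TIMESTAMP_MAX` and `fits_below_ns_upper_bound` are hypothetical, and whether this limit is the actual cause of the error above is an assumption.

```python
from datetime import datetime, timedelta

# Largest value of a signed 64-bit integer, interpreted as nanoseconds.
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1)

# datetime has microsecond resolution, so the sub-microsecond part is dropped.
NS_TIMESTAMP_MAX = EPOCH + timedelta(microseconds=INT64_MAX // 1000)


def fits_below_ns_upper_bound(value):
    """Return True if value is not later than the nanosecond upper bound."""
    return value <= NS_TIMESTAMP_MAX


print(NS_TIMESTAMP_MAX.date())
print(fits_below_ns_upper_bound(datetime(2999, 1, 1)))
```

If this is the cause, the sentinel 2999-01-01 00:00:00 and any mistyped far-future release dates would both fall outside the range, which would be consistent with the year filter making the query succeed.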