[Bug] truncate() partition transformation does not work when it includes more than 100 partitions #594

Open
2 tasks done
alex-antonison opened this issue Mar 11, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@alex-antonison

Is this a new bug in dbt-athena?

  • I believe this is a new bug in dbt-athena
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When a model uses a truncate() partition transformation on a column that results in more than 100 partitions, the batch partitioning functionality kicks in so that the write can exceed 100 partitions.

{{
    config(
        materialized = 'table',
        table_type = 'iceberg',
        force_batch = true,
        partitioned_by = ['truncate(string_partition,2)', 'month(date_partition)']
    )
}}

However, when the adapter queries Athena for the distinct partition values, it passes the truncate() expression through unchanged. Athena does not support truncate() as a way to extract a prefix from a string, so that query does not work.

select distinct truncate(string_partition,2), date_trunc('month', date_partition)
from "awsdatacatalog"."data_lake"."table__ha__tmp_not_partitioned"
order by truncate(string_partition,2), date_trunc('month', date_partition)

Instead, it could use something like substring() to pull back the unique partial values:

select distinct substring(string_partition,1,2), date_trunc('month', date_partition)
from "awsdatacatalog"."data_lake"."table__ha__tmp_not_partitioned"
order by substring(string_partition,1,2), date_trunc('month', date_partition)
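
To illustrate the suggested rewrite, here is a minimal Python sketch of how a truncate() partition expression could be translated into the substring() form before the distinct-partitions query is built. The helper name `truncate_to_substring` and the regular expression are hypothetical and are not taken from dbt-athena. The sketch also assumes the partitioned column is a string; truncate() on other column types would need different handling.

```python
import re

# Matches an Iceberg-style transform such as "truncate(string_partition,2)".
_TRUNCATE_RE = re.compile(
    r"^\s*truncate\(\s*(?P<column>[^,()]+?)\s*,\s*(?P<width>\d+)\s*\)\s*$",
    re.IGNORECASE,
)


def truncate_to_substring(partition_expr: str) -> str:
    """Rewrite truncate(col, W) as substring(col, 1, W).

    Hypothetical helper, not part of dbt-athena. Assumes `col` is a
    string column. Any other expression is returned unchanged.
    """
    match = _TRUNCATE_RE.match(partition_expr)
    if match is None:
        return partition_expr
    return f"substring({match['column']}, 1, {match['width']})"


# The two partition expressions from the model config above.
print(truncate_to_substring("truncate(string_partition,2)"))
print(truncate_to_substring("month(date_partition)"))
```

With the config from this issue, the first expression becomes `substring(string_partition, 1, 2)` and the second is left as-is for whatever handling month() already receives.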

Expected Behavior

A model that uses the Iceberg truncate() partition transformation on a column should build successfully even when it results in more than 100 partitions.

Steps To Reproduce

Create a model partitioned by truncate() on a column whose truncated values result in more than 100 partitions.
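
As a rough aid for reproduction, the following Python sketch generates sample values that would yield well over 100 partitions under truncate(string_partition, 2) and month(date_partition). The row shape, the `_value` suffix, and the fixed date are assumptions made for illustration and do not come from the original dataset.

```python
from itertools import product
from string import ascii_lowercase

# Every two-letter lowercase prefix: 26 * 26 = 676 distinct values
# for truncate(string_partition, 2).
prefixes = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]

# Hypothetical sample rows: (string_partition, date_partition).
rows = [(f"{prefix}_value", "2024-01-15") for prefix in prefixes]

# Distinct (two-character prefix, year-month) pairs, mirroring the
# partitions produced by the two transforms in the model config.
partitions = {(value[:2], date[:7]) for value, date in rows}

print(len(partitions))  # 676, comfortably above the 100-partition threshold
```

Loading rows like these into the source table and building the model with the config shown under Current Behavior should trigger the batch partitioning path.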

Environment

- OS: macOS
- Python: 3.11
- dbt: 1.7.7
- dbt-athena-community: 1.7.1

Additional Context

This came out of a Slack conversation: https://getdbt.slack.com/archives/C013MLFR7BQ/p1709755667814619

This method was pointed to as the place where the change would need to be made: https://github.com/dbt-athena/dbt-athena/blob/289be4f4f44f3d5a6cf575d8fe218209c4a41171/dbt/adapters/athena/impl.py#L1279

Apache Iceberg Truncate Partition documentation: https://iceberg.apache.org/spec/#truncate-transform-details

alex-antonison added the bug label Mar 11, 2024
@nicor88
Contributor

nicor88 commented Mar 11, 2024

@svdimchenko FYI
