
Commit 0c18eda

Merge pull request #30 from fivetran/MagicBot/databricks-compatibility
Feature: Databricks compatibility

2 parents: 7863ee1 + f048089

14 files changed (+80 -43 lines)

.buildkite/hooks/pre-command (+2 -1)

```diff
@@ -21,4 +21,5 @@ export CI_SNOWFLAKE_DBT_USER=$(gcloud secrets versions access latest --secret="C
 export CI_SNOWFLAKE_DBT_WAREHOUSE=$(gcloud secrets versions access latest --secret="CI_SNOWFLAKE_DBT_WAREHOUSE" --project="dbt-package-testing-363917")
 export CI_DATABRICKS_DBT_HOST=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_HOST" --project="dbt-package-testing-363917")
 export CI_DATABRICKS_DBT_HTTP_PATH=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_HTTP_PATH" --project="dbt-package-testing-363917")
-export CI_DATABRICKS_DBT_TOKEN=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_TOKEN" --project="dbt-package-testing-363917")
+export CI_DATABRICKS_DBT_TOKEN=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_TOKEN" --project="dbt-package-testing-363917")
+export CI_DATABRICKS_DBT_CATALOG=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_CATALOG" --project="dbt-package-testing-363917")
```

.buildkite/pipeline.yml (+15)

```diff
@@ -57,3 +57,18 @@ steps:
             - "CI_REDSHIFT_DBT_USER"
     commands: |
       bash .buildkite/scripts/run_models.sh redshift
+
+  - label: ":databricks: Run Tests - Databricks"
+    key: "run_dbt_databricks"
+    plugins:
+      - docker#v3.13.0:
+          image: "python:3.8"
+          shell: [ "/bin/bash", "-e", "-c" ]
+          environment:
+            - "BASH_ENV=/tmp/.bashrc"
+            - "CI_DATABRICKS_DBT_HOST"
+            - "CI_DATABRICKS_DBT_HTTP_PATH"
+            - "CI_DATABRICKS_DBT_TOKEN"
+            - "CI_DATABRICKS_DBT_CATALOG"
+    commands: |
+      bash .buildkite/scripts/run_models.sh databricks
```

CHANGELOG.md (+15 -4)

```diff
@@ -1,10 +1,21 @@
-# dbt_iterable v0.UPDATE.UPDATE
+# dbt_iterable v0.8.0
+[PR #30](https://github.com/fivetran/dbt_iterable/pull/30) includes the following updates:
+## 🚨 Breaking Changes 🚨 (recommend `--full-refresh`)
+- Updated the incremental strategy for end model `iterable__events`:
+  - For BigQuery, Spark, and Databricks, the strategy has been updated to `insert_overwrite`.
+  - For Snowflake, Redshift, and PostgreSQL, the strategy has been updated to `delete+insert`.
+  - We recommend running `dbt run --full-refresh` the next time you run your project.
+## 🎉 Feature Update 🎉
+- Databricks compatibility for Runtime 12.2 or later.
+  - Note: some models may run on an earlier runtime; however, 12.2 or later is required to run all models, because of syntax changes in how earlier versions handle arrays and JSON.
+  - We also recommend the `dbt-databricks` adapter over `dbt-spark`, because the two adapters handle incremental models differently. If you must use the `dbt-spark` adapter and run into issues, please refer to [this section](https://docs.getdbt.com/reference/resource-configs/spark-configs#the-insert_overwrite-strategy) of dbt's Spark configuration documentation.
 
-## Under the Hood:
-
-- Incorporated the new `fivetran_utils.drop_schemas_automation` macro into the end of each Buildkite integration test job.
+[PR #27](https://github.com/fivetran/dbt_iterable/pull/27) includes the following updates:
+## 🚘 Under the Hood 🚘
+- Incorporated the new `fivetran_utils.drop_schemas_automation` macro into the end of each Buildkite integration test job.
 - Updated the pull request [templates](/.github).
 
+
 # dbt_iterable v0.7.0
 [PR #28](https://github.com/fivetran/dbt_iterable/pull/28) adds the following changes:
 
```
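For context on the breaking change above, this is the new target-dependent configuration, condensed from the `models/iterable__events.sql` diff later in this commit. On BigQuery, Spark, and Databricks, `insert_overwrite` rebuilds only the partitions a run touches (hence the `partition_by` on `created_on`), while the other warehouses fall back to `delete+insert` keyed on `event_id`:

```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='insert_overwrite' if target.type in ('bigquery', 'spark', 'databricks') else 'delete+insert',
    partition_by={"field": "created_on", "data_type": "date"} if target.type not in ('spark','databricks') else ['created_on'],
    file_format='parquet',
    on_schema_change='fail'
) }}
```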
README.md (+7 -3)

````diff
@@ -42,7 +42,11 @@ The following table provides a detailed list of all models materialized within t
 To use this dbt package, you must have the following:
 
 - At least one Fivetran Iterable connector syncing data into your destination.
-- A **BigQuery**, **Snowflake**, **Redshift**, or **PostgreSQL** destination.
+- A **BigQuery**, **Snowflake**, **Redshift**, **PostgreSQL**, or **Databricks** destination.
+
+### Databricks Configuration
+- **Databricks Runtime 12.2** or later is required to run all models in this package.
+- We also recommend the `dbt-databricks` adapter over `dbt-spark`, because the two adapters handle incremental models differently. If you must use the `dbt-spark` adapter and run into issues, please refer to [this section](https://docs.getdbt.com/reference/resource-configs/spark-configs#the-insert_overwrite-strategy) of dbt's Spark configuration documentation.
 
 ## Step 2: Install the package
 Include the following Iterable package version in your `packages.yml` file.
@@ -52,7 +56,7 @@ Include the following Iterable package version in your `packages.yml` file.
 ```yaml
 packages:
   - package: fivetran/iterable
-    version: [">=0.7.0", "<0.8.0"]
+    version: [">=0.8.0", "<0.9.0"]
 ```
 ## Step 3: Define database and schema variables
 By default, this package runs using your destination and the `iterable` schema of your [target database](https://docs.getdbt.com/docs/running-a-dbt-project/using-the-command-line-interface/configure-your-profile). If this is not where your Iterable data is located (for example, if your Iterable schema is named `iterable_fivetran`), add the following configuration to your root `dbt_project.yml` file:
@@ -143,7 +147,7 @@ packages:
     version: [">=1.0.0", "<2.0.0"]
 
   - package: fivetran/iterable_source
-    version: [">=0.6.0", "<0.7.0"]
+    version: [">=0.7.0", "<0.8.0"]
 ```
 
 # 🙌 How is this package maintained and can I contribute?
````

dbt_project.yml (+1 -1)

```diff
@@ -1,5 +1,5 @@
 name: 'iterable'
-version: '0.7.0'
+version: '0.8.0'
 config-version: 2
 require-dbt-version: [">=1.3.0", "<2.0.0"]
 models:
```

docs/catalog.json (+1 -1)
docs/index.html (+4 -4)
docs/manifest.json (+1 -1)
docs/run_results.json (+1 -1)

Large diffs are not rendered by default.

integration_tests/ci/sample.profiles.yml (+7 -7)

```diff
@@ -16,13 +16,13 @@ integration_tests:
       pass: "{{ env_var('CI_REDSHIFT_DBT_PASS') }}"
       dbname: "{{ env_var('CI_REDSHIFT_DBT_DBNAME') }}"
       port: 5439
-      schema: iterable_integration_tests
+      schema: iterable_integration_tests_03
       threads: 8
     bigquery:
       type: bigquery
       method: service-account-json
       project: 'dbt-package-testing'
-      schema: iterable_integration_tests
+      schema: iterable_integration_tests_03
       threads: 8
       keyfile_json: "{{ env_var('GCLOUD_SERVICE_KEY') | as_native }}"
     snowflake:
@@ -33,7 +33,7 @@ integration_tests:
       role: "{{ env_var('CI_SNOWFLAKE_DBT_ROLE') }}"
       database: "{{ env_var('CI_SNOWFLAKE_DBT_DATABASE') }}"
       warehouse: "{{ env_var('CI_SNOWFLAKE_DBT_WAREHOUSE') }}"
-      schema: iterable_integration_tests
+      schema: iterable_integration_tests_03
       threads: 8
     postgres:
       type: postgres
@@ -42,13 +42,13 @@ integration_tests:
       pass: "{{ env_var('CI_POSTGRES_DBT_PASS') }}"
       dbname: "{{ env_var('CI_POSTGRES_DBT_DBNAME') }}"
       port: 5432
-      schema: iterable_integration_tests
+      schema: iterable_integration_tests_03
       threads: 8
     databricks:
-      catalog: null
+      catalog: "{{ env_var('CI_DATABRICKS_DBT_CATALOG') }}"
       host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
       http_path: "{{ env_var('CI_DATABRICKS_DBT_HTTP_PATH') }}"
-      schema: iterable_integration_tests
-      threads: 2
+      schema: iterable_integration_tests_03
+      threads: 8
       token: "{{ env_var('CI_DATABRICKS_DBT_TOKEN') }}"
       type: databricks
```
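A note on the `databricks` target above: with the `dbt-databricks` adapter, `catalog` selects the Unity Catalog the test schemas are written to. Sourcing it from `CI_DATABRICKS_DBT_CATALOG`, exported by the pre-command hook earlier in this commit, lets CI repoint the same profile at a different catalog without editing the file.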

integration_tests/dbt_project.yml (+6 -3)

```diff
@@ -1,10 +1,10 @@
 config-version: 2
 name: 'iterable_integration_tests'
-version: '0.7.0'
+version: '0.8.0'
 profile: 'integration_tests'
 vars:
   iterable_source:
-    iterable_schema: iterable_integration_tests
+    iterable_schema: iterable_integration_tests_03
     iterable_campaign_history_identifier: "campaign_history_data"
     iterable_campaign_label_history_identifier: "campaign_label_history_data"
     iterable_campaign_list_history_identifier: "campaign_list_history_data"
@@ -46,4 +46,7 @@ seeds:
       message_type_id: "{%- if target.type == 'bigquery' -%} INT64 {%- else -%} bigint {%- endif -%}"
     user_history_data:
       +column_types:
-        updated_at: timestamp
+        updated_at: timestamp
+dispatch:
+  - macro_namespace: dbt_utils
+    search_order: ['spark_utils', 'dbt_utils']
```
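The new top-level `dispatch` block changes macro resolution for the test project: any call to a `dbt_utils` macro is first looked up in `spark_utils`, which supplies Spark/Databricks-compatible shims for several `dbt_utils` macros, before falling back to `dbt_utils` itself. A minimal illustration (the macro call and column names here are illustrative, not taken from the package):

```sql
-- On a Spark/Databricks target, this resolves to a spark_utils
-- implementation when a shim exists; on every other target (or when
-- no shim exists) it falls back to the stock dbt_utils macro.
select {{ dbt_utils.generate_surrogate_key(['email', 'list_id']) }} as example_key
from {{ ref('iterable__events') }}
```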

models/intermediate/int_iterable__list_user_unnest.sql (+12 -6)

```diff
@@ -3,7 +3,7 @@
         unique_key='unique_key',
         incremental_strategy='insert_overwrite' if target.type in ('bigquery', 'spark', 'databricks') else 'delete+insert',
         partition_by={"field": "date_day", "data_type": "date"} if target.type not in ('spark','databricks') else ['date_day'],
-        file_format='delta',
+        file_format='parquet',
         on_schema_change='fail'
     )
 }}
@@ -71,9 +71,13 @@ with user_history as (
         is_current,
         email_list_ids,
         case when email_list_ids != '[]' then
-            {% if target.type == 'snowflake' %}
-            email_list_id.value
-            {% else %} email_list_id {% endif %} else null end as email_list_id
+            {% if target.type == 'snowflake' %}
+                email_list_id.value
+            {% elif target.type in ('spark','databricks') %}
+                email_list_id.col
+            {% else %} email_list_id {% endif %}
+            else null
+            end as email_list_id
 
     from user_history
 
@@ -83,8 +87,10 @@
     {% elif target.type == 'bigquery' %}
     cross join
         unnest(JSON_EXTRACT_STRING_ARRAY(email_list_ids)) as email_list_id
-    {% else %}
-    {# postgres #}
+    {% elif target.type in ('spark','databricks') %}
+    cross join
+        lateral explode_outer(from_json(email_list_ids, 'array<int>')) as email_list_id
+    {% else %} {# target is postgres #}
     cross join
         json_array_elements_text(cast((
             case when email_list_ids = '[]' then '["is_null"]' {# to not remove empty array-rows #}
```
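To see what the new Spark/Databricks branch does, here is a self-contained sketch you could run in a Databricks SQL editor on Runtime 12.2 or later (the email and list ids are made up). `from_json` parses the serialized JSON array, `explode_outer` emits one row per element while preserving a null row for empty arrays (mirroring the `'[]'` handling above), and the exploded value arrives under the default column name `col`, which is why the model selects `email_list_id.col`:

```sql
select
    users.email,
    email_list_id.col as email_list_id
from (
    select 'user@example.com' as email, '[101, 102]' as email_list_ids
) as users
cross join lateral explode_outer(from_json(users.email_list_ids, 'array<int>')) as email_list_id
```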

models/iterable__events.sql (+6 -9)

```diff
@@ -1,14 +1,11 @@
-{{
-    config(
+{{ config(
     materialized='incremental',
     unique_key='event_id',
-    partition_by={
-        "field": "created_on",
-        "data_type": "date"
-    } if target.type == 'bigquery' else none,
-    incremental_strategy = 'merge' if target.type not in ('snowflake', 'postgres', 'redshift') else 'delete+insert',
-    file_format = 'delta'
-    )
+    incremental_strategy='insert_overwrite' if target.type in ('bigquery', 'spark', 'databricks') else 'delete+insert',
+    partition_by={"field": "created_on", "data_type": "date"} if target.type not in ('spark','databricks') else ['created_on'],
+    file_format='parquet',
+    on_schema_change='fail'
+    )
 }}
 
 with events as (
```
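Two details of this rewrite are worth calling out. First, the `partition_by` shape differs by adapter: BigQuery expects a `{"field", "data_type"}` spec, while Spark and Databricks take a plain column list. Second, the file format moves from Delta to Parquet, which an existing incremental table cannot adopt in place; together with the new strategies, that is why the changelog recommends a one-time `dbt run --full-refresh`.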

packages.yml (+2 -2)

```diff
@@ -1,3 +1,3 @@
 packages:
-  - package: fivetran/iterable_source
-    version: [">=0.6.0", "<0.7.0"]
+  - package: fivetran/iterable_source
+    version: [">=0.7.0", "<0.8.0"]
```
