
Commit ce41a02

feature/databricks-delta-incremental-support (#130)
* feature/databricks-delta-incremental-support
* changelog and integration test updates
* pre review modifications
* Update consistency__audit_table.sql
* Apply suggestions from code review
* changelog, readme, and docs rebuild updates
* Apply suggestions from code review
* macro updates for full coverage
* removed artifacts
* schema change and CHANGELOG edit
* added supported destinations to elif and docs regen
* Update is_incremental_compatible.sql

Co-authored-by: Avinash Kunnath <[email protected]>
1 parent 5fef364 commit ce41a02

26 files changed: +701 −31 lines

CHANGELOG.md (+18)

@@ -1,3 +1,21 @@

# dbt_fivetran_log v1.8.0
[PR #130](https://github.com/fivetran/dbt_fivetran_log/pull/130) includes the following updates:

## 🚨 Breaking Changes 🚨
> ⚠️ Since the following changes result in the table format changing, we recommend running a `--full-refresh` after upgrading to this version to avoid possible incremental failures.
- For Databricks All-Purpose clusters, the `fivetran_platform__audit_table` model is now materialized using the Delta table format (previously Parquet).
  - Delta tables are generally more performant than Parquet and are more widely available to Databricks users. Previously, the Parquet file format was causing compilation issues on customers' managed tables.

## Documentation Updates
- Updated the `sync_start` and `sync_end` field descriptions for `fivetran_platform__audit_table` to explicitly define that these fields represent only the sync start/end times for when the connector wrote new records, or modified existing ones, in the specified table.
- Added integrity and consistency validation tests to the integration tests for every end model.
- Removed duplicate Databricks dispatch instructions from the README.

## Under the Hood
- The `is_databricks_sql_warehouse` macro has been renamed to `is_incremental_compatible` and modified to return `true` if the Databricks runtime in use is an all-purpose cluster (previously this macro checked whether a SQL warehouse runtime was used) **or** if any other supported non-Databricks destination is being used.
  - This update was applied because other Databricks runtimes have been discovered (i.e. an endpoint and an external runtime) that do not support the `insert_overwrite` incremental strategy used in the `fivetran_platform__audit_table` model.
  - Additionally, for Databricks users the `fivetran_platform__audit_table` model now leverages the incremental strategy only if the Databricks runtime is all-purpose; all other Databricks runtimes forgo the incremental strategy.

# dbt_fivetran_log v1.7.3

[PR #126](https://github.com/fivetran/dbt_fivetran_log/pull/126) includes the following updates:
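As a rough illustration of the rename described above, a dispatch-style dbt macro of this shape could gate the incremental strategy per destination. This is a hedged sketch only — the actual `is_incremental_compatible` implementation in this package's `macros/` directory differs, and the Databricks runtime check in particular is omitted here:

```sql
-- Hypothetical sketch of an is_incremental_compatible-style macro;
-- the real macro in dbt_fivetran_log may differ in detail.
{% macro is_incremental_compatible() %}
    {{ return(adapter.dispatch('is_incremental_compatible', 'fivetran_log')()) }}
{% endmacro %}

{% macro default__is_incremental_compatible() %}
    {# Supported non-Databricks destinations can use the incremental strategy. #}
    {{ return(true) }}
{% endmacro %}

{% macro databricks__is_incremental_compatible() %}
    {# Only all-purpose clusters support the insert_overwrite strategy; the
       runtime inspection needed to detect them is destination-specific and
       not shown in this sketch. #}
    {{ return(false) }}
{% endmacro %}
```

A model could then branch on `fivetran_log.is_incremental_compatible()` in its `config()` block to choose between an incremental and a full-rebuild materialization.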

README.md (+1 −9)

@@ -69,7 +69,7 @@ Include the following Fivetran Platform package version range in your `packages.yml` file:

````diff
 ```yaml
 packages:
   - package: fivetran/fivetran_log
-    version: [">=1.7.0", "<1.8.0"]
+    version: [">=1.8.0", "<1.9.0"]
 ```

 > Note that although the source connector is now "Fivetran Platform", the package retains the old name of "fivetran_log".
````

@@ -112,14 +112,6 @@ vars:

````diff
     fivetran_platform_<default_table_name>_identifier: your_table_name
 ```

-### Databricks Additional Configuration
-If you are using a Databricks destination with this package you will need to add the below (or a variation of the below) dispatch configuration within your root `dbt_project.yml`. This is required in order for the package to accurately search for macros within the `dbt-labs/spark_utils` then the `dbt-labs/dbt_utils` packages respectively.
-```yml
-dispatch:
-  - macro_namespace: dbt_utils
-    search_order: ['spark_utils', 'dbt_utils']
-```
-
 ## (Optional) Step 6: Orchestrate your models with Fivetran Transformations for dbt Core™
 <details><summary>Expand for details</summary>
 <br>
````

dbt_project.yml (+1 −1)

@@ -1,6 +1,6 @@

```diff
 config-version: 2
 name: 'fivetran_log'
-version: '1.7.3'
+version: '1.8.0'
 require-dbt-version: [">=1.3.0", "<2.0.0"]

 models:
```

docs/catalog.json (+1 −1) — large diff not rendered by default.

docs/manifest.json (+1 −1) — large diff not rendered by default.

docs/run_results.json (+1 −1) — large diff not rendered by default.

integration_tests/ci/sample.profiles.yml (+6 −6)

```diff
@@ -16,13 +16,13 @@ integration_tests:
       pass: "{{ env_var('CI_REDSHIFT_DBT_PASS') }}"
       dbname: "{{ env_var('CI_REDSHIFT_DBT_DBNAME') }}"
       port: 5439
-      schema: fivetran_platform_integration_tests
+      schema: fivetran_platform_integration_tests_5
       threads: 8
     bigquery:
       type: bigquery
       method: service-account-json
       project: 'dbt-package-testing'
-      schema: fivetran_platform_integration_tests
+      schema: fivetran_platform_integration_tests_5
       threads: 8
       keyfile_json: "{{ env_var('GCLOUD_SERVICE_KEY') | as_native }}"
     snowflake:
@@ -33,7 +33,7 @@ integration_tests:
       role: "{{ env_var('CI_SNOWFLAKE_DBT_ROLE') }}"
       database: "{{ env_var('CI_SNOWFLAKE_DBT_DATABASE') }}"
       warehouse: "{{ env_var('CI_SNOWFLAKE_DBT_WAREHOUSE') }}"
-      schema: fivetran_platform_integration_tests
+      schema: fivetran_platform_integration_tests_5
       threads: 8
     postgres:
       type: postgres
@@ -42,13 +42,13 @@ integration_tests:
       pass: "{{ env_var('CI_POSTGRES_DBT_PASS') }}"
       dbname: "{{ env_var('CI_POSTGRES_DBT_DBNAME') }}"
       port: 5432
-      schema: fivetran_platform_integration_tests
+      schema: fivetran_platform_integration_tests_5
       threads: 8
     databricks:
       catalog: "{{ env_var('CI_DATABRICKS_DBT_CATALOG') }}"
       host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
       http_path: "{{ env_var('CI_DATABRICKS_DBT_HTTP_PATH') }}"
-      schema: fivetran_platform_integration_tests
+      schema: fivetran_platform_integration_tests_5
       threads: 8
       token: "{{ env_var('CI_DATABRICKS_DBT_TOKEN') }}"
       type: databricks
@@ -66,7 +66,7 @@ integration_tests:
       server: "{{ env_var('CI_SQLSERVER_DBT_SERVER') }}"
       port: 1433
       database: "{{ env_var('CI_SQLSERVER_DBT_DATABASE') }}"
-      schema: fivetran_platform_integration_tests
+      schema: fivetran_platform_integration_tests_5
       user: "{{ env_var('CI_SQLSERVER_DBT_USER') }}"
       password: "{{ env_var('CI_SQLSERVER_DBT_PASS') }}"
       threads: 8
```

integration_tests/dbt_project.yml (+3 −3)

```diff
@@ -1,5 +1,5 @@
 name: 'fivetran_log_integration_tests'
-version: '1.7.3'
+version: '1.8.0'

 config-version: 2
 profile: 'integration_tests'
@@ -10,7 +10,7 @@ dispatch:

 vars:
   fivetran_log:
-    fivetran_platform_schema: "fivetran_platform_integration_tests"
+    fivetran_platform_schema: "fivetran_platform_integration_tests_5"
     fivetran_platform_account_identifier: "account"
     fivetran_platform_incremental_mar_identifier: "incremental_mar"
     fivetran_platform_connector_identifier: "connector"
@@ -21,10 +21,10 @@ vars:
     fivetran_platform_log_identifier: "log"
     fivetran_platform_user_identifier: "user"

-
 models:
   fivetran_log:
     +schema: "{{ 'sqlw_tests' if target.name == 'databricks-sql' else 'fivetran_platform' }}"
+    # +schema: "fivetran_platform_{{ var('directed_schema','dev') }}"

 seeds:
   fivetran_log_integration_tests:
```

integration_tests/packages.yml (+1 −2)

```diff
@@ -1,3 +1,2 @@
-
 packages:
-  - local: ../
+  - local: ../
```
New file (+67 lines)

@@ -0,0 +1,67 @@

```sql
{{ config(
    tags="fivetran_validations",
    enabled=var('fivetran_validation_tests_enabled', false)
) }}

with prod as (

    select
        connector_id,
        table_name,
        count(*) as total_records
    from {{ target.schema }}_fivetran_platform_prod.fivetran_platform__audit_table
    group by 1, 2
),

dev as (

    select
        connector_id,
        table_name,
        count(*) as total_records
    from {{ target.schema }}_fivetran_platform_dev.fivetran_platform__audit_table
    group by 1, 2
),

final_consistency_check as (

    select
        prod.connector_id,
        prod.table_name,
        prod.total_records as prod_total,
        dev.total_records as dev_total
    from prod
    left join dev
        on dev.connector_id = prod.connector_id
        and dev.table_name = prod.table_name
),

-- Checking to ensure the dev totals match the prod totals
consistency_check as (

    select *
    from final_consistency_check
    where prod_total != dev_total
),

-- For use when the current release intentionally changes the row count of the audit table model.
-- The below query proves the records that do not match are still accurate by checking the source.
verification_staging_setup as (

    select
        connector_id,
        {{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['table']) }} as table_name,
        count(*) as row_count
    from {{ target.schema }}_fivetran_platform_dev.stg_fivetran_platform__log
    where event_subtype in ('write_to_table_start')
    group by 1, 2
),

final_verification as (

    -- select only consistency_check's columns to avoid duplicate column names from the join
    select consistency_check.*
    from consistency_check
    left join verification_staging_setup
        on consistency_check.connector_id = verification_staging_setup.connector_id
        and consistency_check.table_name = verification_staging_setup.table_name
    where consistency_check.dev_total != verification_staging_setup.row_count
)

select *
from final_verification
```
New file (+43 lines)

@@ -0,0 +1,43 @@

```sql
{{ config(
    tags="fivetran_validations",
    enabled=var('fivetran_validation_tests_enabled', false)
) }}

with prod as (

    select
        connector_id,
        email,
        date_day,
        count(*) as total_records
    from {{ target.schema }}_fivetran_platform_prod.fivetran_platform__audit_user_activity
    group by 1, 2, 3
),

dev as (

    select
        connector_id,
        email,
        date_day,
        count(*) as total_records
    from {{ target.schema }}_fivetran_platform_dev.fivetran_platform__audit_user_activity
    group by 1, 2, 3
),

final as (

    select
        prod.connector_id,
        prod.email,
        prod.date_day,
        prod.total_records as prod_total,
        dev.total_records as dev_total
    from prod
    left join dev
        on dev.connector_id = prod.connector_id
        and dev.email = prod.email
        and dev.date_day = prod.date_day
)

select *
from final
where prod_total != dev_total
```
New file (+43 lines)

@@ -0,0 +1,43 @@

```sql
{{ config(
    tags="fivetran_validations",
    enabled=var('fivetran_validation_tests_enabled', false)
) }}

with prod as (

    select
        date_day,
        connector_id,
        destination_id,
        count(*) as total_records
    from {{ target.schema }}_fivetran_platform_prod.fivetran_platform__connector_daily_events
    group by 1, 2, 3
),

dev as (

    select
        date_day,
        connector_id,
        destination_id,
        count(*) as total_records
    from {{ target.schema }}_fivetran_platform_dev.fivetran_platform__connector_daily_events
    group by 1, 2, 3
),

final as (

    select
        prod.date_day,
        prod.connector_id,
        prod.destination_id,
        prod.total_records as prod_total,
        dev.total_records as dev_total
    from prod
    left join dev
        on dev.date_day = prod.date_day
        and dev.connector_id = prod.connector_id
        and dev.destination_id = prod.destination_id
)

select *
from final
where prod_total != dev_total
```
New file (+41 lines)

@@ -0,0 +1,41 @@

```sql
{{ config(
    tags="fivetran_validations",
    enabled=var('fivetran_validation_tests_enabled', false)
) }}

with prod as (

    select
        1 as join_key,
        count(*) as total_records,
        sum(number_of_schema_changes_last_month) as total_schema_changes_last_month
    from {{ target.schema }}_fivetran_platform_prod.fivetran_platform__connector_status
    group by 1
),

dev as (

    select
        1 as join_key,
        count(*) as total_records,
        sum(number_of_schema_changes_last_month) as total_schema_changes_last_month
    from {{ target.schema }}_fivetran_platform_dev.fivetran_platform__connector_status
    group by 1
),

final as (

    select
        -- aliased to avoid duplicate column names in the join output
        prod.join_key as prod_join_key,
        dev.join_key as dev_join_key,
        prod.total_records as prod_total,
        dev.total_records as dev_total,
        prod.total_schema_changes_last_month as prod_total_schema_changes,
        dev.total_schema_changes_last_month as dev_total_schema_changes
    from prod
    left join dev
        on dev.join_key = prod.join_key
)

select *
from final
where prod_total != dev_total
    or prod_total_schema_changes != dev_total_schema_changes
```
New file (+56 lines)

@@ -0,0 +1,56 @@

```sql
{{ config(
    tags="fivetran_validations",
    enabled=var('fivetran_validation_tests_enabled', false)
) }}

with prod as (

    select
        connector_name,
        schema_name,
        table_name,
        destination_id,
        measured_month,
        sum(total_monthly_active_rows) as total_mar,
        count(*) as total_records
    from {{ target.schema }}_fivetran_platform_prod.fivetran_platform__mar_table_history
    group by 1, 2, 3, 4, 5
),

dev as (

    select
        connector_name,
        schema_name,
        table_name,
        destination_id,
        measured_month,
        sum(total_monthly_active_rows) as total_mar,
        count(*) as total_records
    from {{ target.schema }}_fivetran_platform_dev.fivetran_platform__mar_table_history
    group by 1, 2, 3, 4, 5
),

final as (

    select
        prod.connector_name,
        prod.schema_name,
        prod.table_name,
        prod.destination_id,
        prod.measured_month,
        prod.total_records as prod_total,
        dev.total_records as dev_total,
        prod.total_mar as prod_total_mar,
        dev.total_mar as dev_total_mar
    from prod
    left join dev
        on dev.connector_name = prod.connector_name
        and dev.schema_name = prod.schema_name
        and dev.table_name = prod.table_name
        and dev.destination_id = prod.destination_id
        and dev.measured_month = prod.measured_month
)

select *
from final
where prod_total != dev_total
    or prod_total_mar != dev_total_mar
```
