Commit 8b325b8

Bug/postgres performance (#126)
* patch/update-macro-readme
* bug/postgres-performance
* remove int model
* remove int model
* switch to jsonb
* add limit for test
* update changelog and regen docs
* Update README.md fixed links
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md

---------

Co-authored-by: Alex Ilyichov <[email protected]>
1 parent d355614 commit 8b325b8

13 files changed

+61 -48 lines changed

CHANGELOG.md

+13
@@ -1,3 +1,16 @@
+# dbt_fivetran_log v1.7.3
+[PR #126](https://github.com/fivetran/dbt_fivetran_log/pull/126) includes the following updates:
+
+## Performance Improvements
+- Updated the sequence of JSON parsing for model `fivetran_platform__audit_table` to reduce runtime.
+
+## Bug Fixes
+- Updated model `fivetran_platform__audit_user_activity` to correct the JSON parsing used to determine column `email`. This fixes an issue introduced in v1.5.0 where `fivetran_platform__audit_user_activity` could potentially have 0 rows.
+
+## Under the hood
+- Updated logic for macro `fivetran_log_lookback` to align with logic used in similar macros in other packages.
+- Updated logic for the postgres dispatch of macro `fivetran_log_json_parse` to utilize `jsonb` instead of `json` for performance.
+
 # dbt_fivetran_log v1.7.2
 [PR #123](https://github.com/fivetran/dbt_fivetran_log/pull/123) includes the following updates:

README.md

+1-1
@@ -16,7 +16,7 @@
 # Fivetran Platform dbt Package ([Docs](https://fivetran.github.io/dbt_fivetran_log/))
 # 📣 What does this dbt package do?
 - Generates a comprehensive data dictionary of your Fivetran Platform connector (previously called Fivetran Log) data via the [dbt docs site](https://fivetran.github.io/dbt_fivetran_log/)
-- Produces staging models in the format described by [this ERD](https://fivetran.com/docs/logs/fivetran-log#schemainformation) which clean, test, and prepare your Fivetran data from [Fivetran's free connector](https://fivetran.com/docs/applications/fivetran-log) and generates analysis ready end models.
+- Produces staging models in the format described by [this ERD](https://fivetran.com/docs/logs/fivetran-platform#schemainformation) which clean, test, and prepare your Fivetran data from [Fivetran's free connector](https://fivetran.com/docs/logs/fivetran-platform)) and generates analysis ready end models.
 - The above mentioned models enable you to better understand how you are spending money in Fivetran according to our [consumption-based pricing model](https://fivetran.com/docs/getting-started/consumption-based-pricing) as well as providing details about the performance and status of your Fivetran connectors. This is achieved by:
 - Displaying consumption data at the table, connector, destination, and account levels
 - Providing a history of measured free and paid monthly active rows (MAR), credit consumption, and the relationship between the two

dbt_project.yml

+1-1
@@ -1,6 +1,6 @@
 config-version: 2
 name: 'fivetran_log'
-version: '1.7.2'
+version: '1.7.3'
 require-dbt-version: [">=1.3.0", "<2.0.0"]

 models:

docs/catalog.json

+1-1
Large diffs are not rendered by default.

docs/manifest.json

+1-1
Large diffs are not rendered by default.

docs/run_results.json

+1-1
Large diffs are not rendered by default.

integration_tests/ci/sample.profiles.yml

+2-2
@@ -49,15 +49,15 @@ integration_tests:
 host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
 http_path: "{{ env_var('CI_DATABRICKS_DBT_HTTP_PATH') }}"
 schema: fivetran_platform_integration_tests
-threads: 2
+threads: 8
 token: "{{ env_var('CI_DATABRICKS_DBT_TOKEN') }}"
 type: databricks
 databricks-sql:
 catalog: "{{ env_var('CI_DATABRICKS_DBT_CATALOG') }}"
 host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
 http_path: "{{ env_var('CI_DATABRICKS_SQL_DBT_HTTP_PATH') }}"
 schema: sqlw_tests
-threads: 2
+threads: 8
 token: "{{ env_var('CI_DATABRICKS_SQL_DBT_TOKEN') }}"
 type: databricks
 sqlserver:

integration_tests/dbt_project.yml

+1-1
@@ -1,5 +1,5 @@
 name: 'fivetran_log_integration_tests'
-version: '1.7.2'
+version: '1.7.3'

 config-version: 2
 profile: 'integration_tests'

macros/fivetran_log_json_parse.sql

+1-1
@@ -28,7 +28,7 @@
 {% macro postgres__fivetran_log_json_parse(string, string_path) %}

 case when {{ string }} ~ '^\s*[\{].*[\}]?\s*$' -- Postgres has no native json check, so this will check the string for indicators of a JSON object
-then {{ string }}::json #>> '{ {%- for s in string_path -%}{{ s }}{%- if not loop.last -%},{%- endif -%}{%- endfor -%} }'
+then {{ string }}::jsonb #>> '{ {%- for s in string_path -%}{{ s }}{%- if not loop.last -%},{%- endif -%}{%- endfor -%} }'
 else null end

 {% endmacro %}
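
To make the macro's behavior concrete, the Python sketch below mirrors what the rendered Postgres expression computes: a regex gate that checks whether the string looks like a JSON object, followed by a path lookup that returns text or null. It is an illustration only and not part of the package. The name `json_parse_sketch` is hypothetical, array indexes in the path are not handled, and the `json` to `jsonb` cast itself has no Python equivalent, since it only changes how Postgres represents the parsed value internally.

```python
import json
import re

# Same pattern the macro passes to the Postgres `~` operator.
# re.DOTALL is an assumption made so that "." also spans newlines here.
LOOKS_LIKE_OBJECT = re.compile(r'^\s*[\{].*[\}]?\s*$', re.DOTALL)


def json_parse_sketch(string, string_path):
    """Return the value found at string_path as text, or None."""
    # Gate: anything that does not look like a JSON object yields null.
    if string is None or not LOOKS_LIKE_OBJECT.match(string):
        return None

    value = json.loads(string)

    # Walk the path one key at a time, like the '{a,b,c}' path literal.
    for key in string_path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]

    if value is None:
        return None
    # Strings come back as-is; other values come back as JSON text.
    return value if isinstance(value, str) else json.dumps(value)
```

For example, `json_parse_sketch('{"table": "orders"}', ['table'])` returns `'orders'`, while a plain log message such as `'sync finished'` fails the gate and returns `None`.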

macros/fivetran_log_lookback.sql

+9-26
@@ -1,35 +1,18 @@
-{% macro fivetran_log_lookback(from_date, datepart='day', interval=7, default_start_date='2010-01-01') %}
+{% macro fivetran_log_lookback(from_date, datepart='day', interval=7, safety_date='2010-01-01') %}

-{{ adapter.dispatch('fivetran_log_lookback', 'fivetran_log') (from_date, datepart='day', interval=7, default_start_date='2010-01-01') }}
+{{ adapter.dispatch('fivetran_log_lookback', 'fivetran_log') (from_date, datepart='day', interval=7, safety_date='2010-01-01') }}

 {%- endmacro %}

-{% macro default__fivetran_log_lookback(from_date, datepart='day', interval=7, default_start_date='2010-01-01') %}
+{% macro default__fivetran_log_lookback(from_date, datepart='day', interval=7, safety_date='2010-01-01') %}

-coalesce(
-(select {{ dbt.dateadd(datepart=datepart, interval=-interval, from_date_or_timestamp=from_date) }}
-from {{ this }}),
-{{ "'" ~ default_start_date ~ "'" }}
-)
+{% set sql_statement %}
+select coalesce({{ from_date }}, {{ "'" ~ safety_date ~ "'" }})
+from {{ this }}
+{%- endset -%}

-{% endmacro %}
+{%- set result = dbt_utils.get_single_value(sql_statement) %}

-{% macro bigquery__fivetran_log_lookback(from_date, datepart='day', interval=7, default_start_date='2010-01-01') %}
-
--- Capture the latest timestamp in a call statement instead of a subquery for optimizing BQ costs on incremental runs
-{%- call statement('date_agg', fetch_result=True) -%}
-select {{ from_date }} from {{ this }}
-{%- endcall -%}
-
--- load the result from the above query into a new variable
-{%- set query_result = load_result('date_agg') -%}
-
--- the query_result is stored as a dataframe. Therefore, we want to now store it as a singular value.
-{%- set date_agg = query_result['data'][0][0] %}
-
-coalesce(
-{{ dbt.dateadd(datepart='day', interval=-7, from_date_or_timestamp="'" ~ date_agg ~ "'") }},
-{{ "'" ~ default_start_date ~ "'" }}
-)
+{{ dbt.dateadd(datepart=datepart, interval=-interval, from_date_or_timestamp="cast('" ~ result ~ "' as date)") }}

 {% endmacro %}
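
Read as plain logic, the rewritten default macro appears to do two things in order: resolve the aggregate passed as `from_date` to a single value, falling back to `safety_date` when it is null, and then subtract `interval` from that value. The Python sketch below models this for the `day` datepart only. The name `lookback_sketch` is hypothetical; the real macro performs both steps in SQL through `dbt_utils.get_single_value` and `dbt.dateadd`.

```python
from datetime import date, timedelta


def lookback_sketch(max_date, interval=7, safety_date=date(2010, 1, 1)):
    """Model of the lookback cutoff for datepart='day' (assumption)."""
    # Step 1: coalesce(from_date, safety_date), resolved to a single value.
    resolved = max_date if max_date is not None else safety_date
    # Step 2: dateadd(day, -interval, resolved).
    return resolved - timedelta(days=interval)
```

One visible consequence of this ordering is that the fallback date is itself shifted back by the interval, whereas the removed version applied the fallback after the date arithmetic.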

models/fivetran_platform__audit_table.sql

+28-11
@@ -10,21 +10,39 @@
 file_format='delta' if is_databricks_sql_warehouse(target) else 'parquet'
 ) }}

-with sync_log as (
+with base as (

-select
-*,
-{{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['table']) }} as table_name
+select *
 from {{ ref('stg_fivetran_platform__log') }}
 where event_subtype in ('sync_start', 'sync_end', 'write_to_table_start', 'write_to_table_end', 'records_modified')

 {% if is_incremental() %}
-
 and cast(created_at as date) > {{ fivetran_log.fivetran_log_lookback(from_date='max(sync_start_day)', interval=7) }}
-
 {% endif %}
 ),

+sync_log as (
+select
+*,
+{{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['table']) }} as table_name,
+cast(null as {{ dbt.type_string() }}) as schema_name,
+cast(null as {{ dbt.type_string() }}) as operation_type,
+cast(null as {{ dbt.type_bigint() }}) as row_count
+from base
+where event_subtype in ('sync_start', 'sync_end', 'write_to_table_start', 'write_to_table_end')
+
+union all
+
+select
+*,
+{{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['table']) }} as table_name,
+{{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['schema']) }} as schema_name,
+{{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['operationType']) }} as operation_type,
+cast ({{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['count']) }} as {{ dbt.type_bigint() }}) as row_count
+from base
+where event_subtype = 'records_modified'
+),
+
 connector as (

 select *
@@ -80,13 +98,12 @@ records_modified_log as (
 select
 connector_id,
 created_at,
-{{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['table']) }} as table_name,
-{{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['schema']) }} as schema_name,
-{{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['operationType']) }} as operation_type,
-cast ({{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['count']) }} as {{ dbt.type_bigint() }}) as row_count
+table_name,
+schema_name,
+operation_type,
+row_count
 from sync_log
 where event_subtype = 'records_modified'
-
 ),

 sum_records_modified as (
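
The reordering in this model can be summarized as: filter the log rows first, then parse only the JSON fields each event subtype needs, and give both branches of the union the same columns. The Python sketch below models that shape over a list of dictionaries. It is a rough analogy rather than the model's SQL; `build_sync_log` and `_load` are hypothetical names, and the row layout is assumed from the columns visible in the diff.

```python
import json

SYNC_SUBTYPES = {
    "sync_start", "sync_end", "write_to_table_start", "write_to_table_end",
}


def _load(message_data):
    """Parse message_data; return {} when it is missing or not a JSON object."""
    try:
        parsed = json.loads(message_data)
    except (TypeError, ValueError):
        return {}
    return parsed if isinstance(parsed, dict) else {}


def build_sync_log(rows):
    """Return filtered rows with table/schema/operation/count columns added."""
    result = []
    for row in rows:
        subtype = row["event_subtype"]
        if subtype in SYNC_SUBTYPES:
            # First branch of the union: only the table name is parsed.
            data = _load(row.get("message_data"))
            extra = {
                "table_name": data.get("table"),
                "schema_name": None,
                "operation_type": None,
                "row_count": None,
            }
        elif subtype == "records_modified":
            # Second branch: all four fields are parsed.
            data = _load(row.get("message_data"))
            count = data.get("count")
            extra = {
                "table_name": data.get("table"),
                "schema_name": data.get("schema"),
                "operation_type": data.get("operationType"),
                "row_count": int(count) if count is not None else None,
            }
        else:
            # Rows with other subtypes never reach the parsing step.
            continue
        result.append({**row, **extra})
    return result
```

Downstream steps such as `records_modified_log` can then read `table_name`, `schema_name`, `operation_type`, and `row_count` directly instead of parsing `message_data` again.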

models/fivetran_platform__audit_user_activity.sql

+1-1
@@ -2,7 +2,7 @@ with logs as (

 select
 *,
-{{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path='actor') }} as actor_email
+{{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['actor']) }} as actor_email
 from {{ ref('stg_fivetran_platform__log') }}
 where lower(message_data) like '%actor%'
 ),
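
A likely reason the one-character change from `'actor'` to `['actor']` matters: a Jinja `for` loop over a string iterates its characters, just as Python does. In a dispatch that loops over `string_path`, such as the Postgres macro in this commit, a bare string would therefore render as a five-level path instead of a single key. The sketch below reproduces that rendering in Python; `render_pg_path` is a hypothetical helper, and other warehouse dispatches may build their paths differently.

```python
def render_pg_path(string_path):
    """Build the '{a,b,c}' path literal the way the Jinja loop would."""
    # Joining iterates string_path item by item; a str yields characters.
    return "{" + ",".join(string_path) + "}"


broken = render_pg_path("actor")    # string: one path element per character
fixed = render_pg_path(["actor"])   # list: a single path element
```

Here `broken` is `'{a,c,t,o,r}'` and `fixed` is `'{actor}'`. A lookup along the first path finds nothing in a message whose top-level key is `actor`, which is consistent with the empty-model symptom described in the changelog.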

models/staging/stg_fivetran_platform__log.sql

+1-1
@@ -34,4 +34,4 @@ final as (
 )

 select *
-from final
+from final
