bugfix/hubspot-duplicates #5
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -48,7 +48,7 @@ engagement_deals as ( | |
from {{ ref('stg_rag_hubspot__engagement_deal') }} | ||
), | ||
|
||
-engagement_details as ( | ||
+engagement_detail_prep as ( | ||
|
||
select | ||
deals.deal_id, | ||
|
@@ -84,6 +84,21 @@ engagement_details as ( | |
and engagement_deals.source_relation = engagement_notes.source_relation | ||
), | ||
|
||
+engagement_details as ( | ||
+select | ||
+deal_id, | ||
+deal_name, | ||
+url_reference, | ||
+created_on, | ||
+source_relation, | ||
+{{ fivetran_utils.string_agg(field_to_agg="distinct engagement_type", delimiter="', '") }} as engagement_type, | ||
+{{ fivetran_utils.string_agg(field_to_agg="distinct contact_name", delimiter="', '") }} as contact_name, | ||
+{{ fivetran_utils.string_agg(field_to_agg="distinct created_by", delimiter="', '") }} as created_by, | ||
+{{ fivetran_utils.string_agg(field_to_agg="distinct company_name", delimiter="', '") }} as company_name | ||
+from engagement_detail_prep | ||
+group by 1,2,3,4,5 | ||
Comment on lines +87 to +99: Similar to the previous model. The joins in the previous CTE are not 1:1, so we need to aggregate the non-key fields to retain all the necessary information without causing any fanout.
||
+), | ||
|
||
engagement_markdown as ( | ||
|
||
select | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,8 @@ | ||
{{ | ||
config( | ||
materialized='table' if unified_rag.is_databricks_sql_warehouse() else 'incremental', | ||
-partition_by = {'field': 'most_recent_chunk_update', 'data_type': 'date', 'granularity': 'month'} | ||
-if target.type not in ['spark', 'databricks'] else ['most_recent_chunk_update'], | ||
+partition_by = {'field': 'update_date', 'data_type': 'date'} | ||
Ran into some issues with the incremental logic on BQ. These changes helped address those issues, although they did require adding a new field (which has been documented, and the docs have been regenerated).
||
+if target.type not in ['spark', 'databricks'] else ['update_date'], | ||
cluster_by = ['unique_id'], | ||
unique_key='unique_id', | ||
incremental_strategy = 'insert_overwrite' if target.type in ('bigquery', 'databricks', 'spark') else 'delete+insert', | ||
|
@@ -26,14 +26,15 @@ | |
" platform, \n" ~ | ||
" source_relation, \n" ~ | ||
" most_recent_chunk_update, \n" ~ | ||
+" cast(most_recent_chunk_update as date) as update_date, \n" ~ | ||
" chunk_index, \n" ~ | ||
" chunk_tokens_approximate, \n" ~ | ||
" chunk \n" ~ | ||
"from " ~ ref('rag_' ~ platform_name ~ '__document')) %} | ||
|
||
{% if is_incremental() %} | ||
{% set select_statement = select_statement ~ | ||
-"\n where most_recent_chunk_update >= (select max(most_recent_chunk_update) from " ~ this ~ ")" %} | ||
+"\n where cast(most_recent_chunk_update as date) >= (select max(update_date) from " ~ this ~ ")" %} | ||
{% endif %} | ||
|
||
{% do queries.append(select_statement) -%} | ||
|
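A note for readers on the incremental change: the review comment does not say what went wrong on BQ. One failure mode a partition-replacing strategy such as `insert_overwrite` can run into is that a timestamp filter selects only part of a partition's rows, and the overwrite then drops the rest. The sketch below simulates that in plain Python and shows why filtering on the date keeps the partition whole. It is an illustration under that assumption, not the author's stated diagnosis; the `insert_overwrite` helper and the sample rows are hypothetical stand-ins, not dbt or BigQuery code.

```python
from datetime import datetime


def insert_overwrite(target, new_rows):
    """Hypothetical stand-in: replace every date partition present in new_rows."""
    touched = {row["ts"].date() for row in new_rows}
    kept = [row for row in target if row["ts"].date() not in touched]
    return kept + new_rows


# Rows already loaded by the first (full) run.
source = [
    {"id": "a", "ts": datetime(2024, 5, 1, 9, 0)},
    {"id": "b", "ts": datetime(2024, 5, 1, 17, 0)},
]
target = list(source)

# A new row arrives later the same day.
source.append({"id": "c", "ts": datetime(2024, 5, 1, 20, 0)})

# Old-style filter: compare full timestamps against the target's max timestamp.
max_ts = max(row["ts"] for row in target)
by_timestamp = [row for row in source if row["ts"] >= max_ts]

# New-style filter: compare dates against the target's max date.
max_date = max(row["ts"].date() for row in target)
by_date = [row for row in source if row["ts"].date() >= max_date]

# Overwriting the partition with a partial selection loses row "a".
lossy = sorted(row["id"] for row in insert_overwrite(target, by_timestamp))
# Overwriting with the whole day's rows keeps everything.
complete = sorted(row["id"] for row in insert_overwrite(target, by_date))
```

In this simulation the timestamp filter reselects only the rows at or after the latest loaded timestamp, so replacing the day's partition discards the earlier row; the date filter reselects the full day and the partition is rebuilt intact.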
I found a few cases where multiple contacts are associated with a single engagement email, which resulted in fanout. The string_agg ensures all parties are included in the resulting data without causing any fanout.
Good catch.
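For anyone reading the thread later, the fanout discussed above can be reproduced in a few lines. This is a minimal sketch using Python's built-in `sqlite3`, with SQLite's `group_concat(distinct ...)` standing in for `fivetran_utils.string_agg`; the table names, column subset, and sample rows are hypothetical, and SQLite joins the values with a plain comma rather than the `', '` delimiter used in the PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table deals (deal_id integer, deal_name text);
    create table engagement_contacts (
        deal_id integer, engagement_type text, contact_name text
    );
    insert into deals values (1, 'Renewal');
    -- One engagement email with two associated contacts.
    insert into engagement_contacts values
        (1, 'EMAIL', 'Ana'),
        (1, 'EMAIL', 'Ben');
    """
)

# A plain join is not 1:1, so the single deal fans out into two rows.
fanned_out = conn.execute(
    """
    select d.deal_id, d.deal_name, e.engagement_type, e.contact_name
    from deals d
    left join engagement_contacts e on d.deal_id = e.deal_id
    """
).fetchall()

# Grouping by the deal-level columns and string-aggregating the rest
# collapses the result back to one row per deal while keeping every contact.
aggregated = conn.execute(
    """
    select
        d.deal_id,
        d.deal_name,
        group_concat(distinct e.engagement_type) as engagement_type,
        group_concat(distinct e.contact_name) as contact_name
    from deals d
    left join engagement_contacts e on d.deal_id = e.deal_id
    group by d.deal_id, d.deal_name
    """
).fetchall()

conn.close()
```

The `distinct` matters here: without it the repeated engagement type would appear once per contact in the aggregated string.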