Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add user passthrough column to users model and update dependencies #44

Merged

Conversation

fivetran-reneeli
Copy link
Contributor

@fivetran-reneeli fivetran-reneeli commented Jul 17, 2024

PR Overview

This PR will address the following Issue/Feature:
#43
and internal ticket

This PR will result in the following new package version: v0.12.0

Non breaking as the default config for event_extension is still true so existing users won't see any changes. And the passthrough columns only add columns, instead of deleting or modifying existing ones.

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

Bug Fixes

  • Introduces variable iterable__using_event_extension to disable the event_extension table and exclude its fields from persisting downstream. This allows the downstream models to run even if the source event_extension table does not exist. For more information on how to configure the iterable__using_event_extension variable, refer to the README.
  • Persists user_history passthrough columns as stipulated via the iterable_user_history_pass_through_columns variable through to the iterable__users model. For more information on how to configure the iterable_user_history_pass_through_columns variable, refer to the README.

Under the Hood

  • Addition of integrity and consistency validation tests within integration tests pertaining to the iterable__user_unsubscriptions model.
  • Updated seed data to ensure proper testing of the latest v0.8.1 dbt_iterable_source release.
  • Updated pull request and issue templates.
  • Included auto-releaser GitHub Actions workflow to automate future releases.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run –full-refresh && dbt test
  • dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • docs were regenerated (unless this PR does not include any code or yml updates)
  • BuildKite integration tests are passing
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

Variables tried:

    iterable_user_history_pass_through_columns:
      - name: phone_number_line_type
        alias: phone_digits
      - name: additional_properties
    iterable_event_extension_pass_through_columns:
      - name: browser_token
    iterable__using_event_extension: true
#    iterable__using_event_extension: false
    

If you had to summarize this PR in an emoji, which would it be?

💃

@fivetran-reneeli
Copy link
Contributor Author

will regen docs upon approval

Copy link
Contributor

@fivetran-avinash fivetran-avinash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-reneeli Awesome job! This is a pretty extensive update with a lot of custom logic you had to implement, so great work in getting it to this point.

Some comments to address before approval. Let me know if you have any questions.

Also, you will need to regenerate docs.

CHANGELOG.md Outdated
@@ -1,4 +1,9 @@
# dbt_iterable v0.RELEASE.RELEASE
# dbt_iterable v0.11.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are changes to when experiment_id is or is not being brought in this model, plus the new aggregate metrics, which I believe classifies as a schema change. Should we consider making this a breaking change and bumping the version up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point-- bumped to breaking change


## Bug Fixes
- Introduces variable `iterable__using_event_extension` to disable the `event_extension` table and exclude its fields from persisting downstream. This allows the downstream models to run even if the source `event_extension` table does not exist. For more information on how to configure the `iterable__using_event_extension` variable, refer to the [README](https://github.com/fivetran/dbt_iterable/blob/main/README.md#step-4-enablingdisabling-models).
- Persists `user_history` passthrough columns, as stipulated via the `iterable_user_history_pass_through_columns` variable, through to the `iterable__users` model. For more information on how to configure the `iterable_user_history_pass_through_columns` variable, refer to the [README](https://github.com/fivetran/dbt_iterable/blob/main/README.md#passing-through-additional-fields).

## Under the Hood
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add more details around the consistency tests you've created here too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added!

README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
## For validation testing. To be commented out before release.
models:
+schema: "iterable_{{ var('directed_schema','dev') }}"
# models:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can uncomment this, Buildkite is equipped to handle it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

'experiment_id'] %}
, sum( {{ col.name }} ) as {{ col.name }}
{% if col.name|lower not in ['unique_user_key', 'user_id', '_fivetran_user_id', 'user_email', 'user_full_name', 'campaign_id', 'campaign_name', 'recurring_campaign_id', 'recurring_campaign_name', 'first_event_at', 'last_event_at', 'template_id', 'template_name'] %}
{% if col.name|lower == 'experiment_id' %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar note to int_iterable__campaign_events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added experiment_id back to the exclusion

{% if col.name|lower == 'experiment_id' %}
{% if col.name in user_campaign_columns %}
{% if var('iterable__using_event_extension', True) %}
, sum({{ col.name }}) as {{ col.name }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar note to int_iterable__campaign_events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

dev as (
select *
from {{ target.schema }}_iterable_dev.iterable__users
where date(created_at) < date({{ dbt.current_timestamp() }})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
where date(created_at) < date({{ dbt.current_timestamp() }})
where date(updated_at) < date({{ dbt.current_timestamp() }})


{%- set user_campaign_columns = adapter.get_columns_in_relation(ref('iterable__user_campaign')) %}

{%- set user_campaign_columns = adapter.get_columns_in_relation(ref('iterable__user_campaign')) %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{%- set user_campaign_columns = adapter.get_columns_in_relation(ref('iterable__user_campaign')) %}
{%- set user_campaign_columns = adapter.get_columns_in_relation(ref('iterable__user_campaign')) %}

Copy link
Contributor

@fivetran-avinash fivetran-avinash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-reneeli thanks for applying all these changes! I verified that the validation tests worked as expected.

A few comments left to address (plus another reminder to regenerate the docs) and then you should be good for approval.

.buildkite/scripts/run_models.sh Show resolved Hide resolved
README.md Show resolved Hide resolved
.buildkite/scripts/run_models.sh Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to bump the README dbt_iterable version in lines 76-80, now that it's a breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thank you, updated

@@ -21,7 +21,8 @@ dbt compile --target "$db"
dbt run --target "$db" --full-refresh
dbt run --target "$db"
dbt test --target "$db"
dbt run --vars '{iterable__using_campaign_label_history: false, iterable__using_user_unsubscribed_message_type_history: false, iterable__using_campaign_suppression_list_history: false}' --target "$db" --full-refresh
dbt run --vars '{iterable_user_history_pass_through_columns: [{name: "phone_number_updated_at"}], iterable_event_extension_pass_through_columns: [{name: "web_push_message"}]}' --target "$db" --full-refresh
dbt run --vars '{iterable__using_campaign_label_history: false, iterable__using_user_unsubscribed_message_type_history: false, iterable__using_campaign_suppression_list_history: false, iterable__using_event_extension: false}' --target "$db"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-avinash this is now failing I think due to needing a full refresh after the variables change. Gonna update the script

Copy link
Contributor

@fivetran-avinash fivetran-avinash Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-reneeli Apologies. See PR approval comment.

Copy link
Contributor

@fivetran-avinash fivetran-avinash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @fivetran-reneeli ! Thanks for addressing all the additional comments! This looks good to go.

Regarding your script issue and your commit history, it looks like I sent a final incorrect comment. My initial recommendation was to add the additional incremental run script after the full refresh. But my last suggestion in Github did indeed remove it. Sorry, that line should remain unchanegd.

Can you test this configuration in Buildkite that runs a full-refresh, then an incremental run, and see if this works:

dbt run --vars '{iterable__using_campaign_label_history: false, iterable__using_user_unsubscribed_message_type_history: false, iterable__using_campaign_suppression_list_history: false, iterable__using_event_extension: false}' --target "$db" —full-refresh

dbt run --vars '{iterable__using_campaign_label_history: false, iterable__using_user_unsubscribed_message_type_history: false, iterable__using_campaign_suppression_list_history: false, iterable__using_event_extension: false}' --target "$db"

@fivetran-reneeli
Copy link
Contributor Author

Hi @fivetran-reneeli ! Thanks for addressing all the additional comments! This looks good to go.

Regarding your script issue and your commit history, it looks like I sent a final incorrect comment. My initial recommendation was to add the additional incremental run script after the full refresh. But my last suggestion in Github did indeed remove it. Sorry, that line should remain unchanegd.

Can you test this configuration in Buildkite that runs a full-refresh, then an incremental run, and see if this works:

dbt run --vars '{iterable__using_campaign_label_history: false, iterable__using_user_unsubscribed_message_type_history: false, iterable__using_campaign_suppression_list_history: false, iterable__using_event_extension: false}' --target "$db" —full-refresh

dbt run --vars '{iterable__using_campaign_label_history: false, iterable__using_user_unsubscribed_message_type_history: false, iterable__using_campaign_suppression_list_history: false, iterable__using_event_extension: false}' --target "$db"

Thanks @fivetran-avinash yeah I had figured this was what you meant and I tried that in this commit, while it failed I think it's because I didn't add the vars the second time around. Just pushed this and should be good. Appreciate your review!!

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-reneeli thanks for working through these changes. Please see my comments below to address before this will be ready for release approval.

Additionally, I noticed that dbt test fails if you have set iterable__using_event_extension to false. This is because in the yml where we define the tests we are still including experiment_id and not catching if the variable is turned on/off.

image

I recommend it would probably be best to create a surrogate key in the model that defines the uniqueness of the end model (like here in our netsuite model) and make that surrogate key dynamic so we can then test that field in the yml (like we have done here for the same netsuite model).

Lastly, I know this is out of scope for this task, but would you be able to create an issue for us to explore the iterable__events incremental logic. I did an integrity test on our internal data (albeit our internal data may be incomplete and not emulate a customer environment) and found that the row count for the iterable__events model does not match the row count of the stg_iterable__event model following an incremental run (this is present in both dev and prod). It would be great for us to proactively create a ticket for us to explore this in a future update.

image

packages.yml Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@fivetran-reneeli
Copy link
Contributor Author

@fivetran-reneeli thanks for working through these changes. Please see my comments below to address before this will be ready for release approval.

Additionally, I noticed that dbt test fails if you have set iterable__using_event_extension to false. This is because in the yml where we define the tests we are still including experiment_id and not catching if the variable is turned on/off.

image

I recommend it would probably be best to create a surrogate key in the model that defines the uniqueness of the end model (like here in our netsuite model) and make that surrogate key dynamic so we can then test that field in the yml (like we have done here for the same netsuite model).

Lastly, I know this is out of scope for this task, but would you be able to create an issue for us to explore the iterable__events incremental logic. I did an integrity test on our internal data (albeit our internal data may be incomplete and not emulate a customer environment) and found that the row count for the iterable__events model does not match the row count of the stg_iterable__event model following an incremental run (this is present in both dev and prod). It would be great for us to proactively create a ticket for us to explore this in a future update.

image

Thanks @fivetran-joemarkiewicz for catching-- I removed experiment_id from anywhere it was included in a unique test and instead created a surrogate key in those models (iterable__user_campaign and iterable__campaigns).

Hmm strange about the row counts-- will make issue for that

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-reneeli thanks for working through this PR. I approved, but have a request to add another CHANGELOG entry for the new surrogate key fields and their purpose for the dynamic uniqueness tests. No need to block release approval, but please make sure that comment is addressed and the packages.yml is updated before being merged and released. Thanks!

packages.yml Outdated
# - package: fivetran/iterable_source
# version: [">=0.8.0", "<0.9.0"]
# - local: ../dbt_iterable_source
- git: https://github.com/fivetran/dbt_iterable_source.git
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to update the source range and switch before merge and release

[PR #44](https://github.com/fivetran/dbt_iterable/pull/44) includes the following updates:

## 🚨 Breaking Changes 🚨
- Introduces variable `iterable__using_event_extension` to allow the `event_extension` table to be disabled and exclude its field, `experiment_id`, from persisting downstream. This permits the downstream models to run even if the source `event_extension` table does not exist. By default the variable is set to True. If you don't have this table, you will need to set `iterable__using_event_extension` to False. For more information on how to configure the `iterable__using_event_extension` variable, refer to the [README](https://github.com/fivetran/dbt_iterable/blob/main/README.md#step-4-enablingdisabling-models).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll also want to add in here that the uniqueness test has been updated to account for instances wheniterable__using_event_extension is enabled/disabled. It would be good to add what the new surrogate fields are and which models they can be found in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

@fivetran-reneeli fivetran-reneeli merged commit 0c3b449 into main Jul 25, 2024
8 checks passed
@fivetran-reneeli fivetran-reneeli deleted the bugfix/table_dependencies_and_passthrough_columns branch July 25, 2024 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Include User History Passthrough Columns in iterable__users model
3 participants