Commit

update readme and changelog
fivetran-catfritz committed May 16, 2023
1 parent 1aa067a commit dd8e7c3
Showing 2 changed files with 18 additions and 8 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,4 +1,5 @@
# dbt_mixpanel v0.8.0
>Note: If you run into issues with this update, we suggest trying a **full refresh**.
## 🎉 Feature Updates 🎉
- Databricks and Postgres compatibility! ([PR #33](https://github.com/fivetran/dbt_mixpanel/pull/33))

25 changes: 17 additions & 8 deletions README.md
@@ -40,6 +40,23 @@ To use this dbt package, you must have the following:
- At least one Fivetran Mixpanel connector syncing data into your destination.
- A **BigQuery**, **Snowflake**, **Redshift**, **PostgreSQL**, or **Databricks** destination.

### Databricks dispatch configuration
If you are using a Databricks destination with this package, you must add the following (or a variation of the following) dispatch configuration to your `dbt_project.yml`. This ensures that the package searches for macros in the `dbt-labs/spark_utils` package first and then in the `dbt-labs/dbt_utils` package.
```yml
dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']
```
### Database Incremental Strategies
Some end models in this package are materialized incrementally. We currently use `merge` as the default strategy for **BigQuery**, **Snowflake**, and **Databricks** databases. For **Redshift** and **Postgres** databases, we use `delete+insert` as the default strategy.

We use `merge` where it is available because it handles duplicates well and automatically handles insertions, updates, and deletions. We recognize there are some limitations with this strategy and are assessing whether to use a different strategy in the future.

When `merge` is not available in a warehouse, `delete+insert` handles incremental loads well as long as they do not contain changes to past records. However, if a past record has been updated and falls outside of the incremental window, `delete+insert` will insert a duplicate record. 😱

> Because of this, we highly recommend that **Redshift** and **Postgres** users periodically run a `--full-refresh` to ensure a high level of data quality and remove any possible duplicates.
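
As a minimal sketch of such a refresh, assuming the package's models can be selected by the project name `mixpanel` (an assumption; check the package's `dbt_project.yml` for the actual name), you could rebuild only this package's models from scratch:

```shell
# Rebuild the Mixpanel package's models from scratch instead of incrementally.
# The `mixpanel` selector is an assumption; adjust it to match your project.
dbt run --full-refresh --select mixpanel
```

Running `dbt run --full-refresh` without `--select` rebuilds every incremental model in your project, which may take longer and cost more on large warehouses.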

## Step 2: Install the package
Include the following mixpanel package version in your `packages.yml` file:
> TIP: Check [dbt Hub](https://hub.getdbt.com/) for the latest installation instructions or [read the dbt docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.
@@ -50,14 +67,6 @@ packages:
version: [">=0.8.0", "<0.9.0"] # we recommend using ranges to capture non-breaking changes automatically
```

### Databricks dispatch configuration
If you are using a Databricks destination with this package, you must add the following (or a variation of the following) dispatch configuration to your `dbt_project.yml`. This ensures that the package searches for macros in the `dbt-labs/spark_utils` package first and then in the `dbt-labs/dbt_utils` package.
```yml
dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']
```

## Step 3: Define database and schema variables
By default, this package runs using your destination and the `mixpanel` schema. If this is not where your Mixpanel data is (for example, if your Mixpanel schema is named `mixpanel_fivetran`), add the following configuration to your root `dbt_project.yml` file:
