Recce is a data validation toolkit for pull request (PR) review in dbt projects. Get enhanced visibility into how your team’s dbt modeling changes impact data by comparing your dev branch with stable production data. Run manual data checks during development, and automate checks in CI for PR review.
Get up and running quickly by prepping your dev and prod environments. The key is building prod into the target-base folder to use as the base for the data comparison.
# Build prod and generate dbt docs into ./target-base
dbt seed --target prod
dbt run --target prod
dbt docs generate --target prod --target-path ./target-base
# Switch to your dev branch
git switch my-awesome-branch
# Build your dev environment
dbt seed
dbt run
dbt docs generate
# Start a Recce Instance
recce server
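Before starting the server, it can help to confirm that both environments were actually built. The sketch below assumes that `dbt docs generate` has written `manifest.json` and `catalog.json` into each target folder; `check_artifacts` is a hypothetical helper name, not a Recce or dbt command.

```shell
# Sketch only: check that a dbt target folder contains the artifacts
# written by `dbt docs generate`.
# check_artifacts is a hypothetical helper, not part of Recce or dbt.
check_artifacts() {
  local dir="$1"
  local missing=0
  local name
  for name in manifest.json catalog.json; do
    if [ ! -f "$dir/$name" ]; then
      # Report each missing artifact on stderr and remember the failure.
      echo "missing: $dir/$name" >&2
      missing=1
    fi
  done
  # Return 0 when both files exist, 1 otherwise.
  return "$missing"
}
```

For example, `check_artifacts ./target-base && check_artifacts ./target && recce server` would only start the server once both the base and the dev artifacts are present (`./target` being dbt's default target path).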
Follow our 5-minute Jaffle Shop tutorial to try it out for yourself.
recce server launches a web UI that shows you the area of your lineage that is impacted by the branch changes.
- Select nodes in the lineage to perform Checks (diffs) as part of your impact assessment during development or PR review.
- Add Checks to your Checklist to note observed impacts.
- Share your Checklist with the PR reviewer.
- (Recce Cloud) Automatically sync Check status between Recce Instances
- (Recce Cloud) Block PR merging until all Recce Checks have been approved
Read more about using Recce for Impact Assessment on the Recce blog.
We provide three online Recce demos (based on Jaffle Shop), each related to a specific pull request. Use these demos to inspect the data impact caused by the modeling changes in the PR.
For each demo, review the following:
- The pull request comment
- The code changes
- How the lineage and data have changed in Recce
This will enable you to validate whether the intention of the PR has been successfully implemented without unintended impact.
Tip
Don't forget to click the Checks tab to view the Recce Checklist, and perform your own Checks for further investigation.
This pull request adjusts the logic for how customer lifetime value is calculated.
This pull request performs some refactoring on the customers model by turning two CTEs into intermediate models, enhancing readability and maintainability.
This pull request introduces a new Rounding Effect Analysis feature, aimed at analyzing and reporting the impacts of rounding in our data processing.
dbt has brought many software best practices to data projects, such as:
- Version controlled code
- Modular SQL
- Reproducible pipelines
Even so, 'bad merges' still happen, and erroneous data and silent errors make their way into prod. As self-serve analytics opens dbt projects to many roles, and the size of dbt projects increases, the job of reviewing data modeling changes becomes even more critical.
The only way to understand the impact of code changes on data is to compare the data before-and-after the changes.
Recce provides a data review environment for data teams to check their work during development, and then again as part of PR review. The suite of tools and diffs in Recce is specifically geared towards surfacing, understanding, and recording data impact from code changes.
Lineage Diff is the main interface to Recce and shows which nodes in the lineage have been added, removed, or modified.
- Schema Diff - Shows the structure of the table, including added or removed columns
- Row Count Diff - Compares the row count for tables
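As a rough illustration of what a Row Count Diff reports, the sketch below compares the number of data rows in two CSV exports of the same model. The CSV exports and the `row_count` and `row_count_diff` helpers are assumptions made for illustration; Recce itself performs its diffs in your data warehouse or in the browser, not on local files.

```shell
# Sketch only: compare row counts between two CSV exports of one model.
# row_count and row_count_diff are hypothetical helpers, not Recce commands.

# Print the number of data rows in a CSV file, excluding the header row.
row_count() {
  local file="$1"
  awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$file"
}

# Print the base count, the current count, and the difference between them.
row_count_diff() {
  local base current
  base=$(row_count "$1")
  current=$(row_count "$2")
  echo "base=$base current=$current delta=$((current - base))"
}
```

Calling `row_count_diff base.csv current.csv` (hypothetical file names) prints a single line such as `base=3 current=5 delta=2`, which is the kind of before-and-after signal a row count check surfaces.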
Advanced Diffs provide high-level statistics about data change:
- Profile Diff: Compares stats such as count, distinct count, min, max, average.
- Value Diff: The matched count and percentage for each column in the table.
- Top-K Diff: Compares the distribution of a categorical column.
- Histogram Diff: Compares the distribution of a numeric column in an overlay histogram chart.
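To make the idea behind a Top-K Diff concrete, here is a minimal sketch that lists the most frequent values of one categorical column in a CSV export. It is not how Recce computes the diff; the `top_k` helper is hypothetical, and the sketch assumes a header row and no quoted commas in the data.

```shell
# Sketch only: list the k most frequent values of one CSV column.
# top_k is a hypothetical helper, not a Recce command.
# Arguments: file, 1-based column number, k.
top_k() {
  local file="$1" col="$2" k="$3"
  # Count occurrences per value (skipping the header), then sort by
  # count descending, break ties by value, and keep the first k lines.
  awk -F',' -v col="$col" 'NR > 1 { counts[$col]++ }
    END { for (v in counts) print counts[v], v }' "$file" |
    sort -k1,1nr -k2,2 |
    head -n "$k"
}
```

Running `top_k` on an export from each environment and comparing the two outputs (for example with `diff`) gives a rough view of how a column's distribution shifted between base and current.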
Query Diff compares the results of any ad-hoc query, and supports the use of dbt macros.
The checklist provides a way to record the results of your data validation process.
- Save the results of checks
- Re-run checks
- Annotate checks to add context
- Share the results of checks
- (Recce Cloud) Sync checks and check results across Recce instances
- (Recce Cloud) Block PR merging until checks have been approved
Recce is useful for validating your own work or the work of others, and can also be used to share data impact with non-technical stakeholders to approve data checks.
- Data engineers can use Recce to ensure the structural integrity of the data and understand the scope of impact before merging.
- Analysts can use Recce to self-review and understand how data modeling changes have changed the data.
- Stakeholders can use Recce to sign off on data after updates have been made.
The Recce Documentation covers everything you need to get started.
We’d advise first following the 5-minute tutorial that uses Jaffle Shop and then trying out Recce in your own project.
For advice on best practices in preparing dbt environments to enable effective PR review, check out Best Practices for Preparing Environments.
Recce Cloud provides a backbone of supporting services that make Recce usage more suitable for teams reviewing multiple pull requests.
With Recce Cloud:
- Recce Instances can be launched directly from a PR
- Checks are automatically synced across Recce Instances
- Merging is blocked until all checks are approved
Recce Cloud is currently in early-access private beta.
To find out how you can get access, please book an appointment for a short meeting.
Recce consists of a local server application that you run on your own device or compute services.
- Diffs or queries that are performed by Recce happen either in your data warehouse or in the browser itself. Recce does not store your data.
For Recce Cloud users:
- An encrypted version of your Recce state file is stored on Recce Cloud. This file is encrypted before transmission.
Here's where you can get in touch with the Recce team and find support:
- dbt Slack in the #tools-recce channel
- Recce Discord
- Email us [email protected]
If you believe you have found a bug, or there is some missing functionality in Recce, please open a GitHub Issue.
You can follow along with news about Recce and blogs from our team in the following places:
- DataRecce.io
- Medium blog
- @datarecce on Twitter/X
- @[email protected] on Mastodon
- @datarecce.bsky.social on BlueSky