A hands-on dbt project for learning data pipeline testing and monitoring using the complete Chinook database with real production-like data.
This repository accompanies Module 4: Data Pipeline Testing and Monitoring and contains 8 hands-on labs across 4 decks:
| Deck | Topic | Labs |
|---|---|---|
| 1 | Introduction to Testing Tools | Lab 1.1, Lab 1.2 |
| 2 | Testing Data Pipelines | Lab 2.1, Lab 2.2 |
| 3 | Writing Unit and Integration Tests | Lab 3.1, Lab 3.2 |
| 4 | Monitoring and Maintenance | Lab 4.1, Lab 4.2 |
- ✅ Connect dbt to real data in BigQuery
- ✅ Write schema tests (unique, not_null, accepted_values, relationships), as sketched just after this list
- ✅ Create singular tests for complex business logic
- ✅ Build custom generic tests (reusable macros)
- ✅ Implement unit tests for transformation logic
- ✅ Set up integration tests across models
- ✅ Configure monitoring and alerts
- ✅ Debug pipeline failures systematically
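As a preview of the schema tests covered in the decks, here is a minimal sketch of the YAML you will write. The model and column names (stg_customers, customer_id, support_rep_id, stg_tracks, media_type_id) are assumptions based on the staging layer described below; the labs use the definitions in models/staging/_stg_chinook.yml.

```yaml
version: 2

models:
  - name: stg_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: support_rep_id
        tests:
          - relationships:
              to: ref('stg_employees')
              field: employee_id

  - name: stg_tracks
    columns:
      - name: media_type_id
        tests:
          - accepted_values:
              values: [1, 2, 3, 4, 5]
              quote: false
```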
The Chinook database represents a digital music store (like iTunes) with real, complete data:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Artist │────▶│ Album │────▶│ Track │
│ 275 records │ │ 347 records │ │ 3,503 records │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Customer │────▶│ Invoice │────▶│ InvoiceLine │
│ 59 records │ │ 412 records │ │ 2,240 records │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Total: 15,000+ records across 11 tables. This is real-world-scale data!
| Table | Records | Description |
|---|---|---|
| Artist | 275 | Music artists and bands |
| Album | 347 | Albums linked to artists |
| Track | 3,503 | Individual songs with pricing |
| Genre | 25 | Music genres |
| MediaType | 5 | File format types |
| Customer | 59 | Customer information |
| Employee | 8 | Sales representatives |
| Invoice | 412 | Purchase transactions |
| InvoiceLine | 2,240 | Line items per invoice |
| Playlist | 18 | Music playlists |
| PlaylistTrack | 8,715 | Playlist-track associations |
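To get a feel for how these tables relate before writing any dbt models, you can run an ad-hoc query in the BigQuery console. This is a sketch that assumes the loader keeps the original Chinook table and column names (Artist.ArtistId, Album.ArtistId, Track.AlbumId) inside the chinook_raw dataset:

```sql
-- Top 10 artists by number of tracks in the catalog
SELECT
  ar.Name          AS artist_name,
  COUNT(t.TrackId) AS track_count
FROM `YOUR_PROJECT_ID.chinook_raw.Artist` AS ar
JOIN `YOUR_PROJECT_ID.chinook_raw.Album`  AS al ON al.ArtistId = ar.ArtistId
JOIN `YOUR_PROJECT_ID.chinook_raw.Track`  AS t  ON t.AlbumId  = al.AlbumId
GROUP BY artist_name
ORDER BY track_count DESC
LIMIT 10;
```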
Before starting, ensure you have:
- Google Cloud account with a GCP project
- BigQuery API enabled in your project
- Python 3.8+ installed
- Google Cloud SDK installed (`gcloud`)
git clone https://github.com/your-org/chinook-dbt-testing-labs.git
cd chinook-dbt-testing-labs

# Create virtual environment
python -m venv venv
# Activate it
# On Mac/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Login to Google Cloud
gcloud auth login
# Set your project
gcloud config set project YOUR_PROJECT_ID
# Create application default credentials
gcloud auth application-default login

This is the key step! Run our data loader to populate your BigQuery project with the complete Chinook database:
python scripts/load_chinook_to_bigquery.py --project YOUR_PROJECT_ID

This script will:
- Download the official Chinook database
- Create a `chinook_raw` dataset in your BigQuery project
- Load all 11 tables with complete data
- Verify the load was successful
Expected output:
🎵 Chinook Database Loader for BigQuery
==========================================
📥 Downloading Chinook database...
✅ Downloaded
📊 Loading tables to BigQuery...
✅ Artist: 275 rows loaded
✅ Album: 347 rows loaded
✅ Track: 3,503 rows loaded
...
🎉 SUCCESS! Chinook database loaded to BigQuery
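If you want an extra sanity check beyond the script's own verification, BigQuery exposes per-table row counts through the dataset's __TABLES__ metadata view. A sketch (replace YOUR_PROJECT_ID with your project):

```sql
-- Row counts for every table in the chinook_raw dataset
SELECT table_id, row_count
FROM `YOUR_PROJECT_ID.chinook_raw.__TABLES__`
ORDER BY table_id;
```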
Create or edit ~/.dbt/profiles.yml:
chinook_testing:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: YOUR_PROJECT_ID   # <-- Your GCP project ID
      dataset: chinook_dev       # dbt creates this for transformed models
      location: US
      threads: 4
      timeout_seconds: 300

# Test the connection
dbt debug
# Install dbt packages
dbt deps
# Build all models
dbt run
# Run all tests
dbt test

You should see:
Completed successfully
Done. PASS=X WARN=0 ERROR=0
🎉 You're ready to start the labs!
chinook-dbt-testing-labs/
├── README.md # This file
├── dbt_project.yml # dbt project configuration
├── packages.yml # dbt package dependencies
├── requirements.txt # Python dependencies
│
├── scripts/
│ ├── load_chinook_to_bigquery.py # ⬅️ Run this first!
│ ├── setup.sh # Quick setup script
│ └── run_tests_with_retry.sh # Test runner with retries
│
├── models/
│ ├── staging/ # Source → Staging transformations
│ │ ├── _stg_chinook.yml # Source definitions & tests
│ │ ├── stg_artists.sql
│ │ ├── stg_albums.sql
│ │ ├── stg_tracks.sql
│ │ ├── stg_genres.sql
│ │ ├── stg_media_types.sql
│ │ ├── stg_customers.sql
│ │ ├── stg_employees.sql
│ │ ├── stg_invoices.sql
│ │ └── stg_invoice_lines.sql
│ │
│ ├── intermediate/ # Business logic transformations
│ │ ├── _int_chinook.yml
│ │ ├── int_tracks_enriched.sql
│ │ └── int_invoice_totals.sql
│ │
│ ├── marts/ # Analytics-ready models
│ │ ├── _marts_chinook.yml
│ │ ├── dim_customers.sql
│ │ ├── dim_tracks.sql
│ │ └── fct_sales.sql
│ │
│ └── unit_tests/ # Unit test models
│ └── ...
│
├── tests/ # Test SQL files
│ ├── assert_*.sql # Singular tests
│ ├── integration/ # Integration tests
│ ├── monitoring/ # Monitoring tests
│ └── unit_tests/ # Unit test assertions
│
├── macros/tests/ # Custom generic tests
│ ├── test_is_positive.sql
│ ├── test_valid_email.sql
│ └── test_within_range.sql
│
├── seeds/ # Test fixtures only (not source data)
│ ├── test_tracks_input.csv
│ └── test_tracks_expected.csv
│
├── labs/ # Lab instructions
│ ├── deck1/
│ │ ├── LAB_1_1_explore_chinook.md
│ │ └── LAB_1_2_first_tests.md
│ ├── deck2/
│ │ ├── LAB_2_1_schema_tests.md
│ │ └── LAB_2_2_singular_tests.md
│ ├── deck3/
│ │ ├── LAB_3_1_unit_tests.md
│ │ └── LAB_3_2_integration_tests.md
│ └── deck4/
│ ├── LAB_4_1_monitoring_setup.md
│ └── LAB_4_2_debugging_failures.md
│
└── analyses/ # Ad-hoc debug queries
└── debug_invoice_issues.sql
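The files under macros/tests/ are custom generic tests: reusable checks you attach to columns just like the built-in ones. As a flavour of what such a macro looks like, here is a minimal sketch of an is_positive test; the repo's actual macros may differ in naming and edge-case handling.

```sql
-- Illustrative sketch of a custom generic test in the style of macros/tests/test_is_positive.sql.
-- A generic test passes when the query it defines returns zero rows.
{% test is_positive(model, column_name) %}

select {{ column_name }} as offending_value
from {{ model }}
where {{ column_name }} is not null
  and {{ column_name }} <= 0

{% endtest %}
```

Once defined, it can be referenced in a schema YAML file like any built-in test, for example `- is_positive` on a price column.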
- Lab 1.1: Explore Chinook & Build Your First Models
- Lab 1.2: Write Your First dbt Tests
- Lab 2.1: Master Schema Tests (unique, not_null, relationships)
- Lab 2.2: Create Singular Tests for Business Logic (see the sketch after this list)
- Lab 3.1: Build Unit Tests with Test Fixtures
- Lab 3.2: Implement Integration Tests Across Models
- Lab 4.1: Set Up Freshness Monitoring & Alerts
- Lab 4.2: Debug Pipeline Failures Systematically
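As a preview of Lab 2.2: a singular test is simply a SELECT statement saved under tests/ that returns the rows violating a business rule, so the test passes when the query returns nothing. A sketch in the spirit of the assert_*.sql files here (the model and column names are assumptions, not copied from the repo):

```sql
-- tests/assert_invoice_total_matches_lines.sql (illustrative sketch)
-- Fails if any invoice total drifts from the sum of its line items.
with line_totals as (
    select
        invoice_id,
        sum(unit_price * quantity) as computed_total
    from {{ ref('stg_invoice_lines') }}
    group by invoice_id
)

select
    i.invoice_id,
    i.total,
    l.computed_total
from {{ ref('stg_invoices') }} as i
join line_totals as l using (invoice_id)
where abs(i.total - l.computed_total) > 0.01
```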
# Load data to BigQuery (run once)
python scripts/load_chinook_to_bigquery.py --project YOUR_PROJECT_ID
# Build all models
dbt run
# Run all tests
dbt test
# Build and test together
dbt build
# Run specific model
dbt run --select stg_customers
# Test specific model
dbt test --select stg_customers
# Run only schema tests
dbt test --select test_type:schema
# Run only singular tests
dbt test --select test_type:singular
# Generate and view documentation
dbt docs generate && dbt docs serve

# Make sure you're authenticated
gcloud auth application-default login

The loader creates the chinook_raw dataset. Make sure you ran:
python scripts/load_chinook_to_bigquery.py --project YOUR_PROJECT_ID

Check that your profile points to the correct project where you loaded the data.
# Store failures for investigation
dbt test --store-failures

# Then query the failure table in BigQuery
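By default, dbt stores these failure tables in an audit schema next to your target dataset (with the profile above, typically a dataset like chinook_dev_dbt_test__audit, one table per failing test; the exact names depend on your store_failures configuration). A sketch of how you might inspect one, using the hypothetical singular test from the preview above:

```sql
-- Inspect the rows captured for a failing test (dataset and table names are assumptions)
SELECT *
FROM `YOUR_PROJECT_ID.chinook_dev_dbt_test__audit.assert_invoice_total_matches_lines`
LIMIT 100;
```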
This project is for educational purposes as part of the CBF Data Engineering curriculum.

Happy Testing! 🎉