🎵 Chinook Music Store - dbt Testing & Monitoring Labs

A hands-on dbt project for learning data pipeline testing and monitoring using the complete Chinook database with real production-like data.

📋 Course Overview

This repository accompanies Module 4: Data Pipeline Testing and Monitoring and contains 8 hands-on labs across 4 decks:

Deck	Topic	Labs
1	Introduction to Testing Tools	Lab 1.1, Lab 1.2
2	Testing Data Pipelines	Lab 2.1, Lab 2.2
3	Writing Unit and Integration Tests	Lab 3.1, Lab 3.2
4	Monitoring and Maintenance	Lab 4.1, Lab 4.2

🎯 What You'll Learn

✅ Connect dbt to real data in BigQuery
✅ Write schema tests (unique, not_null, accepted_values, relationships)
✅ Create singular tests for complex business logic
✅ Build custom generic tests (reusable macros)
✅ Implement unit tests for transformation logic
✅ Set up integration tests across models
✅ Configure monitoring and alerts
✅ Debug pipeline failures systematically

🗄️ The Chinook Database - Real Production Data

The Chinook database represents a digital music store (like iTunes) with real, complete data:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│     Artist      │────▶│      Album      │────▶│      Track      │
│   275 records   │     │   347 records   │     │  3,503 records  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                                       │
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Customer     │────▶│     Invoice     │────▶│   InvoiceLine   │
│    59 records   │     │   412 records   │     │  2,240 records  │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Total: 15,000+ records across 11 tables - This is real-world scale data!

Tables Included:

Table	Records	Description
Artist	275	Music artists and bands
Album	347	Albums linked to artists
Track	3,503	Individual songs with pricing
Genre	25	Music genres
MediaType	5	File format types
Customer	59	Customer information
Employee	8	Sales representatives
Invoice	412	Purchase transactions
InvoiceLine	2,240	Line items per invoice
Playlist	18	Music playlists
PlaylistTrack	8,715	Playlist-track associations

🚀 Getting Started

Prerequisites

Before starting, ensure you have:

Google Cloud account with a GCP project
BigQuery API enabled in your project
Python 3.8+ installed
Google Cloud SDK installed (gcloud)

Step 1: Clone the Repository

git clone https://github.com/your-org/chinook-dbt-testing-labs.git
cd chinook-dbt-testing-labs

Step 2: Set Up Python Environment

# Create virtual environment
python -m venv venv

# Activate it
# On Mac/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Step 3: Authenticate with Google Cloud

# Login to Google Cloud
gcloud auth login

# Set your project
gcloud config set project YOUR_PROJECT_ID

# Create application default credentials
gcloud auth application-default login

Step 4: Load Chinook Data to BigQuery

This is the key step! Run our data loader to populate your BigQuery project with the complete Chinook database:

python scripts/load_chinook_to_bigquery.py --project YOUR_PROJECT_ID

This script will:

Download the official Chinook database
Create a chinook_raw dataset in your BigQuery project
Load all 11 tables with complete data
Verify the load was successful

Expected output:

🎵 Chinook Database Loader for BigQuery
==========================================
📥 Downloading Chinook database...
   ✅ Downloaded

📊 Loading tables to BigQuery...
   ✅ Artist: 275 rows loaded
   ✅ Album: 347 rows loaded
   ✅ Track: 3,503 rows loaded
   ...

🎉 SUCCESS! Chinook database loaded to BigQuery

Step 5: Configure dbt Profile

Create or edit ~/.dbt/profiles.yml:

chinook_testing:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: YOUR_PROJECT_ID      # <-- Your GCP project ID
      dataset: chinook_dev          # dbt creates this for transformed models
      location: US
      threads: 4
      timeout_seconds: 300

Step 6: Verify Everything Works

# Test the connection
dbt debug

# Install dbt packages
dbt deps

# Build all models
dbt run

# Run all tests
dbt test

You should see:

Completed successfully
Done. PASS=X WARN=0 ERROR=0

🎉 You're ready to start the labs!

📁 Project Structure

chinook-dbt-testing-labs/
├── README.md                    # This file
├── dbt_project.yml              # dbt project configuration
├── packages.yml                 # dbt package dependencies
├── requirements.txt             # Python dependencies
│
├── scripts/
│   ├── load_chinook_to_bigquery.py  # ⬅️ Run this first!
│   ├── setup.sh                 # Quick setup script
│   └── run_tests_with_retry.sh  # Test runner with retries
│
├── models/
│   ├── staging/                 # Source → Staging transformations
│   │   ├── _stg_chinook.yml     # Source definitions & tests
│   │   ├── stg_artists.sql
│   │   ├── stg_albums.sql
│   │   ├── stg_tracks.sql
│   │   ├── stg_genres.sql
│   │   ├── stg_media_types.sql
│   │   ├── stg_customers.sql
│   │   ├── stg_employees.sql
│   │   ├── stg_invoices.sql
│   │   └── stg_invoice_lines.sql
│   │
│   ├── intermediate/            # Business logic transformations
│   │   ├── _int_chinook.yml
│   │   ├── int_tracks_enriched.sql
│   │   └── int_invoice_totals.sql
│   │
│   ├── marts/                   # Analytics-ready models
│   │   ├── _marts_chinook.yml
│   │   ├── dim_customers.sql
│   │   ├── dim_tracks.sql
│   │   └── fct_sales.sql
│   │
│   └── unit_tests/              # Unit test models
│       └── ...
│
├── tests/                       # Test SQL files
│   ├── assert_*.sql             # Singular tests
│   ├── integration/             # Integration tests
│   ├── monitoring/              # Monitoring tests
│   └── unit_tests/              # Unit test assertions
│
├── macros/tests/                # Custom generic tests
│   ├── test_is_positive.sql
│   ├── test_valid_email.sql
│   └── test_within_range.sql
│
├── seeds/                       # Test fixtures only (not source data)
│   ├── test_tracks_input.csv
│   └── test_tracks_expected.csv
│
├── labs/                        # Lab instructions
│   ├── deck1/
│   │   ├── LAB_1_1_explore_chinook.md
│   │   └── LAB_1_2_first_tests.md
│   ├── deck2/
│   │   ├── LAB_2_1_schema_tests.md
│   │   └── LAB_2_2_singular_tests.md
│   ├── deck3/
│   │   ├── LAB_3_1_unit_tests.md
│   │   └── LAB_3_2_integration_tests.md
│   └── deck4/
│       ├── LAB_4_1_monitoring_setup.md
│       └── LAB_4_2_debugging_failures.md
│
└── analyses/                    # Ad-hoc debug queries
    └── debug_invoice_issues.sql

🧪 Labs Overview

Deck 1: Introduction to Testing Tools

Lab 1.1: Explore Chinook & Build Your First Models
Lab 1.2: Write Your First dbt Tests

Deck 2: Testing Data Pipelines

Lab 2.1: Master Schema Tests (unique, not_null, relationships)
Lab 2.2: Create Singular Tests for Business Logic

Deck 3: Writing Unit and Integration Tests

Lab 3.1: Build Unit Tests with Test Fixtures
Lab 3.2: Implement Integration Tests Across Models

Deck 4: Monitoring and Maintenance

Lab 4.1: Set Up Freshness Monitoring & Alerts
Lab 4.2: Debug Pipeline Failures Systematically

🏃 Quick Commands

# Load data to BigQuery (run once)
python scripts/load_chinook_to_bigquery.py --project YOUR_PROJECT_ID

# Build all models
dbt run

# Run all tests
dbt test

# Build and test together
dbt build

# Run specific model
dbt run --select stg_customers

# Test specific model
dbt test --select stg_customers

# Run only schema tests
dbt test --select test_type:schema

# Run only singular tests
dbt test --select test_type:singular

# Generate and view documentation
dbt docs generate && dbt docs serve

🆘 Troubleshooting

"Permission Denied" when loading data

# Make sure you're authenticated
gcloud auth application-default login

"Dataset not found" error

The loader creates chinook_raw dataset. Make sure you ran:

python scripts/load_chinook_to_bigquery.py --project YOUR_PROJECT_ID

"Source not found" error in dbt

Check that your profile points to the correct project where you loaded the data.

Tests failing unexpectedly

# Store failures for investigation
dbt test --store-failures

# Then query the failure table in BigQuery

📚 Additional Resources

📝 License

This project is for educational purposes as part of the CBF Data Engineering curriculum.

Happy Testing! 🎉

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🎵 Chinook Music Store - dbt Testing & Monitoring Labs

📋 Course Overview

🎯 What You'll Learn

🗄️ The Chinook Database - Real Production Data

Tables Included:

🚀 Getting Started

Prerequisites

Step 1: Clone the Repository

Step 2: Set Up Python Environment

Step 3: Authenticate with Google Cloud

Step 4: Load Chinook Data to BigQuery

Step 5: Configure dbt Profile

Step 6: Verify Everything Works

📁 Project Structure

🧪 Labs Overview

Deck 1: Introduction to Testing Tools

Deck 2: Testing Data Pipelines

Deck 3: Writing Unit and Integration Tests

Deck 4: Monitoring and Maintenance

🏃 Quick Commands

🆘 Troubleshooting

"Permission Denied" when loading data

"Dataset not found" error

"Source not found" error in dbt

Tests failing unexpectedly

📚 Additional Resources

📝 License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
analyses		analyses
dbt_packages		dbt_packages
labs		labs
logs		logs
macros/tests		macros/tests
models		models
scripts		scripts
seeds		seeds
target		target
tests		tests
README.md		README.md
dbt_project.yml		dbt_project.yml
package-lock.yml		package-lock.yml
packages.yml		packages.yml
profiles.yml.example		profiles.yml.example
requirements.txt		requirements.txt

cbfacademy/Data-Pipeline-Testing-and-Monitoring

Folders and files

Latest commit

History

Repository files navigation

🎵 Chinook Music Store - dbt Testing & Monitoring Labs

📋 Course Overview

🎯 What You'll Learn

🗄️ The Chinook Database - Real Production Data

Tables Included:

🚀 Getting Started

Prerequisites

Step 1: Clone the Repository

Step 2: Set Up Python Environment

Step 3: Authenticate with Google Cloud

Step 4: Load Chinook Data to BigQuery

Step 5: Configure dbt Profile

Step 6: Verify Everything Works

📁 Project Structure

🧪 Labs Overview

Deck 1: Introduction to Testing Tools

Deck 2: Testing Data Pipelines

Deck 3: Writing Unit and Integration Tests

Deck 4: Monitoring and Maintenance

🏃 Quick Commands

🆘 Troubleshooting

"Permission Denied" when loading data

"Dataset not found" error

"Source not found" error in dbt

Tests failing unexpectedly

📚 Additional Resources

📝 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages