A repo to demonstrate how to use dynamic task mapping in Apache Airflow to orchestrate dbt. This example uses BigQuery and LocalExecutor.
-
This repo uses Poetry for package management; ensure Poetry is installed on your system.
-
Authenticate to GCP with gcloud:
gcloud auth application-default login
-
Create a virtual environment and install required dependencies:
poetry shell
poetry install
-
Set the required environment variables:
export AIRFLOW_HOME="$(pwd)/dev"
export AIRFLOW__CORE__DAGS_FOLDER="$(pwd)/dags"
export PYTHONPATH="${PYTHONPATH}:$(pwd)/plugins"
export AIRFLOW__CORE__PLUGINS_FOLDER="$(pwd)/plugins"
export AIRFLOW__CORE__LOAD_EXAMPLES="False"
export DBT_PROFILES_DIR="$(pwd)/dbt_project"
export DBT_PROJECT_DIR="$(pwd)/dbt_project"
export DBT_SCHEMA="<YOUR_SCHEMA>"
export GOOGLE_PROJECT_NAME="<YOUR_GCP_PROJECT>"
-
Build the Docker image used for dbt:
docker build --build-arg="DBT_SCHEMA=$DBT_SCHEMA" --build-arg="GOOGLE_PROJECT_NAME=$GOOGLE_PROJECT_NAME" --tag dbt-base:latest -f ./Dockerfile .
-
Test that the Docker image successfully builds dbt models:
docker run --volume ~/.config/gcloud:/root/.config/gcloud -e DBT_SCHEMA=$DBT_SCHEMA -e GOOGLE_PROJECT_NAME=$GOOGLE_PROJECT_NAME -it --rm dbt-base:latest poetry run dbt build
-
Tag and push the Docker image:
docker tag dbt-base:latest pgoslatara/dynamic-tasks:dbt-base
docker push pgoslatara/dynamic-tasks:dbt-base
-
Install Minikube, steps here.
-
Start a local Minikube cluster:
minikube start --driver=docker
-
Enable the gcp-auth addon:
minikube addons enable gcp-auth
-
[Optional] Enable all metrics and start the Minikube dashboard:
minikube addons enable metrics-server
minikube dashboard
-
Initialize the Airflow database and set the required Airflow variable:
airflow db init
airflow variables set DBT_PROJECT_DIR $(pwd)/dbt_project
-
Run dbt to generate the required manifest.json file:
dbt build
-
Ensure your virtual environment is activated, then start Airflow:
airflow standalone
-
View the DAGs in the web UI at http://localhost:8080. The username is admin; the password can be found in ./standalone_admin_password.txt.
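Several of the steps above depend on the environment variables exported earlier, and a forgotten export or an unreplaced `<YOUR_SCHEMA>` placeholder tends to show up only later as a confusing Docker or dbt error. The following is a minimal sketch of a pre-flight check, not part of this repo: the function name `missing_env_vars` is hypothetical, and the list of names is simply copied from the export step (`PYTHONPATH` is left out because that step only appends to it).

```python
import os

# Variable names copied from the "Set the required environment variables" step.
REQUIRED_VARS = [
    "AIRFLOW_HOME",
    "AIRFLOW__CORE__DAGS_FOLDER",
    "AIRFLOW__CORE__PLUGINS_FOLDER",
    "AIRFLOW__CORE__LOAD_EXAMPLES",
    "DBT_PROFILES_DIR",
    "DBT_PROJECT_DIR",
    "DBT_SCHEMA",
    "GOOGLE_PROJECT_NAME",
]


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the names that are unset, empty, or still a <PLACEHOLDER> value.

    Hypothetical helper for illustration; `environ` defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        is_placeholder = value.startswith("<") and value.endswith(">")
        if not value or is_placeholder:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_env_vars()
    if problems:
        print("Missing or placeholder values: " + ", ".join(problems))
    else:
        print("All required environment variables are set.")
```

Running this in the same shell as the exports, before `docker build` or `airflow standalone`, should list any variable that still needs a value.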
To re-build your local dev environment, delete all files in ./dev and repeat the above steps.
If you experience any issues with Minikube containers not authenticating to gcloud, you can refresh the credentials:
minikube addons enable gcp-auth --refresh
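The manifest.json generated by `dbt build` in the setup steps is what makes dynamic task mapping possible: it lists every node in the dbt project, so a DAG can create one task per model without hard-coding model names. The sketch below shows one way to read it using only the standard library. It is an illustration, not necessarily how the DAGs in this repo do it: the function names are hypothetical, the `target/manifest.json` location assumes dbt's default target path, and the one-command-per-model layout is an assumption.

```python
import json
import os
from pathlib import Path


def list_dbt_models(manifest_path):
    """Return one dict per dbt model found in manifest.json, sorted by name."""
    manifest = json.loads(Path(manifest_path).read_text())
    models = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots and other node types
        models.append(
            {
                "unique_id": unique_id,
                "name": node["name"],
                # unique_ids of the upstream nodes this model depends on
                "depends_on": node.get("depends_on", {}).get("nodes", []),
            }
        )
    return sorted(models, key=lambda m: m["name"])


def build_commands(models):
    """Build one `dbt build --select <model>` command per model (assumed layout)."""
    return [f"dbt build --select {m['name']}" for m in models]


if __name__ == "__main__":
    # Assumes dbt's default target path inside the project directory.
    project_dir = Path(os.environ.get("DBT_PROJECT_DIR", "dbt_project"))
    manifest_file = project_dir / "target" / "manifest.json"
    if manifest_file.exists():
        for command in build_commands(list_dbt_models(manifest_file)):
            print(command)
    else:
        print(f"No manifest found at {manifest_file}; run `dbt build` first.")
```

In Airflow, a list like the one returned by `build_commands` is the kind of input that an operator's `.expand()` accepts, which creates one mapped task instance per element at run time.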
This repo accompanies the following talk:
- Using Dynamic Task Mapping to Orchestrate dbt, slides available here, recording available here.