# Dynamic task mapping for dbt

A repo demonstrating how to use dynamic task mapping in Apache Airflow to orchestrate dbt. This example uses BigQuery and the LocalExecutor.
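The core idea can be illustrated without Airflow itself: a producer task returns a list at runtime, and one mapped task instance is created per element (in Airflow this is done with `task.expand(...)`). A minimal, library-free sketch of that fan-out, using hypothetical model names:

```python
# Stdlib-only illustration of the fan-out behind dynamic task mapping.
# In the real DAG, Airflow's `.expand()` creates one task instance per
# element of the list returned at runtime; here we simulate that fan-out.
def list_models() -> list[str]:
    # Hypothetical model IDs; the repo derives these from dbt's manifest.json.
    return ["model.demo.stg_orders", "model.demo.orders"]


def run_model(model_id: str) -> str:
    # Each mapped instance runs a single model (in this repo, in a container).
    return f"dbt build --select {model_id}"


# One mapped invocation per model discovered at runtime.
commands = [run_model(m) for m in list_models()]
print(commands)
```

Because the list is produced at runtime, adding a dbt model changes the number of mapped tasks on the next DAG run without any DAG code changes.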

## Local Setup

1. This repo uses Poetry for package management; ensure Poetry is installed on your system.

2. Authenticate to GCP with gcloud:

    ```shell
    gcloud auth application-default login
    ```
3. Create a virtual environment and install the required dependencies:

    ```shell
    poetry shell
    poetry install
    ```
4. Set the required environment variables:

    ```shell
    export AIRFLOW_HOME="$(pwd)/dev"
    export AIRFLOW__CORE__DAGS_FOLDER="$(pwd)/dags"
    export PYTHONPATH="${PYTHONPATH}:$(pwd)/plugins"
    export AIRFLOW__CORE__PLUGINS_FOLDER="$(pwd)/plugins"
    export AIRFLOW__CORE__LOAD_EXAMPLES="False"

    export DBT_PROFILES_DIR="$(pwd)/dbt_project"
    export DBT_PROJECT_DIR="$(pwd)/dbt_project"
    export DBT_SCHEMA="<YOUR_SCHEMA>"
    export GOOGLE_PROJECT_NAME="<YOUR_GCP_PROJECT>"
    ```
5. Build the Docker image used for dbt:

    ```shell
    docker build --build-arg="DBT_SCHEMA=$DBT_SCHEMA" --build-arg="GOOGLE_PROJECT_NAME=$GOOGLE_PROJECT_NAME" --tag dbt-base:latest -f ./Dockerfile .
    ```
6. Test that the Docker image successfully builds dbt models:

    ```shell
    docker run --volume ~/.config/gcloud:/root/.config/gcloud -e DBT_SCHEMA=$DBT_SCHEMA -e GOOGLE_PROJECT_NAME=$GOOGLE_PROJECT_NAME -it --rm dbt-base:latest poetry run dbt build
    ```
7. Tag and push the Docker image:

    ```shell
    docker tag dbt-base:latest pgoslatara/dynamic-tasks:dbt-base
    docker push pgoslatara/dynamic-tasks:dbt-base
    ```
8. Install Minikube, following the official installation steps.

9. Start a local Minikube cluster:

    ```shell
    minikube start --driver=docker
    ```
10. Enable the gcp-auth addon:

    ```shell
    minikube addons enable gcp-auth
    ```
11. [Optional] Enable all metrics and start the Minikube dashboard:

    ```shell
    minikube addons enable metrics-server
    minikube dashboard
    ```
12. Initialise the Airflow database and set the required Airflow variable:

    ```shell
    airflow db init
    airflow variables set DBT_PROJECT_DIR $(pwd)/dbt_project
    ```
13. Run dbt to generate the required `manifest.json` file:

    ```shell
    dbt build
    ```
14. With your virtual environment activated, start Airflow:

    ```shell
    airflow standalone
    ```
15. View the DAGs in the web UI at http://localhost:8080. The username is `admin`; the password can be found in `./standalone_admin_password.txt`.
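Step 13 matters because the DAG reads `manifest.json` to decide how many task instances to map. A sketch of extracting model IDs from a manifest, assuming the standard dbt manifest layout (the inline JSON here is a toy stand-in for `dbt_project/target/manifest.json`):

```python
import json

# Toy stand-in for dbt_project/target/manifest.json; a real manifest
# lists every project resource under the top-level "nodes" key.
manifest = json.loads("""
{
  "nodes": {
    "model.demo.stg_orders": {"resource_type": "model"},
    "model.demo.orders": {"resource_type": "model"},
    "test.demo.not_null_orders_id": {"resource_type": "test"}
  }
}
""")

# Keep only models: this list is what dynamic task mapping fans out
# over, producing one mapped task per model.
model_ids = [
    node_id
    for node_id, node in manifest["nodes"].items()
    if node["resource_type"] == "model"
]
print(model_ids)
```

Filtering on `resource_type` keeps tests, seeds, and other resources from being mapped as model-running tasks.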

To rebuild your local dev environment, delete all files in `./dev` and repeat the above steps.

If you experience any issues with Minikube containers not authenticating to gcloud, you can refresh the credentials:

```shell
minikube addons enable gcp-auth --refresh
```

## Talks

This repo accompanies the following talk: