Dynamic DAGs in Apache Airflow

Stop creating 30 separate DAGs for similar tasks. This project demonstrates two fundamental patterns that solve the DAG proliferation problem in Apache Airflow.

🎯 The Problem

Growing data teams often end up with dozens of nearly-identical DAGs:

  • marketing_dag.py
  • sales_dag.py
  • operations_dag.py
  • campaign_summer_dag.py
  • ...30 more similar files

The result: a maintenance nightmare, duplicated code, and growing testing complexity.

✅ The Solution

Two dynamic patterns that handle 95% of use cases:

Pattern 1: Manual Execution with Dropdown

  • Use Case: Ad-hoc processing, debugging, one-off tasks
  • UI: Dynamic dropdown populated from config files
  • Trigger: Manual only
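
A minimal sketch of how Pattern 1 can be wired up (the DAG id, paths, and task names below are illustrative, not the exact contents of config_processor_dag.py): the enum of an Airflow Param is built at parse time by scanning configs/, so the "Trigger DAG w/ Config" form renders it as a dropdown.

from pathlib import Path

import yaml
from airflow.decorators import dag, task
from airflow.models.param import Param

CONFIG_DIR = Path("/opt/airflow/configs")  # assumed container mount for configs/
AVAILABLE = sorted(p.name for p in CONFIG_DIR.glob("*.yaml")) or ["no-configs-found"]

@dag(
    dag_id="config_processor_sketch",
    schedule=None,        # Pattern 1: manual trigger only
    catchup=False,
    params={
        # "enum" turns the field into a dropdown in the trigger form
        "config_file": Param(AVAILABLE[0], type="string", enum=AVAILABLE),
    },
)
def config_processor_sketch():
    @task
    def process_selected(params=None):
        # load whichever YAML file was picked from the dropdown
        config = yaml.safe_load((CONFIG_DIR / params["config_file"]).read_text())
        print(f"Processing {params['config_file']}: {config}")

    process_selected()

config_processor_sketch()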

Pattern 2: Scheduled Multi-Config Processing

  • Use Case: Regular production workloads
  • UI: Checkbox selection for multiple configs
  • Trigger: Scheduled with parameter-driven execution
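
A hedged sketch of Pattern 2 under the same assumptions (names and paths are illustrative, not the repo's code): one task is created per discovered config at parse time, an array-typed Param collects the selection in the trigger form, and tasks whose config was not selected raise AirflowSkipException so they show up as skipped.

from pathlib import Path

import yaml
from airflow.decorators import dag, task
from airflow.exceptions import AirflowSkipException
from airflow.models.param import Param

CONFIG_DIR = Path("/opt/airflow/configs")   # assumed container mount for configs/
ALL_CONFIGS = sorted(p.name for p in CONFIG_DIR.glob("*.yaml"))

@dag(
    dag_id="scheduled_config_processor_sketch",
    schedule="@daily",                       # Pattern 2: runs on a schedule
    catchup=False,
    params={
        # array-typed Param: the trigger form accepts a list of config names
        "selected_configs": Param(ALL_CONFIGS, type="array"),
    },
)
def scheduled_config_processor_sketch():
    for name in ALL_CONFIGS:
        @task(task_id=f"process_{Path(name).stem}")
        def process(config_name: str = name, params=None):
            if config_name not in params["selected_configs"]:
                # unselected configs are skipped and rendered in pink/salmon
                raise AirflowSkipException(f"{config_name} was not selected")
            config = yaml.safe_load((CONFIG_DIR / config_name).read_text())
            print(f"Processing {config_name}: {config}")

        process()

scheduled_config_processor_sketch()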

🚀 Quick Test (3 Steps)

# 1. Set permissions
echo "AIRFLOW_UID=$(id -u)" > .env  # Linux/Mac
# echo "AIRFLOW_UID=50000" > .env   # Windows

# 2. Start Airflow
docker-compose up -d

# 3. Test (wait 60 seconds)
# Open: http://localhost:8080
# Login: airflow / airflow

Test Both Patterns

Pattern 1 (config_processor):

  • Click DAG → "Trigger DAG w/ Config"
  • See dropdown with config files
  • Select marketing.yaml → realistic marketing metrics

Pattern 2 (scheduled_config_processor):

  • Click DAG → "Trigger DAG w/ Config"
  • See checkboxes for each config file
  • Select configs → see dynamic tasks with visual skip states

📁 Project Structure

airflow-dynamic-dag/
├── configs/                    # Your configuration files
│   ├── marketing.yaml         # Marketing campaign analytics
│   ├── operations.yaml        # Operations reporting  
│   └── sales.yaml            # Sales pipeline ETL
├── dags/
│   ├── config_processor_dag.py         # Pattern 1: Manual
│   ├── scheduled_multi_config_dag.py   # Pattern 2: Scheduled  
│   └── utils/                          # Shared utilities
│       ├── config_manager.py           # File discovery & YAML processing
│       ├── databricks_handler.py       # Job execution simulation
│       └── settings.py                 # Configuration
├── docker-compose.yaml        # Easy local testing
└── requirements.txt           # Simple: just PyYAML
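
As a rough idea of the job-simulation utility (the real dags/utils/databricks_handler.py may look quite different), something along these lines is enough to produce the metric printouts shown under Expected Results below:

import random

def run_simulated_job(config_name: str, metric_names: list[str]) -> dict:
    """Pretend to run a Databricks job and return business-style metrics."""
    metrics = {name: round(random.uniform(1_000, 250_000), 2) for name in metric_names}
    print("🚀 SIMULATED DATA PROCESSING JOB")
    print(f"📊 Key Metrics for {config_name}:")
    for name, value in metrics.items():
        print(f"   • {name}: {value:,}")
    return metrics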

🔥 Key Features

  • Dynamic UI Generation: Dropdowns/checkboxes populated from file discovery
  • Configuration-Driven: Same DAG, different behavior based on config
  • Realistic Simulations: Business-appropriate metrics (marketing, sales, ops)
  • Visual Feedback: Skip states show unselected configs in pink
  • Educational: Extensive documentation and learning objectives
  • Production-Ready Patterns: Clean, maintainable, scalable code

🎓 What You'll Learn

  • Dynamic parameter generation in Airflow
  • Configuration-driven pipeline design
  • File system scanning for UI population
  • Skip patterns and visual feedback
  • Modular DAG architecture
  • Real-world data engineering patterns

🔧 Configuration Examples

The project includes realistic business configurations:

  • Marketing: Campaign analytics with Google Ads, Facebook Ads, email metrics
  • Sales: Pipeline ETL with Salesforce, payment processing, lead scoring
  • Operations: Daily reporting with system monitoring, support metrics

Each configuration drives appropriate simulations and generates realistic business metrics.
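
For orientation, a config along these lines would be enough to drive the marketing simulation; the field names are hypothetical and not copied from the repository's marketing.yaml:

name: marketing_campaign_analytics
description: Campaign analytics across paid ads and email channels
sources:
  - google_ads
  - facebook_ads
  - email_platform
job:
  type: simulated            # swap for a real Databricks/Spark job in production
  metrics:
    - campaigns_processed
    - impressions_analyzed
    - conversion_rate
    - total_spend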

📊 Expected Results

Marketing Processing:

🚀 SIMULATED DATA PROCESSING JOB
📊 Key Metrics:
   • campaigns_processed: 6
   • impressions_analyzed: 234,567
   • conversion_rate: 3.4%
   • total_spend: $18,500

Visual Task States:

  • ✅ Green: Selected and processed configs
  • ⏭️ Pink/Salmon: Skipped configs
  • 📊 Summary report with success rates

🏗️ Production Usage

For production deployment:

  1. Replace simulation with real jobs (Databricks, Spark, etc.)
  2. Store configs in S3/GCS instead of local filesystem
  3. Add proper error handling and retry logic
  4. Implement monitoring and alerting
  5. Use Airflow Connections for credentials

The patterns scale seamlessly from local development to enterprise production.
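
For steps 2 and 5 above, here is a hedged sketch of reading configs from S3 through an Airflow Connection; the bucket, prefix, and connection id are placeholders, not values from this project.

import yaml
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def discover_remote_configs(bucket: str = "my-config-bucket",
                            prefix: str = "configs/",
                            conn_id: str = "aws_default") -> list[str]:
    """List YAML config keys in S3 (replaces the local directory glob)."""
    hook = S3Hook(aws_conn_id=conn_id)
    keys = hook.list_keys(bucket_name=bucket, prefix=prefix) or []
    return [key for key in keys if key.endswith(".yaml")]

def load_remote_config(key: str,
                       bucket: str = "my-config-bucket",
                       conn_id: str = "aws_default") -> dict:
    """Fetch and parse one config; credentials come from the Airflow Connection."""
    hook = S3Hook(aws_conn_id=conn_id)
    return yaml.safe_load(hook.read_key(key, bucket_name=bucket))

One design note: listing S3 at DAG-parse time adds latency to every parse, so caching the key list or refreshing it on a timer is worth considering.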

🎯 When to Use This

Perfect for:

  • Teams with many similar DAGs
  • Configuration-driven data processing
  • Department-specific analytics (marketing, sales, ops)
  • Batch processing with parameter variations
  • Educational/training environments

Not suitable for:

  • Completely different workflow structures
  • Real-time streaming processing
  • DAGs with fundamentally different scheduling needs

🤝 Contributing

This is an educational project. Feel free to:

  • Add more realistic configuration examples
  • Improve the simulation logic
  • Add new dynamic patterns
  • Enhance documentation

📖 Related Articles

Check out article.md for a comprehensive Medium article about these patterns, including:

  • Detailed problem analysis
  • Implementation deep-dive
  • Before/after comparisons
  • Production considerations
  • Complete code walkthrough

📝 License

MIT License - feel free to use these patterns in your own projects!

🙏 Acknowledgments

Inspired by real-world Airflow deployments and the need for maintainable, scalable DAG patterns in growing data teams.


Stop duplicating DAGs. Start thinking dynamically. 🚀
