Stop creating 30 separate DAGs for similar tasks. This project demonstrates two fundamental patterns that solve the DAG proliferation problem in Apache Airflow.
Growing data teams often end up with dozens of nearly identical DAGs:

- marketing_dag.py
- sales_dag.py
- operations_dag.py
- campaign_summer_dag.py
- ...30 more similar files

Result: a maintenance nightmare, code duplication, and testing complexity.
Two dynamic patterns handle 95% of use cases:

Pattern 1 (config_processor): manual, config-driven runs (see the Param sketch below)
- Use Case: Ad-hoc processing, debugging, one-off tasks
- UI: Dynamic dropdown populated from config files
- Trigger: Manual only

Pattern 2 (scheduled_config_processor): scheduled, parameter-driven runs
- Use Case: Regular production workloads
- UI: Checkbox selection for multiple configs
- Trigger: Scheduled with parameter-driven execution
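Here is a minimal sketch of how Pattern 1's dropdown can be wired up: scan configs/ at DAG parse time and feed the file names into a Param enum, which the trigger form renders as a dropdown. Paths, IDs, and the fallback value are illustrative assumptions, not the project's exact code.

```python
# Minimal sketch of Pattern 1 (assumption: configs/ sits next to dags/).
from pathlib import Path

import pendulum
from airflow import DAG
from airflow.models.param import Param
from airflow.operators.python import PythonOperator

CONFIG_DIR = Path(__file__).resolve().parent.parent / "configs"
# Re-evaluated on every DAG parse, so new YAML files appear in the dropdown automatically.
config_files = sorted(p.name for p in CONFIG_DIR.glob("*.yaml")) or ["marketing.yaml"]

with DAG(
    dag_id="config_processor",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,  # Pattern 1: manual trigger only
    catchup=False,
    params={"config_file": Param(config_files[0], type="string", enum=config_files)},
) as dag:

    def process(**context):
        print(f"Processing {context['params']['config_file']}")

    PythonOperator(task_id="process_config", python_callable=process)
```

Because the enum is built at parse time, dropping a new YAML file into configs/ is enough to surface it in the trigger form.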
# 1. Set permissions
echo "AIRFLOW_UID=$(id -u)" > .env # Linux/Mac
# echo "AIRFLOW_UID=50000" > .env # Windows
# 2. Start Airflow
docker-compose up -d
# 3. Verify the setup (give the webserver ~60 seconds to start)
# Open: http://localhost:8080
# Login: airflow / airflow

Pattern 1 (config_processor):
- Click DAG → "Trigger DAG w/ Config"
- See dropdown with config files
- Select marketing.yaml → realistic marketing metrics
Pattern 2 (scheduled_config_processor):
- Click DAG → "Trigger DAG w/ Config"
- See checkboxes for each config file
- Select configs → see dynamic tasks with visual skip states (selection params sketched below)
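One plausible way to expose per-config checkboxes (this mirrors the behaviour described above, not necessarily the repo's exact code) is to declare one boolean Param per discovered config file; the trigger form renders boolean params as toggles:

```python
# Hedged sketch of Pattern 2's selection UI: one boolean Param per config file.
# Directory layout, dag_id, and the cron expression are assumptions.
from pathlib import Path

import pendulum
from airflow import DAG
from airflow.models.param import Param

CONFIG_DIR = Path(__file__).resolve().parent.parent / "configs"
config_names = sorted(p.stem for p in CONFIG_DIR.glob("*.yaml"))  # e.g. ["marketing", "operations", "sales"]

with DAG(
    dag_id="scheduled_config_processor",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule="0 6 * * *",  # Pattern 2: scheduled; params decide which configs actually run
    catchup=False,
    params={name: Param(True, type="boolean") for name in config_names},
) as dag:
    ...  # one task per config, each checking its own param (see the skip sketch further down)
```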
airflow-dynamic-dag/
├── configs/ # Your configuration files
│ ├── marketing.yaml # Marketing campaign analytics
│ ├── operations.yaml # Operations reporting
│ └── sales.yaml # Sales pipeline ETL
├── dags/
│ ├── config_processor_dag.py # Pattern 1: Manual
│ ├── scheduled_multi_config_dag.py # Pattern 2: Scheduled
│ └── utils/ # Shared utilities
│ ├── config_manager.py # File discovery & YAML processing
│ ├── databricks_handler.py # Job execution simulation
│ └── settings.py # Configuration
├── docker-compose.yaml # Easy local testing
└── requirements.txt # Simple: just PyYAML
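The shared utilities stay small. As a hedged sketch (function names are illustrative, not the repo's actual API), config_manager.py-style discovery and loading could look like this:

```python
# dags/utils/config_manager.py (sketch): discover YAML configs and load one by name.
from pathlib import Path

import yaml  # the only non-Airflow dependency, matching requirements.txt

# Assumes the layout above: dags/utils/ -> dags/ -> project root -> configs/
CONFIG_DIR = Path(__file__).resolve().parent.parent.parent / "configs"


def discover_configs() -> list[str]:
    """Return the available config file names, e.g. ['marketing.yaml', 'sales.yaml']."""
    return sorted(p.name for p in CONFIG_DIR.glob("*.yaml"))


def load_config(name: str) -> dict:
    """Parse one YAML config into a dict that the DAG tasks can consume."""
    with open(CONFIG_DIR / name) as f:
        return yaml.safe_load(f)
```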
✅ Dynamic UI Generation: Dropdowns/checkboxes populated from file discovery
✅ Configuration-Driven: Same DAG, different behavior based on config
✅ Realistic Simulations: Business-appropriate metrics (marketing, sales, ops)
✅ Visual Feedback: Skip states show unselected configs in pink
✅ Educational: Extensive documentation and learning objectives
✅ Production-Ready Patterns: Clean, maintainable, scalable code
- Dynamic parameter generation in Airflow
- Configuration-driven pipeline design
- File system scanning for UI population
- Skip patterns and visual feedback
- Modular DAG architecture
- Real-world data engineering patterns
The project includes realistic business configurations:
- Marketing: Campaign analytics with Google Ads, Facebook Ads, email metrics
- Sales: Pipeline ETL with Salesforce, payment processing, lead scoring
- Operations: Daily reporting with system monitoring, support metrics
Each configuration drives appropriate simulations and generates realistic business metrics.
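For illustration only (the shipped files may differ), a config such as marketing.yaml might take a shape like this:

```yaml
# Illustrative shape, not the repo's verbatim marketing.yaml.
department: marketing
job_type: campaign_analytics
sources:
  - google_ads
  - facebook_ads
  - email
metrics:
  - impressions
  - conversions
  - spend
```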
Marketing Processing:
🚀 SIMULATED DATA PROCESSING JOB
📊 Key Metrics:
• campaigns_processed: 6
• impressions_analyzed: 234,567
• conversion_rate: 3.4%
• total_spend: $18,500
Visual Task States:
- ✅ Green: Selected and processed configs
- ⏭️ Pink/Salmon: Skipped configs (see the skip sketch below)
- 📊 Summary report with success rates
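Behind the pink states is Airflow's standard skip mechanism: a per-config task raises AirflowSkipException when its config wasn't selected, so the run still succeeds while clearly showing what was left out. A sketch (the boolean-per-config params follow the earlier Pattern 2 sketch and are illustrative):

```python
# Sketch of the skip pattern behind the pink/salmon task states.
from airflow.exceptions import AirflowSkipException


def process_one_config(config_name: str, **context):
    # Param naming is illustrative: one boolean param per config, as sketched earlier.
    if not context["params"].get(config_name, False):
        # Marks this task instance as 'skipped', which the Grid/Graph view renders
        # in pink/salmon instead of green (success) or red (failure).
        raise AirflowSkipException(f"{config_name} was not selected for this run")
    print(f"Processing {config_name} ...")
```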
For production deployment:
- Replace simulation with real jobs (Databricks, Spark, etc.); a hedged sketch follows this list
- Store configs in S3/GCS instead of local filesystem
- Add proper error handling and retry logic
- Implement monitoring and alerting
- Use Airflow Connections for credentials
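As an example of the first and last points above, the simulated job could be swapped for the Databricks provider's submit-run operator, with credentials held in an Airflow Connection rather than in the DAG. This is a hedged sketch: it assumes apache-airflow-providers-databricks is installed and a databricks_default connection exists, and the cluster/notebook values are placeholders.

```python
# Replaces the simulated PythonOperator task inside the existing DAG definition.
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

run_marketing_job = DatabricksSubmitRunOperator(
    task_id="run_marketing_job",
    databricks_conn_id="databricks_default",  # credentials live in the Connection, not the code
    new_cluster={
        "spark_version": "13.3.x-scala2.12",  # placeholder cluster spec
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
    notebook_task={"notebook_path": "/Shared/marketing_campaign_analytics"},  # placeholder path
)
```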
The patterns scale seamlessly from local development to enterprise production.
Perfect for:
- Teams with many similar DAGs
- Configuration-driven data processing
- Department-specific analytics (marketing, sales, ops)
- Batch processing with parameter variations
- Educational/training environments
Not suitable for:
- Completely different workflow structures
- Real-time streaming processing
- DAGs with fundamentally different scheduling needs
This is an educational project. Feel free to:
- Add more realistic configuration examples
- Improve the simulation logic
- Add new dynamic patterns
- Enhance documentation
Check out article.md for a comprehensive Medium article about these patterns, including:
- Detailed problem analysis
- Implementation deep-dive
- Before/after comparisons
- Production considerations
- Complete code walkthrough
MIT License - feel free to use these patterns in your own projects!
Inspired by real-world Airflow deployments and the need for maintainable, scalable DAG patterns in growing data teams.
Stop duplicating DAGs. Start thinking dynamically. 🚀