Music data analyzer scheduler

This project showcases how to design and schedule a series of jobs using Apache Airflow in order to:

  • Backfill historical data
  • Build a dimensional data model using Python
  • Load data from an AWS S3 bucket into an AWS Redshift data warehouse
  • Run quality checks on the loaded data
  • Use custom operators and the available hooks to create reusable code (see the sketch after this list)
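
The sketch below illustrates how these pieces can fit together: a custom operator that copies data from S3 into Redshift using the registered connections, a simple row-count quality check, and a DAG whose past start_date plus catchup=True enables backfilling. This is a minimal sketch assuming Airflow 1.x-style imports; the DAG id, class name, table names, and S3 path are illustrative and not necessarily the ones used in this repository.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.hooks.aws_hook import AwsHook
from airflow.hooks.postgres_hook import PostgresHook
from airflow.models import BaseOperator
from airflow.operators.python_operator import PythonOperator
from airflow.utils.decorators import apply_defaults


class StageToRedshiftOperator(BaseOperator):
    """Copy JSON files from an S3 prefix into a Redshift staging table."""

    @apply_defaults
    def __init__(self, redshift_conn_id, aws_conn_id, table, s3_path, *args, **kwargs):
        super(StageToRedshiftOperator, self).__init__(*args, **kwargs)
        self.redshift_conn_id = redshift_conn_id
        self.aws_conn_id = aws_conn_id
        self.table = table
        self.s3_path = s3_path

    def execute(self, context):
        # Pull AWS keys from the "aws_credentials" connection and run a COPY on Redshift.
        credentials = AwsHook(self.aws_conn_id).get_credentials()
        redshift = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        redshift.run(
            """
            COPY {table}
            FROM '{s3_path}'
            ACCESS_KEY_ID '{key}'
            SECRET_ACCESS_KEY '{secret}'
            FORMAT AS JSON 'auto'
            """.format(
                table=self.table,
                s3_path=self.s3_path,
                key=credentials.access_key,
                secret=credentials.secret_key,
            )
        )


def check_row_count(table, redshift_conn_id, **context):
    """Fail the task if the target table ended up empty."""
    redshift = PostgresHook(postgres_conn_id=redshift_conn_id)
    records = redshift.get_records("SELECT COUNT(*) FROM {}".format(table))
    if not records or records[0][0] < 1:
        raise ValueError("Data quality check failed: {} is empty".format(table))


default_args = {
    "owner": "airflow",
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "depends_on_past": False,
}

# A start_date in the past plus catchup=True is what makes Airflow backfill
# the historical schedule intervals when the DAG is first enabled.
dag = DAG(
    "music_data_pipeline",
    default_args=default_args,
    start_date=datetime(2018, 11, 1),
    schedule_interval="@hourly",
    catchup=True,
)

stage_events = StageToRedshiftOperator(
    task_id="stage_events",
    dag=dag,
    redshift_conn_id="redshift",
    aws_conn_id="aws_credentials",
    table="staging_events",
    s3_path="s3://example-bucket/log_data",
)

check_events = PythonOperator(
    task_id="check_events",
    dag=dag,
    python_callable=check_row_count,
    provide_context=True,
    op_kwargs={"table": "staging_events", "redshift_conn_id": "redshift"},
)

stage_events >> check_events
```

The custom operator pattern is what makes the load step reusable: the same class can stage any JSON dataset by swapping the table and s3_path arguments, while the quality check can be pointed at any loaded table.
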

Running the DAG

You can run the DAG on your own machine using docker-compose, which requires Docker to be installed. Once Docker is installed:

  1. Open a terminal in the same directory as docker-compose.yml
  2. Run docker-compose up
  3. Wait 30-60 seconds
  4. Open http://localhost:8080 in Google Chrome (Other browsers occasionally have issues rendering the Airflow UI)
  5. Make sure you have configured the aws_credentials and redshift connections in the Airflow UI (Admin > Connections), or create them with the optional snippet shown after these instructions

When you are ready to quit Airflow, press Ctrl+C in the terminal where docker-compose is running, then run docker-compose down.
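
If you would rather script step 5 than click through the UI, the two connections can also be created programmatically from inside the running container. This is an optional sketch: the connection types match what the hooks above expect, but every credential, host, and database value below is a placeholder you must replace with your own.

```python
# Optional alternative to step 5: register the "aws_credentials" and
# "redshift" connections in Airflow's metadata database directly.
# All credential, host, and database values are placeholders.
from airflow import settings
from airflow.models import Connection

session = settings.Session()

aws_credentials = Connection(
    conn_id="aws_credentials",
    conn_type="aws",
    login="YOUR_AWS_ACCESS_KEY_ID",
    password="YOUR_AWS_SECRET_ACCESS_KEY",
)

redshift = Connection(
    conn_id="redshift",
    conn_type="postgres",
    host="your-cluster.abc123.us-west-2.redshift.amazonaws.com",
    schema="dev",
    login="awsuser",
    password="YOUR_REDSHIFT_PASSWORD",
    port=5439,
)

for conn in (aws_credentials, redshift):
    session.add(conn)

session.commit()
```
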
