This is the Data Modeling with Postgres project for the Udacity Data Engineering Nanodegree. In this project I create a database schema in Postgres and an ETL pipeline, written in Python and SQL, that loads JSON files into the database to facilitate analysis of the data. These JSON files contain user activity logs collected by the music streaming app of an imaginary startup, Sparkify.
- create_tables.py: Drops the previous schema and creates empty tables
- sql_queries.py: Defines all queries used in the ETL pipeline
- etl.py: Loads data from the JSON files into the tables
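As a rough illustration of how `etl.py` might use a query defined in `sql_queries.py`, here is a minimal sketch. It is not the project's actual code: it uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs without a database server, and the column names, JSON field names, and sample records are all made up for the example.

```python
import json
import sqlite3

# Hypothetical query strings, playing the role of sql_queries.py.
# SQLite syntax is used here; the Postgres versions would differ slightly.
user_table_create = """
CREATE TABLE IF NOT EXISTS users (
    user_id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name TEXT
)
"""
user_table_insert = """
INSERT OR IGNORE INTO users (user_id, first_name, last_name)
VALUES (?, ?, ?)
"""


def load_log_lines(conn, lines):
    """Role of etl.py: parse one JSON record per line and insert it."""
    for line in lines:
        record = json.loads(line)
        conn.execute(
            user_table_insert,
            (record["userId"], record["firstName"], record["lastName"]),
        )
    conn.commit()


# In-memory database standing in for the Postgres database.
conn = sqlite3.connect(":memory:")
conn.execute(user_table_create)

# Made-up records; the field names are assumptions, not the real log format.
sample_lines = [
    '{"userId": 1, "firstName": "Ada", "lastName": "Lovelace"}',
    '{"userId": 2, "firstName": "Alan", "lastName": "Turing"}',
    '{"userId": 1, "firstName": "Ada", "lastName": "Lovelace"}',
]
load_log_lines(conn, sample_lines)

# The repeated user is ignored, so each user is stored once.
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(user_count)
```

Keeping the SQL in one module and the loading logic in another means the insert statements can change without touching the code that walks the JSON files.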
artists
: Artists in the music database

songs
: Songs in the music database

users
: Users of the app

songplays
: Records of song plays in log files

time
: Timestamps of records
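The five tables above could be dropped and recreated along the following lines. This is only a sketch under stated assumptions: the column sets are hypothetical and deliberately minimal, and `sqlite3` again stands in for Postgres so the example is runnable on its own.

```python
import sqlite3

# Hypothetical, minimal column sets; these are assumptions for illustration.
table_definitions = {
    "artists": "artist_id TEXT PRIMARY KEY, name TEXT",
    "songs": "song_id TEXT PRIMARY KEY, title TEXT, artist_id TEXT",
    "users": "user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT",
    "time": "start_time TEXT PRIMARY KEY",
    "songplays": (
        "songplay_id INTEGER PRIMARY KEY, start_time TEXT, "
        "user_id INTEGER, song_id TEXT, artist_id TEXT"
    ),
}

# One DROP and one CREATE statement per table, built from the table names.
drop_table_queries = [
    f"DROP TABLE IF EXISTS {name}" for name in table_definitions
]
create_table_queries = [
    f"CREATE TABLE IF NOT EXISTS {name} ({columns})"
    for name, columns in table_definitions.items()
]

# In-memory database standing in for the Postgres database.
conn = sqlite3.connect(":memory:")

# Drop any previous schema first, then create the empty tables.
for query in drop_table_queries + create_table_queries:
    conn.execute(query)
conn.commit()

# List the tables that now exist.
created_tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(created_tables)
```

In this sketch `songplays` carries the identifiers of the other tables, so a song play record can be joined to its user, song, artist, and timestamp.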
The code is written in Python, as standalone scripts and as a Jupyter Notebook, and it uses:
jupyter notebook etl.ipynb
python create_tables.py
python etl.py