Update README.md
mansik95 authored May 5, 2020
1 parent 4a619c8 commit e76de69
Showing 1 changed file with 24 additions and 9 deletions.
33 changes: 24 additions & 9 deletions README.md
@@ -1,15 +1,26 @@
# IMDB-Analysis
This repository contains analysis of IMDB data from multiple sources and analysis of movies/cast/box office revenues, movie brands and franchises
Talend Real-Time Data Platform 7.1
## IMDB Data Analysis Pipeline

SQL Server Developer Edition
# Objective:
The aim of the project is to analyse movie data from multiple sources (IMDB, MovieLens, The Numbers and Box Office Mojo) in terms of movies, cast, box office revenues, movie brands and franchises, and to perform the ETL processes using Talend.

# Technologies Used:
ER/Studio
SQL Server Developer Edition
Microsoft SQL Server Management Studio

Tableau

Power BI

Talend Real-Time Data Platform 7.1
Tableau Desktop
Microsoft Power BI

# Dataset Links:
https://datasets.imdbws.com/
https://www.boxofficemojo.com/franchise/?ref_=bo_nb_fr_secondarytab
https://www.boxofficemojo.com/brand/?ref_=bo_nb_frs_secondarytab
https://grouplens.org/datasets/movielens/25m/
https://www.the-numbers.com/movies/franchises
https://www.the-numbers.com/movies/franchise/Marvel-Cinematic-Universe#tab=summary
https://www.the-numbers.com/movie/Avengers-The-(2012)#tab=box-office
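
The IMDB files at the first link are distributed as gzipped, tab-separated text in which a missing value is written as `\N`. As a minimal sketch of how such a file could be inspected before it is wired into a Talend input connection, the snippet below parses an in-memory sample. The function name `read_imdb_tsv` and the second sample row are hypothetical, only a subset of the `title.basics` columns is shown, and this is not the project's own loading logic (the project performs ETL in Talend).

```python
import csv
import io


def read_imdb_tsv(handle):
    """Yield one dict per data row of an IMDB-style TSV stream.

    The IMDB null marker (backslash followed by N) is mapped to None.
    """
    # QUOTE_NONE: quote characters in titles are treated as plain text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == r"\N" else value) for key, value in row.items()}


# Small in-memory sample; the second row is made up to show null handling.
sample = io.StringIO(
    "tconst\ttitleType\tprimaryTitle\tstartYear\n"
    "tt0000001\tshort\tCarmencita\t1894\n"
    "tt9999999\tmovie\tExample Title\t\\N\n"
)
rows = list(read_imdb_tsv(sample))
```

For a downloaded file, the same function should work on a handle opened with `gzip.open(path, "rt", encoding="utf-8", newline="")`, assuming the file keeps the layout described above.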

# Code Walkthrough:
Run the following scripts in SSMS to set up the staging database

The Number - stage tables.sql
@@ -23,3 +34,7 @@ stg_ml_tables.sql
Open Talend and set up your database connections and input file connections

When the connections are successful, run the main job

Perform visualizations in Tableau and Power BI
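
The walkthrough runs the staging scripts from SSMS. As an alternative that is not part of the original instructions, the same `.sql` files could probably be executed with the `sqlcmd` command-line utility. The sketch below only builds the argument list; the helper name `build_sqlcmd_args`, the server name `localhost` and the use of Windows authentication (`-E`) are assumptions to adapt to your own setup.

```python
def build_sqlcmd_args(server, script_path, database=None):
    """Return a sqlcmd argument list that runs one script file.

    -S selects the server, -E uses a trusted (Windows) connection,
    -b makes sqlcmd exit with an error code if a batch fails,
    -i names the input script, and -d (optional) selects the database.
    """
    args = ["sqlcmd", "-S", server, "-E", "-b", "-i", script_path]
    if database is not None:
        args += ["-d", database]
    return args


# Hypothetical server name; the script name is taken from the walkthrough.
args = build_sqlcmd_args("localhost", "stg_ml_tables.sql")
```

The resulting list can be passed to `subprocess.run(args, check=True)` on a machine where `sqlcmd` is installed. Keeping the command construction separate from its execution makes it easy to review the command before anything touches the database.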

