The aim of this project is to analyse movie data from multiple sources (IMDb, MovieLens, The Numbers and Box Office Mojo), covering movies, cast, box office revenues, movie brands and franchises, and to perform the ETL processes using Talend.
ER/Studio
SQL Server Developer Edition
Microsoft SQL Server Management Studio
Talend Real-Time Data Platform 7.1
Tableau Desktop
Microsoft Power BI
https://datasets.imdbws.com/
https://www.boxofficemojo.com/franchise/?ref_=bo_nb_fr_secondarytab
https://www.boxofficemojo.com/brand/?ref_=bo_nb_frs_secondarytab
https://grouplens.org/datasets/movielens/25m/
https://www.the-numbers.com/movies/franchises
https://www.the-numbers.com/movies/franchise/Marvel-Cinematic-Universe#tab=summary
https://www.the-numbers.com/movie/Avengers-The-(2012)#tab=box-office
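The IMDb files published at datasets.imdbws.com are gzipped, tab-separated text files with a header row, and they use `\N` to mark missing values. Before wiring a file into an ETL job it can help to preview a few rows. The following is a minimal sketch only; the helper names `read_imdb_tsv` and `preview_imdb_file` and the local file path are hypothetical and not part of this project.

```python
import csv
import gzip
import io


def read_imdb_tsv(fileobj, limit=5):
    """Yield up to `limit` rows as dicts, mapping IMDb's missing-value marker to None."""
    # QUOTE_NONE: IMDb titles may contain literal double quotes, so quoting is disabled.
    reader = csv.DictReader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    for i, row in enumerate(reader):
        if i >= limit:
            break
        # IMDb writes \N (backslash, capital N) for missing values.
        yield {key: (None if value == r"\N" else value) for key, value in row.items()}


def preview_imdb_file(path, limit=5):
    """Open a gzipped IMDb dump (hypothetical local path) and return its first rows."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        return list(read_imdb_tsv(fh, limit))


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a real download.
    sample = "tconst\ttitleType\tprimaryTitle\tendYear\ntt0000001\tshort\tCarmencita\t\\N\n"
    for row in read_imdb_tsv(io.StringIO(sample)):
        print(row)
```

For a downloaded file, something like `preview_imdb_file("title.basics.tsv.gz")` would return the first five rows, assuming the file sits in the working directory.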
Run the following scripts in SSMS to set up the staging database:
The Number - stage tables.sql
stg imdb tables - core tables.sql
stg imdb tables expanded part 2.sql
stg_ml_tables.sql
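As an alternative to opening each script in SSMS, the scripts above could be run from the command line with the `sqlcmd` utility. The sketch below only builds and runs the commands; it assumes `sqlcmd` is installed and on the PATH, that Windows authentication is in use, and that the scripts should run in the order listed. The function names, the server name and the database name `MoviesStaging` are hypothetical placeholders.

```python
import subprocess
from pathlib import Path

# Script names come from the list above; running them in this order is an assumption.
STAGING_SCRIPTS = [
    "The Number - stage tables.sql",
    "stg imdb tables - core tables.sql",
    "stg imdb tables expanded part 2.sql",
    "stg_ml_tables.sql",
]


def build_sqlcmd_commands(script_dir, server, database=None):
    """Return one sqlcmd argument list per staging script."""
    commands = []
    for name in STAGING_SCRIPTS:
        script_path = str(Path(script_dir) / name)
        # -S: server, -E: Windows authentication, -b: stop with an error code on failure
        cmd = ["sqlcmd", "-S", server, "-E", "-b"]
        if database is not None:
            # -d: initial database; omit it if the scripts select the database themselves
            cmd += ["-d", database]
        # -i: input script file
        cmd += ["-i", script_path]
        commands.append(cmd)
    return commands


def run_staging_scripts(script_dir, server, database=None):
    """Run each script in turn; raises CalledProcessError on the first failure."""
    for cmd in build_sqlcmd_commands(script_dir, server, database):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_sqlcmd_commands(".", "localhost", "MoviesStaging"):
        print(cmd)
```

Passing the arguments as a list avoids shell quoting problems with the spaces in the script file names.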
Open Talend and set up your database connections and input file connections.
When the connections are successful, run the main job.
Build the visualizations in Tableau and Power BI.