Skip to content

Create your own SQL Server database from the IMDB dataset

Notifications You must be signed in to change notification settings

sqlsunday/imdb-to-sqlserver

Repository files navigation

Create your own SQL Server database from the IMDB dataset

Important licensing information

IMDB makes this dataset available with a number of licensing terms. The long and short of it is, you can only use the data for your own, non-commercial use.

Source and licensing: https://developer.imdb.com/non-commercial-datasets/

Data files: https://datasets.imdbws.com/

How to download and create the IMDB database

You'll need SQL Server 2022 and up to 20 GB of free disk space to do this.

If you want to run it on a lower version, you'll have to rebuild/replace the STRING_SPLIT function calls in the last script.

  • Create a new SQL Server database.
    • I've called it IMDB, but you can name it whatever you want.
    • I strongly recommend setting the database to simple recovery model.
  • Create the database schema by running Create IMDB-schema.sql in the database.
  • Download and import the IMDB dataset using the Load-ImdbStaging.ps1 script. The script will download the files to your current working directory and perform the ingest from there. The download is just a few GB, but inserting the data may take up to a few hours depending on your environment.
  • Populate the relational tables using the Load IMDB relational tables.sql script.
  • If you want, you can now drop the staging tables in the Raw schema.

Approximate compressed table sizes

  • dbo.Attributes: 171 rows
  • dbo.Episodes: 8M rows, 95 MB
  • dbo.Genres: 28 rows
  • dbo.Professions: 48 rows
  • dbo.PrimaryProfessions: 14M rows, 160 MB
  • dbo.Principals: 13M rows, 350 MB
  • dbo.Titles: 10M rows, 255 MB
  • dbo.TitleCharacters: 5.3M rows, 96 MB
  • dbo.TitleGenres: 16M rows, 173 MB
  • dbo.TitleNames: 37M rows, 1.1 GB
  • dbo.TitleNameAttributes: 500k rows, 9 MB
  • dbo.TitlePrincipals: 28M rows, 460 MB
  • dbo.TitleTypes: 12 rows

About

Create your own SQL Server database from the IMDB dataset

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published