y656/Movie-Recommender-Systems

Movie-Recommender-Systems

This repository contains different recommender systems applied to the well-known MovieLens 20M dataset.
It has three code files:

  1. Demographic Filtering
  2. Content Based Filtering
  3. Collaborative Filtering

Demographic Filtering

In this notebook, I applied IMDB's weighted rating formula to recalculate the score of each movie and then found the top-scored movies.
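The README does not spell out the formula. IMDB's published weighted rating is WR = (v/(v+m))·R + (m/(v+m))·C. Here v is the movie's vote count, R is its average rating, m is the minimum number of votes needed to be ranked, and C is the mean rating across all movies. Below is a minimal plain-Python sketch that assumes the notebook uses this form. The helper names `weighted_rating` and `top_movies` are hypothetical, and the values of m and C are illustrative, not the notebook's.

```python
def weighted_rating(v, R, m, C):
    """IMDB-style weighted rating: shrinks a movie's average R toward the global mean C.

    v: number of ratings the movie received
    R: the movie's average rating
    m: minimum number of ratings required to be ranked (a tunable threshold)
    C: mean rating across all movies
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


def top_movies(movies, m, C, n=10):
    """Rank (title, vote_count, avg_rating) tuples by weighted rating, keeping movies with >= m votes."""
    qualified = [(title, v, R) for title, v, R in movies if v >= m]
    scored = [(title, weighted_rating(v, R, m, C)) for title, v, R in qualified]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:n]


# Vote counts and averages copied from the results table; m and C are hypothetical.
sample = [
    ("Seven Samurai (Shichinin no samurai) (1954)", 11611, 4.274180),
    ("Godfather, The (1972)", 41355, 4.364732),
    ("Rear Window (1954)", 17449, 4.271334),
    ("Shawshank Redemption, The (1994)", 63366, 4.446990),
]
for title, score in top_movies(sample, m=500, C=3.5):
    print(f"{score:.4f}  {title}")
```

The shrinkage toward C explains an ordering in the table. Seven Samurai has a slightly higher average rating than Rear Window (4.274 vs 4.271) but ranks below it. It has fewer votes, so its score is pulled further toward the global mean.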

The results are as follows:

| Movie id | Title | Vote_Count | Avg_Rating | Genres | Score |
|---|---|---|---|---|---|
| 69 | Shawshank Redemption, The (1994) | 63366 | 4.446990 | Crime/Drama | 4.438219 |
| 843 | Godfather, The (1972) | 41355 | 4.364732 | Crime/Drama | 4.352553 |
| 49 | Usual Suspects, The (1995) | 47006 | 4.334372 | Crime/Mystery/Thriller | 4.324027 |
| 523 | Schindler's List (1993) | 50054 | 4.310175 | Drama/War | 4.300743 |
| 1195 | Godfather: Part II, The (1974) | 27398 | 4.275641 | Crime/Drama | 4.259330 |
| 887 | Rear Window (1954) | 17449 | 4.271334 | Mystery/Thriller | 4.246182 |
| 895 | Casablanca (1942) | 24349 | 4.258327 | Drama/Romance | 4.240446 |
| 1935 | Seven Samurai (Shichinin no samurai) (1954) | 11611 | 4.274180 | Action/Adventure/Drama | 4.236870 |
| 1169 | One Flew Over the Cuckoo's Nest (1975) | 29932 | 4.248079 | Drama | 4.233671 |
| 737 | Dr. Strangelove or: How I Learned to Stop Worr... | 23220 | 4.247287 | Comedy/War | 4.228841 |

The top-scored movie is Shawshank Redemption, The (1994), with a score of about 4.44.


Content Based Filtering

Now let us apply content-based filtering to Shawshank Redemption, The (1994) to find movies similar to it.

The content-based recommendations are:

  1. Casino (1995)

  2. Shanghai Triad (Yao a yao yao dao waipo qiao) ...

  3. Dead Man Walking (1995)

  4. Hate (Haine, La) (1995)

  5. Young Poisoner's Handbook, The (1995)

  6. Glass Shield, The (1994)

  7. Heavenly Creatures (1994)

  8. Little Odessa (1994)

  9. New Jersey Drive (1995)

  10. Once Were Warriors (1994)
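The README does not say which movie features the notebook compares. The sketch below assumes similarity is computed from genres alone. Genres are taken to be pipe-separated, as in MovieLens `movies.csv`; the table above displays them with slashes. Each genre list is weighted with TF-IDF and compared by cosine similarity, all in plain Python. The helpers `tfidf_vectors`, `cosine` and `similar_movies` are hypothetical, not the notebook's code.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into sparse TF-IDF vectors (dicts of token -> weight)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF: +1 keeps tokens that occur in every document from getting weight 0.
    idf = {tok: math.log(n / df[tok]) + 1.0 for tok in df}
    return [{tok: cnt * idf[tok] for tok, cnt in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_movies(title, genres_by_title, k=10):
    """Return the k titles whose genre profile is closest to `title`."""
    titles = list(genres_by_title)
    vectors = tfidf_vectors([genres_by_title[t].split("|") for t in titles])
    target = vectors[titles.index(title)]
    scored = [(cosine(target, vec), t) for t, vec in zip(titles, vectors) if t != title]
    # sort() is stable, so movies with equal similarity keep their catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:k]]


# Toy catalog: genres taken from the demographic-filtering table above.
catalog = {
    "Shawshank Redemption, The (1994)": "Crime|Drama",
    "Godfather, The (1972)": "Crime|Drama",
    "Usual Suspects, The (1995)": "Crime|Mystery|Thriller",
    "Schindler's List (1993)": "Drama|War",
    "Godfather: Part II, The (1974)": "Crime|Drama",
    "Rear Window (1954)": "Mystery|Thriller",
    "Casablanca (1942)": "Drama|Romance",
}
print(similar_movies("Shawshank Redemption, The (1994)", catalog, k=3))
```

With genre-only features, every title with exactly the same genres gets the same similarity score. The order among those ties then depends on catalog order rather than on any notion of quality.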



The two systems above are not personalized: every user receives the same recommendations.


Collaborative Filtering

Let us improve on this with collaborative filtering. Suppose we want to recommend some movies to user 19:


Top 10 Recommendations for UserId 19 are:
  1. Independence Day (a.k.a. ID4) (1996)

  2. Toy Story (1995)

  3. Twister (1996)

  4. Rock, The (1996)

  5. Mission: Impossible (1996)

  6. Willy Wonka & the Chocolate Factory (1971)

  7. Fargo (1996)

  8. Mr. Holland's Opus (1995)

  9. Broken Arrow (1996)

  10. Birdcage, The (1996)
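The README does not show how the top-10 list is picked from the model's predictions. The list includes Toy Story (1995) and Fargo (1996), which user 19 had already rated 5.0. This suggests it ranks all movies rather than only unseen ones. The plain-Python sketch below makes that choice explicit through a hypothetical `exclude_rated` flag. `top_n_for_user` and the toy predictions are illustrative assumptions.

```python
def top_n_for_user(user_id, predicted, rated, n=10, exclude_rated=True):
    """Return up to n movies with the highest predicted rating for user_id.

    predicted: dict mapping (user_id, movie) -> predicted rating (the completed utility matrix)
    rated: dict mapping user_id -> set of movies the user has already rated
    """
    seen = rated.get(user_id, set()) if exclude_rated else set()
    candidates = [
        (score, movie)
        for (user, movie), score in predicted.items()
        if user == user_id and movie not in seen
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [movie for _, movie in candidates[:n]]


# Hypothetical predictions, for illustration only.
predicted = {(19, "A"): 4.8, (19, "B"): 3.1, (19, "C"): 4.2, (19, "D"): 4.9, (7, "A"): 2.0}
rated = {19: {"D"}}
print(top_n_for_user(19, predicted, rated, n=2))                       # ['A', 'C']
print(top_n_for_user(19, predicted, rated, n=2, exclude_rated=False))  # ['D', 'A']
```

Excluding already-rated movies turns the list into pure discovery. Keeping them, as the README's list appears to do, mixes known favourites with new suggestions.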

These recommendations were generated by our model. Let us compare them with the movies user 19 rated highest:


| Rating | Title | Genres |
|---|---|---|
| 5.0 | Birdcage, The (1996) | Comedy |
| 5.0 | Fargo (1996) | Comedy/Crime/Drama/Thriller |
| 5.0 | Sabrina (1995) | Comedy/Romance |
| 5.0 | Eddie (1996) | Comedy |
| 5.0 | Celtic Pride (1996) | Comedy |
| 5.0 | White Squall (1996) | Action/Adventure/Drama |
| 5.0 | Rumble in the Bronx (Hont faan kui) (1995) | Action/Adventure/Comedy/Crime |
| 5.0 | Heat (1995) | Action/Crime/Thriller |
| 5.0 | Toy Story (1995) | Adventure |
| 5.0 | Mr. Holland's Opus (1995) | Drama |
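This comparison can be sketched in plain Python. Ratings are assumed to be (user_id, title, rating) rows. The helpers `top_rated_by_user` and `overlap` are hypothetical, and the two lists in the usage lines are copied from this README.

```python
def top_rated_by_user(user_id, ratings, n=10):
    """Return the n titles a user rated highest; ratings holds (user_id, title, rating) rows."""
    rows = [(rating, title) for user, title, rating in ratings if user == user_id]
    rows.sort(key=lambda row: row[0], reverse=True)  # stable: ties keep input order
    return [title for _, title in rows[:n]]


def overlap(recommended, top_rated):
    """Titles that appear in both lists, in recommendation order."""
    top = set(top_rated)
    return [t for t in recommended if t in top]


# Both lists copied from this README (user 19).
recommended = [
    "Independence Day (a.k.a. ID4) (1996)", "Toy Story (1995)", "Twister (1996)",
    "Rock, The (1996)", "Mission: Impossible (1996)",
    "Willy Wonka & the Chocolate Factory (1971)", "Fargo (1996)",
    "Mr. Holland's Opus (1995)", "Broken Arrow (1996)", "Birdcage, The (1996)",
]
top_rated = [
    "Birdcage, The (1996)", "Fargo (1996)", "Sabrina (1995)", "Eddie (1996)",
    "Celtic Pride (1996)", "White Squall (1996)",
    "Rumble in the Bronx (Hont faan kui) (1995)", "Heat (1995)",
    "Toy Story (1995)", "Mr. Holland's Opus (1995)",
]
print(overlap(recommended, top_rated))
```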

The recommended list does not have to match the movies the user rated highest. Here, four of the ten recommendations also appear among user 19's 5.0-rated movies: Toy Story, Fargo, Mr. Holland's Opus and The Birdcage. The model completes the utility matrix: it predicts a rating for every (user, movie) pair and ranks movies by predicted rating. From the movies a user has already rated, it learns their preferences. It then estimates the ratings they would give to movies they have not rated yet. Unrated movies that receive high predicted ratings are the ones worth recommending.
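The README does not name the model used to complete the utility matrix. One common choice is matrix factorization, in which each user and each movie gets a small vector of latent factors. A predicted rating is then the dot product of the two vectors. The sketch below learns these factors with stochastic gradient descent in plain Python. The function names, the number of factors `k`, the learning rate, the regularization strength and the toy ratings are all assumptions for illustration, not the notebook's settings.

```python
import math
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples by SGD.

    The predicted rating for (u, i) is the dot product of P[u] and Q[i].
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the observed ratings in a new order each epoch
        for u, i, r in data:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def complete_matrix(P, Q):
    """Fill every cell of the utility matrix, including unrated ones, with a prediction."""
    return [[sum(p * q for p, q in zip(pu, qi)) for qi in Q] for pu in P]


# Toy utility matrix: 4 users x 4 movies, 12 of 16 cells rated (made-up values).
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 3, 1),
    (1, 0, 4), (1, 2, 1), (1, 3, 1),
    (2, 1, 1), (2, 2, 5), (2, 3, 4),
    (3, 0, 1), (3, 2, 4), (3, 3, 5),
]
P, Q = factorize(ratings, n_users=4, n_items=4)
full = complete_matrix(P, Q)
rmse = math.sqrt(sum((r - full[u][i]) ** 2 for u, i, r in ratings) / len(ratings))
print(f"training RMSE: {rmse:.3f}")
print(f"predicted rating of user 0 for movie 2 (unrated): {full[0][2]:.2f}")
```

The cells that were empty in `ratings`, such as user 0 and movie 2, are exactly the predictions a recommender ranks to suggest new movies.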

As a result, this system produces personalized recommendations: unlike the two systems above, each user's list is driven by their own rating history.

Kudos! We have finally built a personalized recommender for the users in our dataset.