This repository contains several recommender systems applied to the well-known MovieLens 20M dataset.
It contains three code files:
- Demographic Filtering
- Content Based Filtering
- Collaborative Filtering
In the notebook, I applied the IMDB weighted rating formula, recalculated the score of each movie, and ranked the movies by that score.
The results are as follows:
Movie id | Title | Vote_Count | Avg_Rating | Genres | Score |
---|---|---|---|---|---|
69 | Shawshank Redemption, The (1994) | 63366 | 4.446990 | Crime/Drama | 4.438219 |
843 | Godfather, The (1972) | 41355 | 4.364732 | Crime/Drama | 4.352553 |
49 | Usual Suspects, The (1995) | 47006 | 4.334372 | Crime/Mystery/Thriller | 4.324027 |
523 | Schindler's List (1993) | 50054 | 4.310175 | Drama/War | 4.300743 |
1195 | Godfather: Part II, The (1974) | 27398 | 4.275641 | Crime/Drama | 4.259330 |
887 | Rear Window (1954) | 17449 | 4.271334 | Mystery/Thriller | 4.246182 |
895 | Casablanca (1942) | 24349 | 4.258327 | Drama/Romance | 4.240446 |
1935 | Seven Samurai (Shichinin no samurai) (1954) | 11611 | 4.274180 | Action/Adventure/Drama | 4.236870 |
1169 | One Flew Over the Cuckoo's Nest (1975) | 29932 | 4.248079 | Drama | 4.233671 |
737 | Dr. Strangelove or: How I Learned to Stop Worr... | 23220 | 4.247287 | Comedy/War | 4.228841 |
The top-scored movie is Shawshank Redemption, The (1994), with a score of 4.44.
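For readers unfamiliar with the IMDB weighted rating, here is a minimal sketch of the usual formula, `WR = (v / (v + m)) * R + (m / (v + m)) * C`, where `v` is the vote count, `R` the movie's average rating, `m` a minimum-votes threshold, and `C` the mean rating across all movies. The value of `m`, the way `C` is computed, and the toy movies below are assumptions for illustration; they are not taken from the notebook.

```python
def weighted_rating(vote_count, avg_rating, min_votes, global_mean):
    """IMDB-style weighted rating: pull a movie's average toward the global mean.

    Movies with few votes stay close to global_mean; movies with many votes
    keep a score close to their own average.
    """
    v, R, m, C = vote_count, avg_rating, min_votes, global_mean
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical toy data: (title, vote_count, avg_rating)
movies = [
    ("Popular Classic", 60000, 4.4),
    ("Obscure Gem", 12, 4.9),
    ("Mediocre Hit", 45000, 3.1),
]

# Assumption: C is the mean of the per-movie averages, m is a fixed threshold.
C = sum(rating for _, _, rating in movies) / len(movies)
m = 1000

ranked = sorted(
    movies,
    key=lambda row: weighted_rating(row[1], row[2], m, C),
    reverse=True,
)

for title, votes, rating in ranked:
    print(f"{title}: {weighted_rating(votes, rating, m, C):.3f}")
```

Note how the movie with only 12 votes loses its raw 4.9 average and lands near the global mean, which is why heavily voted titles dominate the table above.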
Now let us apply content-based filtering to Shawshank Redemption, The (1994) to find movies similar to it.
The results for content-based filtering are:
- Casino (1995)
- Shanghai Triad (Yao a yao yao dao waipo qiao) ...
- Dead Man Walking (1995)
- Hate (Haine, La) (1995)
- Young Poisoner's Handbook, The (1995)
- Glass Shield, The (1994)
- Heavenly Creatures (1994)
- Little Odessa (1994)
- New Jersey Drive (1995)
- Once Were Warriors (1994)
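One way to produce a list like the one above is to compare movies by their genre tags. The sketch below assumes the genres column is the only feature and uses cosine similarity on binary genre vectors; the notebook may use different features or weighting (for example TF-IDF), so treat this as an illustration rather than the exact method. The titles and genres are copied from the score table; the function names are hypothetical.

```python
import math


def genre_vector(genres, vocabulary):
    """Encode a 'Crime/Drama' style string as a binary vector over vocabulary."""
    tags = set(genres.split("/"))
    return [1.0 if genre in tags else 0.0 for genre in vocabulary]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_movies(title, catalog, top_n=3):
    """Return the top_n (title, similarity) pairs most similar to the given title."""
    vocabulary = sorted({g for genres in catalog.values() for g in genres.split("/")})
    target = genre_vector(catalog[title], vocabulary)
    scores = [
        (other, cosine(target, genre_vector(genres, vocabulary)))
        for other, genres in catalog.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


catalog = {
    "Shawshank Redemption, The (1994)": "Crime/Drama",
    "Godfather, The (1972)": "Crime/Drama",
    "Usual Suspects, The (1995)": "Crime/Mystery/Thriller",
    "Schindler's List (1993)": "Drama/War",
    "Rear Window (1954)": "Mystery/Thriller",
    "Casablanca (1942)": "Drama/Romance",
}

for other, score in similar_movies("Shawshank Redemption, The (1994)", catalog):
    print(f"{other}: {score:.2f}")
```

Because the similarity depends only on the movie's own attributes, every user who asks about the same title gets the same list.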
The systems above are not personalized, so every user who uses them receives the same recommendations.
Let us improve on this by using collaborative filtering. Say we want to recommend some movies to user 19:
Top 10 Recommendations for UserId 19 are:
- Independence Day (a.k.a. ID4) (1996)
- Toy Story (1995)
- Twister (1996)
- Rock, The (1996)
- Mission: Impossible (1996)
- Willy Wonka & the Chocolate Factory (1971)
- Fargo (1996)
- Mr. Holland's Opus (1995)
- Broken Arrow (1996)
- Birdcage, The (1996)
These recommendations are generated by our model. Now let us compare them with the movies that user 19 rated highest:
Rating | Title | Genres |
---|---|---|
5.0 | Birdcage, The (1996) | Comedy |
5.0 | Fargo (1996) | Comedy/Crime/Drama/Thriller |
5.0 | Sabrina (1995) | Comedy/Romance |
5.0 | Eddie (1996) | Comedy |
5.0 | Celtic Pride (1996) | Comedy |
5.0 | White Squall (1996) | Action/Adventure/Drama |
5.0 | Rumble in the Bronx (Hont faan kui) (1995) | Action/Adventure/Comedy/Crime |
5.0 | Heat (1995) | Action/Crime/Thriller |
5.0 | Toy Story (1995) | Adventure |
5.0 | Mr. Holland's Opus (1995) | Drama |
Do not worry if the predicted top movies are not the same as the user's top-rated movies. Here, we completed the utility matrix, that is, we predicted a rating for every user-movie pair, including the pairs with no recorded rating. From the ratings a user has already given, the model learns that user's preferences and estimates the ratings the user would give to movies not yet rated. Some of those unrated movies may receive high predicted ratings, so we recommend them.
As a result, this system does well at personalizing recommendations and understanding the user better.
Kudos, we have finally created a personalized recommender system for the users in our dataset.
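To make the idea of completing the utility matrix concrete, here is a minimal sketch that uses a small matrix factorization trained with stochastic gradient descent. The README does not say which algorithm or library the notebook uses, so the technique, the hyperparameters, the toy ratings, and the names `factorize`, `predict`, and `recommend` are all assumptions. Unlike the lists above, which overlap with the user's own 5-star ratings, this sketch drops movies the user has already rated; that is a design choice of the sketch.

```python
import random


def factorize(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating

    for _ in range(n_epochs):
        for u, i, r in ratings:
            pred = mu + sum(p * q for p, q in zip(P[u], Q[i]))
            err = r - pred
            for k in range(n_factors):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return mu, P, Q


def predict(model, user, item):
    """Predicted rating for any user-item pair, rated or not."""
    mu, P, Q = model
    return mu + sum(p * q for p, q in zip(P[user], Q[item]))


def recommend(model, ratings, user, top_n=2):
    """Rank the items the user has not rated by predicted rating."""
    _, _, Q = model
    seen = {i for u, i, _ in ratings if u == user}
    candidates = [i for i in Q if i not in seen]
    return sorted(candidates, key=lambda i: predict(model, user, i), reverse=True)[:top_n]


# Hypothetical toy utility matrix as (user, movie, rating) triples.
ratings = [
    ("u1", "A", 5.0), ("u1", "B", 4.0), ("u1", "C", 1.0),
    ("u2", "A", 4.5), ("u2", "B", 5.0), ("u2", "D", 1.5),
    ("u3", "C", 5.0), ("u3", "D", 4.5), ("u3", "A", 1.0),
    ("u4", "A", 5.0), ("u4", "B", 4.5),
]

model = factorize(ratings)
print(recommend(model, ratings, "u4"))
```

Each user gets a separate factor vector, so two users with different rating histories receive different rankings, which is what makes this approach personalized.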