Movie Recommendation System using PySpark, ALS, SQLite (MovieLens Dataset)
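sqlite3 ships with the Python standard library and does not need to be installed with pip; only PySpark has to be installed: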
!pip install pyspark
All of the required modules and libraries are listed below:
- for machine learning (ALS model, hyperparameter tuning, evaluation)
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.ml.tuning import TrainValidationSplit, ParamGridBuilder
- for DataFrame and Spark session handling
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.context import SparkContext
from datetime import date, timedelta, datetime
import time
- ratings.csv
userId, movieId, rating, timestamp
- tags.csv
userId, movieId, tag, timestamp
- movies.csv
movieId, title, year, genres
- links.csv
movieId, imdbId, tmdbId
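As an illustration of how a file with the ratings.csv layout above could be stored in SQLite, here is a small sketch using only the standard library. It is an assumption about the general approach, not the project's own loader: the sample rows are made up, the table name `ratings` is hypothetical, and an in-memory database stands in for a real database file.

```python
import csv
import io
import sqlite3

# Stand-in for ratings.csv; these rows are invented for illustration.
sample_csv = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,10,4.0,964982703\n"
    "1,20,5.0,964981247\n"
    "2,10,3.5,964982224\n"
)

# Assumption: an in-memory database; a real run would pass a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings ("
    "userId INTEGER, movieId INTEGER, rating REAL, timestamp INTEGER)"
)

# Convert each CSV field to its column type before inserting.
reader = csv.DictReader(sample_csv)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?, ?)",
    (
        (int(r["userId"]), int(r["movieId"]), float(r["rating"]), int(r["timestamp"]))
        for r in reader
    ),
)
conn.commit()

# Parameterized query: all ratings given by user 1.
rows_for_user = conn.execute(
    "SELECT movieId, rating FROM ratings WHERE userId = ? ORDER BY movieId",
    (1,),
).fetchall()
print(rows_for_user)  # [(10, 4.0), (20, 5.0)]
```

The other files could be loaded the same way, with one table per CSV and the column names listed above.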
- The file hierarchy required for the code to work properly is as shown in the figure below:
- The code is commented in detail and is divided into the sections shown below according to function:
The recommendation system filters on User_Id to select a single user and lists the movies that best match that user.
To get recommendations for a different user, replace N in .filter("User_Id = N") with that user's ID.