Live Demo • Report Bug • Request Feature
- Overview
- Features
- Demo
- Architecture
- Tech Stack
- Quick Start
- Installation
- Dataset
- Model Training
- Project Structure
- Performance
- How to Use
- API Reference
- FAQ
- Contributing
- License
- Contact
A content-based movie recommendation system that suggests films based on similarity in genres, keywords, cast, crew, and plot. Built with Streamlit and powered by machine learning, it provides personalized recommendations with rich metadata from TMDB API.
- Content-Based Filtering using NLP and cosine similarity
- Real-Time Data from TMDB API (posters, trailers, cast, ratings)
- Fast Recommendations with pre-computed similarity matrix
- 4,800+ Movies in the catalog
- Interactive UI with trending movies, random suggestions, and viewing history
| Feature | Description |
|---|---|
| Movie Search | Search from 4,800+ movies and get instant recommendations |
| Surprise Me | Random movie discovery with full details |
| Trending Movies | Weekly trending films from TMDB |
| Rich Metadata | Cast, crew, budget, revenue, ratings, runtime, trailers |
| Viewing History | Track and revisit recently viewed movies |
| Responsive Design | Mobile-friendly interface |
Launch Live Demo
What you can do:
- Search through 4,800+ movies
- Get 5 similar movie recommendations instantly
- View detailed information (cast, crew, budget, ratings, trailers)
- Discover trending movies weekly
- Get random movie suggestions
┌─────────────────┐
│   User Input    │
│  (Movie Title)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Streamlit App  │
│   (Frontend)    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Recommender   │
│     Engine      │
│  (Cosine Sim.)  │
└────────┬────────┘
         │
   ┌─────┴────────────┬───────────────┐
   ▼                  ▼               ▼
┌──────────────┐  ┌──────────┐  ┌──────────────┐
│  Similarity  │  │ TMDB API │  │ Local Cache  │
│ Matrix (pkl) │  │  (Live)  │  │  (Session)   │
└──────────────┘  └──────────┘  └──────────────┘
- Text Vectorization: Convert movie features (genres, keywords, cast, crew, overview) into vectors using CountVectorizer (5000 features)
- Similarity Computation: Calculate cosine similarity between all movie pairs (4806 × 4806 matrix)
- Recommendation: For a given movie, retrieve top 5 most similar movies based on cosine similarity scores
Cosine Similarity Formula:
similarity(A, B) = (A · B) / (||A|| × ||B||)
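For intuition, here is a small standalone Python sketch (not taken from app.py) that checks the formula against scikit-learn on two toy count vectors:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy count vectors for two movies (counts over a tiny shared vocabulary)
A = np.array([[2, 0, 1, 1, 0]])
B = np.array([[1, 1, 1, 0, 0]])

# Manual formula: (A · B) / (||A|| × ||B||)
manual = (A @ B.T)[0, 0] / (np.linalg.norm(A) * np.linalg.norm(B))
print(round(manual, 4))                         # 0.7071
print(round(cosine_similarity(A, B)[0, 0], 4))  # 0.7071, same value via scikit-learn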
| Category | Technologies |
|---|---|
| Framework | Streamlit |
| ML/NLP | scikit-learn, NLTK (PorterStemmer) |
| Data Processing | Pandas, NumPy, Pickle |
| API | TMDB API, Requests |
| Deployment | Streamlit Cloud |
# Clone the repository
git clone https://github.com/hk-kumawat/Movie-Recommender-System.git
cd Movie-Recommender-System
# Install dependencies
pip install -r requirements.txt
# Set up TMDB API key (see below)
mkdir .streamlit
printf '[tmdb]\napi_key = "YOUR_API_KEY"\n' > .streamlit/secrets.toml
# Run the application
streamlit run app.py
Access the app at: http://localhost:8501
- Python 3.8 or higher
- pip package manager
- TMDB API key (Get one here)
1. Clone the Repository
git clone https://github.com/hk-kumawat/Movie-Recommender-System.git
cd Movie-Recommender-System
2. Create Virtual Environment (Recommended)
python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
3. Install Dependencies
pip install -r requirements.txt
4. Configure TMDB API Key
Create .streamlit/secrets.toml:
[tmdb]
api_key = "your_tmdb_api_key_here"
How to get TMDB API Key:
- Sign up at themoviedb.org
- Go to Settings → API
- Request API Key (select "Developer")
- Copy your API key
5. Run the Application
streamlit run app.py
| Issue | Solution |
|---|---|
| `ModuleNotFoundError` | Run `pip install -r requirements.txt` |
| API Key Error | Check `.streamlit/secrets.toml` format |
| Port Already in Use | Use `streamlit run app.py --server.port 8502` |
| NLTK Data Missing | Run `python -m nltk.downloader punkt stopwords` |
Source: TMDb 5000 Movie Dataset (Kaggle)
| File | Records | Description |
|---|---|---|
| `tmdb_5000_movies.csv` | 4,803 | Movie metadata (title, overview, genres, keywords, budget, revenue) |
| `tmdb_5000_credits.csv` | 4,803 | Cast and crew information |
Key Statistics:
- Movies: 4,806 (after preprocessing)
- Features: 5,000 (CountVectorizer)
- Genres: 20 unique genres
- Time Period: 1916-2017
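The genre count above can be reproduced directly from the raw CSV: the genres column stores JSON-like lists of dicts, the same structure the training code parses with ast.literal_eval. A quick standalone sketch:

import ast
import pandas as pd

movies = pd.read_csv('Dataset/tmdb_5000_movies.csv')

# Each row of 'genres' looks like: [{"id": 28, "name": "Action"}, ...]
unique_genres = {g['name'] for row in movies['genres'] for g in ast.literal_eval(row)}
print(len(unique_genres))   # expected: 20
print(sorted(unique_genres))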
Raw Data
   ↓
Merge movies + credits
   ↓
Extract features (genres, keywords, cast, crew, overview)
   ↓
Text preprocessing (lowercase, remove spaces)
   ↓
Stemming (PorterStemmer)
   ↓
Combine into "tags" column
   ↓
Vectorize (CountVectorizer, max_features=5000)
   ↓
Compute cosine similarity matrix (4806 × 4806)
   ↓
Save model (movie_list.pkl, similarity.pkl)
The recommendation model is trained using a content-based filtering approach. Here's how it works:
1. Data Collection & Preprocessing
import pandas as pd

# Load datasets
movies = pd.read_csv('Dataset/tmdb_5000_movies.csv')
credits = pd.read_csv('Dataset/tmdb_5000_credits.csv')
# Merge on title
movies = movies.merge(credits, on='title')
# Extract relevant features
movies = movies[['movie_id', 'title', 'overview', 'genres', 'keywords', 'cast', 'crew']]
2. Feature Engineering
import ast

# Extract top 3 cast members
def convert_cast(text):
    return [actor['name'] for actor in ast.literal_eval(text)[:3]]

# Extract director from crew
def fetch_director(text):
    for person in ast.literal_eval(text):
        if person['job'] == 'Director':
            return [person['name']]
    return []

# Apply transformations
movies['cast'] = movies['cast'].apply(convert_cast)
movies['crew'] = movies['crew'].apply(fetch_director)
movies['genres'] = movies['genres'].apply(lambda x: [genre['name'] for genre in ast.literal_eval(x)])
movies['keywords'] = movies['keywords'].apply(lambda x: [kw['name'] for kw in ast.literal_eval(x)])
3. Text Processing
# Split overview into a word list so it can be concatenated with the other list columns
movies['overview'] = movies['overview'].apply(lambda x: x.split())
# Combine all features into tags
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']
# Convert to string and lowercase
movies['tags'] = movies['tags'].apply(lambda x: ' '.join(x).lower())
# Apply stemming
from nltk.stem import PorterStemmer
ps = PorterStemmer()
movies['tags'] = movies['tags'].apply(lambda x: ' '.join([ps.stem(word) for word in x.split()]))
4. Vectorization & Similarity Computation
import pickle
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Create count vectorizer
cv = CountVectorizer(max_features=5000, stop_words='english')
vectors = cv.fit_transform(movies['tags']).toarray()
# Compute cosine similarity matrix
similarity = cosine_similarity(vectors)
# Save models
pickle.dump(movies, open('model_files/movie_list.pkl', 'wb'))
pickle.dump(similarity, open('model_files/similarity.pkl', 'wb'))
| Parameter | Value | Description |
|---|---|---|
| `max_features` | 5000 | Maximum vocabulary size for CountVectorizer |
| `stop_words` | 'english' | Remove common English words |
| `similarity_metric` | Cosine Similarity | Measure of similarity between vectors |
| `top_n_recommendations` | 5 | Number of recommendations to return |
- Notebook: `Movie Recommender System.ipynb`
- Training Time: ~2 minutes (on standard CPU)
- Model Size: 184 MB (similarity matrix)
- Libraries: scikit-learn, NLTK, Pandas, NumPy
Movie-Recommender-System/
│
├── app.py                              # Main Streamlit application
├── Movie Recommender System.ipynb      # Data preprocessing & model training
├── requirements.txt                    # Python dependencies
├── .gitignore                          # Git ignore rules
├── LICENSE                             # MIT License
│
├── Dataset/                            # Raw movie data
│   ├── tmdb_5000_movies.csv
│   └── tmdb_5000_credits.csv
│
├── model_files/                        # Trained models
│   ├── movie_list.pkl                  # Movie data (4806 movies)
│   └── similarity.pkl                  # Cosine similarity matrix (4806×4806)
│
└── .streamlit/                         # Configuration (not in repo)
    └── secrets.toml                    # TMDB API key
| Metric | Value |
|---|---|
| Movies in Catalog | 4,806 |
| Feature Dimensions | 5,000 |
| Similarity Matrix Size | 4,806 × 4,806 |
| Average Recommendation Time | <2 seconds |
| Model Size | 184 MB (similarity.pkl) |
- API Response Time: ~1.2s (TMDB)
- Recommendation Generation: ~0.8s
- Memory Usage: ~500MB
- Concurrent Users: 100+
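The "Local Cache (Session)" box in the architecture diagram is what keeps repeated TMDB lookups off the hot path. A minimal sketch of one way to do this in Streamlit, using the built-in st.cache_data decorator; the actual caching strategy in app.py may differ, and fetch_movie_json is an illustrative name:

import requests
import streamlit as st

TMDB_API_KEY = st.secrets["tmdb"]["api_key"]

@st.cache_data(ttl=3600, show_spinner=False)
def fetch_movie_json(movie_id: int) -> dict:
    """Fetch raw TMDB details once per movie_id per hour; repeat calls hit the cache."""
    url = f"https://api.themoviedb.org/3/movie/{movie_id}"
    resp = requests.get(url, params={"api_key": TMDB_API_KEY}, timeout=10)
    resp.raise_for_status()
    return resp.json()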
- Search Mode: Select a movie from the dropdown and click "Show Details & Recommendations"
- Surprise Mode: Click "Surprise Me!" for a random movie suggestion
- Trending: View weekly trending movies at the top
- History: Access recently viewed movies from the sidebar
import pickle
import pandas as pd
# Load models
movies = pickle.load(open('model_files/movie_list.pkl', 'rb'))
similarity = pickle.load(open('model_files/similarity.pkl', 'rb'))
# Get recommendations
def recommend(movie):
    index = movies[movies['title'] == movie].index[0]
    distances = sorted(list(enumerate(similarity[index])), reverse=True, key=lambda x: x[1])
    recommendations = []
    for i in distances[1:6]:
        recommendations.append(movies.iloc[i[0]].title)
    return recommendations
# Example
print(recommend('Avatar'))
# Output: ['Guardians of the Galaxy', 'Star Wars', 'Star Trek', ...]
Returns top 5 similar movies based on content similarity.
Parameters:
- `movie_title` (str): Title of the movie (must exist in dataset)
Returns:
- List of dictionaries containing recommended movies with poster URLs and trailers
Example:
recommendations = recommend('The Dark Knight')
# Returns: [
# {'title': 'The Dark Knight Rises', 'poster': '...', 'trailer': '...'},
# {'title': 'Batman Begins', 'poster': '...', 'trailer': '...'},
# ...
# ]
Fetches comprehensive movie information from TMDB API.
Parameters:
- `movie_id` (int): TMDB movie ID
Returns:
- Dictionary with rating, cast, crew, budget, revenue, genres, etc.
Example:
details = get_movie_details(19995) # Avatar
# Returns: {
# 'rating': 7.2,
# 'cast': [...],
# 'director': 'James Cameron',
# 'budget': '$237,000,000',
# ...
# }
Gets current trending movies from TMDB API.
Returns:
- List of top 5 trending movies with posters and IDs
Fetches movie poster URL from TMDB API.
Parameters:
- `movie_id` (int): TMDB movie ID
Returns:
- Full URL to movie poster (500px width)
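A hedged sketch of what this poster lookup typically looks like against TMDB; the /movie/{movie_id} endpoint and the w500 image base URL are standard TMDB, but the function name and signature here are illustrative rather than copied from app.py:

from typing import Optional
import requests

def fetch_poster(movie_id: int, api_key: str) -> Optional[str]:
    """Return a 500px-wide poster URL for a TMDB movie id, or None if no poster exists."""
    url = f"https://api.themoviedb.org/3/movie/{movie_id}"
    data = requests.get(url, params={"api_key": api_key}, timeout=10).json()
    poster_path = data.get("poster_path")  # e.g. "/poster.jpg"
    return f"https://image.tmdb.org/t/p/w500{poster_path}" if poster_path else None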
Fetches YouTube trailer URL from TMDB API.
Parameters:
- `movie_id` (int): TMDB movie ID
Returns:
- YouTube URL to official trailer (if available)
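Likewise, the trailer lookup usually goes through TMDB's /movie/{movie_id}/videos endpoint and picks the first official YouTube trailer; a sketch under the same assumptions (illustrative name and signature):

from typing import Optional
import requests

def fetch_trailer(movie_id: int, api_key: str) -> Optional[str]:
    """Return a YouTube trailer URL for a TMDB movie id, or None if none is listed."""
    url = f"https://api.themoviedb.org/3/movie/{movie_id}/videos"
    videos = requests.get(url, params={"api_key": api_key}, timeout=10).json().get("results", [])
    for video in videos:
        if video.get("site") == "YouTube" and video.get("type") == "Trailer":
            return f"https://www.youtube.com/watch?v={video['key']}"
    return None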
Environment Variables:
TMDB_API_KEY = st.secrets["tmdb"]["api_key"]  # From .streamlit/secrets.toml
Session State:
st.session_state.history # Recently viewed movies (list of IDs)
st.session_state.mode # Current mode: 'search' or 'surprise'
st.session_state.selected_movie  # Currently selected movie title
How does the recommendation system work?
It uses content-based filtering with cosine similarity. Movies are represented as vectors based on genres, cast, crew, keywords, and plot. Similar movies have vectors close together in this multi-dimensional space.
How many movies are in the database?
4,806 movies from the TMDb 5000 dataset, spanning 1916-2017.
Can I add my own movies?
Not directly. You would need to retrain the model with new data. See the Jupyter notebook for the training process.
Why do I need a TMDB API key?
The API key is required to fetch real-time data like posters, trailers, cast information, and ratings from The Movie Database.
What algorithm is used for recommendations?
Content-based filtering using CountVectorizer for text features and cosine similarity for computing movie similarity scores.
How accurate are the recommendations?
Accuracy depends on user preference, but the system achieves good results by considering multiple features (genres, cast, crew, plot, keywords).
Can this handle collaborative filtering?
No, this is purely content-based. It doesn't use user ratings or behavior data.
What if a movie title has special characters?
Use the exact title as it appears in the dropdown menu. The system is case-sensitive.
How often is the trending section updated?
Trending movies are fetched in real-time from TMDB API every time you load the page.
Can I deploy this on my own server?
Yes! It works on any platform that supports Streamlit (Streamlit Cloud, Heroku, AWS, etc.).
What are the system requirements?
Python 3.8+, ~500MB RAM, and the libraries in `requirements.txt`.
How do I update the movie database?
Download a new dataset, retrain the model using the Jupyter notebook, and replace the `.pkl` files.
Is there a rate limit on TMDB API?
Yes, TMDB has rate limits. The app uses retry logic with exponential backoff to handle this gracefully.
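One common way to implement retries with exponential backoff is to mount urllib3's Retry onto a requests session; a minimal sketch, not necessarily the exact approach used in app.py:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=3,                                      # at most 3 retries per request
    backoff_factor=1,                             # exponentially growing wait between attempts
    status_forcelist=[429, 500, 502, 503, 504],   # 429 = TMDB "too many requests"
)
session.mount("https://", HTTPAdapter(max_retries=retries))

# Example: weekly trending movies (the endpoint behind the Trending section)
resp = session.get(
    "https://api.themoviedb.org/3/trending/movie/week",
    params={"api_key": "YOUR_API_KEY"},
    timeout=10,
)
resp.raise_for_status()
print([m["title"] for m in resp.json()["results"][:5]])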
Contributions are welcome! Here's how:
- Fork the repository
- Create a feature branch (`git checkout -b feature/YourFeature`)
- Commit changes (`git commit -m 'Add YourFeature'`)
- Push to branch (`git push origin feature/YourFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ by Harshal Kumawat

