An AI-powered image captioning application that generates natural language descriptions for images using a pretrained BLIP (Bootstrapping Language-Image Pre-training) model from Hugging Face.
This project provides both:
- A Command-Line Interface (CLI) for quick caption generation
- A Web-based interface using Gradio for interactive usage
- Automatically generates captions for images
- Uses a pretrained Vision–Language Transformer (BLIP)
- Supports both CLI and Web App usage
- Simple and readable codebase, suitable for learning and experimentation
- Model Name: Salesforce/blip-image-captioning-base
- Task: Image Captioning
- Framework: PyTorch
- Library: Hugging Face Transformers
BLIP models are pretrained jointly on image–text pairs, allowing them to understand visual content and generate accurate natural-language descriptions.
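For reference, loading this checkpoint with Transformers looks roughly like the sketch below (variable names are illustrative, not the repo's exact code):

```python
# Minimal sketch: load the pretrained BLIP captioning checkpoint.
from transformers import BlipProcessor, BlipForConditionalGeneration

MODEL_ID = "Salesforce/blip-image-captioning-base"

# The processor handles image preprocessing and caption tokenization;
# the model is the vision-language transformer that generates the text.
processor = BlipProcessor.from_pretrained(MODEL_ID)
model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)
```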
├── CLI-App.py # Command-line image captioning script
├── web-app.py # Gradio-based web application
├── requirements.txt # Python dependencies
├── cat.jpeg # Sample image (replace with your own)
└── README.md
1️⃣ Clone the Repository
git clone https://github.com/adithya-umesh/ai-image-captioner.git
cd ai-image-captioner

2️⃣ (Optional but Recommended) Create a Virtual Environment
python -m venv venv
source venv/bin/activate # Linux / macOS
venv\Scripts\activate      # Windows

3️⃣ Install Dependencies
pip install -r requirements.txt

Note: The first run will download the BLIP model weights, which may take a few minutes.
CLI Application Usage

Run the CLI app:
python CLI-App.py

How it works:
- Loads an image (cat.jpeg by default)
- Converts it to RGB format
- Processes it using the BLIP processor
- Generates and prints a caption in the terminal
To caption a different image, edit the following line in CLI-App.py: img_path = "your_image.jpg"
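That flow boils down to a few lines of Transformers code. The sketch below is illustrative and may differ from CLI-App.py in variable names and generation settings:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

img_path = "cat.jpeg"  # change this to caption a different image

# Load the BLIP processor and model (weights are downloaded on first run)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Load the image and convert it to RGB
image = Image.open(img_path).convert("RGB")

# Preprocess the image into model inputs
inputs = processor(images=image, return_tensors="pt")

# Generate token IDs and decode them into a natural-language caption
output_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```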
Web Application Usage

Run the web app:
python web-app.py

This opens a Gradio interface in your browser:
- Upload any image
- Instantly receive an AI-generated caption
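Under the hood, web-app.py presumably wires a captioning function into a Gradio Interface. A minimal sketch of such an app (not the repo's exact code):

```python
import gradio as gr
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_image(image):
    """Generate a caption for an uploaded PIL image."""
    inputs = processor(images=image.convert("RGB"), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=caption_image,
    inputs=gr.Image(type="pil"),   # uploaded image arrives as a PIL Image
    outputs="text",
    title="AI Image Captioner",
)

demo.launch()  # opens the interface in the browser
```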
- Python
- PyTorch
- Hugging Face Transformers
- Gradio
- Pillow (PIL)
- NumPy
- Input Image: A cat sitting on a sofa
- Generated Caption: A photo of a cat sitting on a couch in a living room
- Upgrade to blip-image-captioning-large for higher-quality captions
- Add GPU (CUDA) support
- Batch image captioning (a sketch of both ideas follows this list)
- Deploy as a public web service (Hugging Face Spaces)
- Improve UI and add caption confidence scoring
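As a rough illustration of the GPU and batch items above, here is a hedged sketch; the caption_batch helper is hypothetical and not part of the current codebase:

```python
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Use a GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)

def caption_batch(paths):
    """Hypothetical helper: caption several images in one forward pass."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return [processor.decode(ids, skip_special_tokens=True) for ids in output_ids]

print(caption_batch(["cat.jpeg"]))
```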
This project is intended for educational and experimental purposes.
The BLIP model is developed by Salesforce Research and distributed via Hugging Face.
Salesforce Research for the BLIP model
Hugging Face for Transformers and model hosting
Gradio for rapid web UI development