AI Image Caption Generator

An AI-powered image captioning application that generates natural language descriptions for images using a pretrained BLIP (Bootstrapped Language-Image Pretraining) model from Hugging Face.

This project provides both:

  • A Command-Line Interface (CLI) for quick caption generation
  • A Web-based interface using Gradio for interactive usage

Features

  • Automatically generates captions for images
  • Uses a pretrained Vision–Language Transformer (BLIP)
  • Supports both CLI and Web App usage
  • Simple and readable codebase, suitable for learning and experimentation

Model Details

  • Model Name: Salesforce/blip-image-captioning-base
  • Task: Image Captioning
  • Framework: PyTorch
  • Library: Hugging Face Transformers

BLIP models are trained jointly on images and text, allowing them to understand visual content and generate accurate natural-language descriptions.
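For reference, the processor and model can be loaded by name with the Transformers library (a minimal, illustrative snippet):

from transformers import BlipProcessor, BlipForConditionalGeneration

# Downloads the weights on first use, then loads them from the local cache
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")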


Project Structure

├── CLI-App.py         # Command-line image captioning script
├── web-app.py         # Gradio-based web application
├── requirements.txt   # Python dependencies
├── cat.jpeg           # Sample image (replace with your own)
└── README.md


Installation

1️⃣ Clone the Repository

git clone https://github.com/adithya-umesh/ai-image-captioner.git
cd ai-image-captioner

2️⃣ (Optional but Recommended) Create a Virtual Environment

python -m venv venv
source venv/bin/activate        # Linux / macOS
venv\Scripts\activate           # Windows

3️⃣ Install Dependencies

pip install -r requirements.txt

Note: The first run will download the BLIP model weights, which may take a few minutes.

CLI Application Usage

Run the CLI app:

python CLI-App.py

How it works:

  • Loads an image (cat.jpeg by default)
  • Converts it to RGB format
  • Processes it using the BLIP processor
  • Generates and prints a caption in the terminal

To caption a different image, edit the following line in CLI-App.py: img_path = "your_image.jpg"
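The script boils down to a flow like the one below (a sketch using the standard Transformers BLIP API; variable names are illustrative and may differ slightly from the actual CLI-App.py):

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the pretrained BLIP processor and captioning model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Load the image and convert it to RGB
img_path = "cat.jpeg"
image = Image.open(img_path).convert("RGB")

# Preprocess the image, generate token IDs, and decode them into a caption
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))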

Web Application Usage

Run the web app:

python web-app.py

This opens a Gradio interface in your browser. Upload any image and instantly receive an AI-generated caption.
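The web app wires the same captioning step into a Gradio interface. A minimal sketch of how this might look (function and component names here are illustrative, not necessarily those used in web-app.py):

import gradio as gr
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_image(image):
    # With type="pil", Gradio hands the upload over as a PIL image
    inputs = processor(images=image.convert("RGB"), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=caption_image,
    inputs=gr.Image(type="pil"),
    outputs="text",
    title="AI Image Caption Generator",
)
demo.launch()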


🧩 Technologies Used:

  • Python
  • PyTorch
  • Hugging Face Transformers
  • Gradio
  • Pillow (PIL)
  • NumPy

📌 Example Output

  • Input Image: A cat sitting on a sofa
  • Generated Caption: A photo of a cat sitting on a couch in a living room

🔮 Future Improvements

  • Upgrade to blip-image-captioning-large for higher-quality captions
  • Add GPU (CUDA) support (see the sketch after this list)
  • Batch image captioning
  • Deploy as a public web service (Hugging Face Spaces)
  • Improve UI and add caption confidence scoring
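GPU support, for example, would mostly amount to moving the model and the processed inputs onto the CUDA device. A hedged sketch, not yet part of the repository:

import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)

image = Image.open("cat.jpeg").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))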

📜 License

This project is intended for educational and experimental purposes.

The BLIP model and its weights are released by Salesforce Research and distributed through Hugging Face; see the model card for licensing details.

🙌 Acknowledgements

  • Salesforce Research for the BLIP model
  • Hugging Face for Transformers and model hosting
  • Gradio for rapid web UI development

