An AI-powered image captioning application that generates natural language descriptions for images using a pretrained BLIP (Bootstrapping Language-Image Pre-training) model from Hugging Face.
This project provides both:
- A Command-Line Interface (CLI) for quick caption generation
- A Web-based interface using Gradio for interactive usage
- Automatically generates captions for images
- Uses a pretrained Vision–Language Transformer (BLIP)
- Supports both CLI and Web App usage
- Simple and readable codebase, suitable for learning and experimentation
- Model Name: Salesforce/blip-image-captioning-base
- Task: Image Captioning
- Framework: PyTorch
- Library: Hugging Face Transformers
BLIP models are pretrained jointly on image–text pairs, allowing them to understand visual content and generate accurate natural-language descriptions.
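For reference, loading this checkpoint with Transformers looks roughly like the sketch below (variable names are illustrative, not the repo's exact code):

```python
# Minimal sketch: load the pretrained BLIP captioning checkpoint.
from transformers import BlipProcessor, BlipForConditionalGeneration

MODEL_ID = "Salesforce/blip-image-captioning-base"

# The processor handles image preprocessing and caption tokenization;
# the model is the vision-language transformer that generates the text.
processor = BlipProcessor.from_pretrained(MODEL_ID)
model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)
```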
├── CLI-App.py # Command-line image captioning script
├── web-app.py # Gradio-based web application
├── requirements.txt # Python dependencies
├── cat.jpeg # Sample image (replace with your own)
└── README.md
1️⃣ Clone the Repository
git clone https://github.com/adithya-umesh/ai-image-captioner.git
cd ai-image-captioner

2️⃣ (Optional but Recommended) Create a Virtual Environment
python -m venv venv
source venv/bin/activate # Linux / macOS
venv\Scripts\activate      # Windows

3️⃣ Install Dependencies
pip install -r requirements.txt

Note: The first run will download the BLIP model weights, which may take a few minutes.
CLI Application Usage

Run the CLI app:
python CLI-App.py

How it works:
- Loads an image (cat.jpeg by default)
- Converts it to RGB format
- Processes it using the BLIP processor
- Generates and prints a caption in the terminal
To caption a different image, edit the following line in CLI-App.py: img_path = "your_image.jpg"
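That flow boils down to a few lines of Transformers code. The sketch below is illustrative and may differ from CLI-App.py in variable names and generation settings:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

img_path = "cat.jpeg"  # change this to caption a different image

# Load the BLIP processor and model (weights are downloaded on first run)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Load the image and convert it to RGB
image = Image.open(img_path).convert("RGB")

# Preprocess the image into model inputs
inputs = processor(images=image, return_tensors="pt")

# Generate token IDs and decode them into a natural-language caption
output_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```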
Web Application Usage

Run the web app:
python web-app.py

This opens a Gradio interface in your browser:
- Upload any image
- Instantly receive an AI-generated caption
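Under the hood, web-app.py presumably wires a captioning function into a Gradio Interface. A minimal sketch of such an app (not the repo's exact code):

```python
import gradio as gr
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_image(image):
    """Generate a caption for an uploaded PIL image."""
    inputs = processor(images=image.convert("RGB"), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=caption_image,
    inputs=gr.Image(type="pil"),   # uploaded image arrives as a PIL Image
    outputs="text",
    title="AI Image Captioner",
)

demo.launch()  # opens the interface in the browser
```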
- Python
- PyTorch
- Hugging Face Transformers
- Gradio
- Pillow (PIL)
- NumPy
- Input Image: A cat sitting on a sofa
- Generated Caption: A photo of a cat sitting on a couch in a living room
- Upgrade to blip-image-captioning-large for higher-quality captions
- Add GPU (CUDA) support
- Batch image captioning (a sketch of both ideas follows this list)
- Deploy as a public web service (Hugging Face Spaces)
- Improve UI and add caption confidence scoring
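As a rough illustration of the GPU and batch items above, here is a hedged sketch; the caption_batch helper is hypothetical and not part of the current codebase:

```python
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Use a GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)

def caption_batch(paths):
    """Hypothetical helper: caption several images in one forward pass."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return [processor.decode(ids, skip_special_tokens=True) for ids in output_ids]

print(caption_batch(["cat.jpeg"]))
```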
This project is intended for educational and experimental purposes.
The BLIP model is developed by Salesforce Research and distributed via Hugging Face.
Salesforce Research for the BLIP model
Hugging Face for Transformers and model hosting
Gradio for rapid web UI development