An OCR application that extracts text from images.
- Extract text from uploaded images
- Process multiple image formats (PNG, JPG, JPEG, GIF, WEBP)
- User-friendly Streamlit interface
- RESTful API endpoints
- Integration with LangChain for advanced text processing
- Together AI Vision model integration
- Python 3.12 or higher
- Poetry package manager
- Together AI API key
1. Clone the repository:
git clone https://github.com/yourusername/ImageTextExtractor.git
cd ImageTextExtractor
2. Install dependencies using Poetry:
poetry install
1. Start the FastAPI backend:
poetry run python main.py
2. In a new terminal, launch the Streamlit interface:
poetry run streamlit run ui.py
3. Open your browser and navigate to http://localhost:8501
4. Enter your Together AI API key
5. Upload an image and wait for the results
The application exposes a REST API endpoint for OCR processing.
Request:
- URL: http://localhost:8000/ocr
- Method: POST
- Content-Type: multipart/form-data
Parameters:
- file: Image file (supported formats: PNG, JPG, JPEG, GIF, WEBP)
- api_key: Together AI API key
- system_prompt: (Optional) Custom prompt for the vision model
Example using curl:
curl -X POST http://localhost:8000/ocr \
-F "file=@/path/to/your/image.jpg" \
-F "api_key=your_together_ai_api_key" \
-F "system_prompt=Convert the provided image into text"
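The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the backend is running locally at the URL given above. The helper names `build_multipart` and `extract_text` are hypothetical and not part of this project. Because the response format is not specified here, the sketch returns the raw response body as text.

```python
from __future__ import annotations

import mimetypes
import urllib.request
import uuid
from pathlib import Path

OCR_URL = "http://localhost:8000/ocr"  # endpoint from the request description


def build_multipart(
    fields: dict[str, str],
    filename: str,
    content: bytes,
    boundary: str | None = None,
) -> tuple[bytes, str]:
    """Encode text fields plus one file part named "file" as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = boundary or uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # Guess the MIME type from the extension; fall back to a generic type.
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'.encode()
        + content
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def extract_text(
    image_path: str, api_key: str, system_prompt: str | None = None
) -> str:
    """POST an image to the /ocr endpoint and return the raw response body."""
    fields = {"api_key": api_key}
    if system_prompt is not None:  # system_prompt is optional
        fields["system_prompt"] = system_prompt
    path = Path(image_path)
    body, content_type = build_multipart(fields, path.name, path.read_bytes())
    request = urllib.request.Request(
        OCR_URL,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `extract_text("/path/to/your/image.jpg", "your_together_ai_api_key")` sends the same form fields as the curl command, minus the optional prompt.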
Response:
Run the test suite with:
poetry run pytest
The application uses the following configurations (defined in config.py):
- LOGGING_LEVEL: Default is "INFO"
- SUPPORTED_IMAGE_TYPES: [".png", ".jpg", ".jpeg", ".gif", ".webp"]
- TOGETHER_MODEL_NAME: "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo"
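As a rough illustration, these settings could be expressed as module-level constants like the ones below. The values are the ones listed above, but the layout is an assumption and the actual config.py may differ. The `is_supported_image` helper is hypothetical, added only to show how the extension list might be used to validate an upload before sending it.

```python
from pathlib import Path

# Values as documented above; the module layout itself is an assumption.
LOGGING_LEVEL = "INFO"
SUPPORTED_IMAGE_TYPES = [".png", ".jpg", ".jpeg", ".gif", ".webp"]
TOGETHER_MODEL_NAME = "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo"


def is_supported_image(filename: str) -> bool:
    """Hypothetical helper: check the file extension, ignoring case."""
    return Path(filename).suffix.lower() in SUPPORTED_IMAGE_TYPES
```

With this sketch, `is_supported_image("scan.PNG")` returns `True` and `is_supported_image("notes.pdf")` returns `False`.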
This project is licensed under the MIT License - see the LICENSE file for details.
- Together AI for providing the vision model
- LangChain for the AI integration framework
- Streamlit for the user interface
- FastAPI for the REST API implementation