A Python tool that leverages Google's Gemini 2.0 API to extract structured data from invoices and other PDF documents. This project demonstrates how to transform unstructured PDF documents into structured, machine-readable data using Gemini's multimodal capabilities.
Converting PDFs into structured or machine-readable text has traditionally been a major challenge. With Gemini 2.0's multimodal capabilities combined with structured output support, this tool can process and extract information from PDFs and other files with high accuracy, eliminating complex and painful manual or semi-automated data extraction processes.
- Extract key invoice data including invoice numbers, dates, and total amounts
- Parse line items with descriptions, quantities, and costs
- Convert unstructured PDF content into structured JSON data
- Easily adaptable to different invoice formats
- Python 3.8+
- Google API key with access to Gemini 2.0 models
-
Clone the repository:
git clone https://github.com/AmmarAhmedl200961/invoice-gemini-extracter.git cd invoice-gemini-extracter -
Set up a virtual environment:
python -m venv .venv source .venv/bin/activate # On Windows: .venv\Scripts\activate
-
Install dependencies:
pip install -r requirements.txt
-
Create a
.envfile with your Google API key:GOOGLE_API_KEY=your_api_key_here
-
Place your invoice PDF in the project directory (default filename:
invoice.pdf) -
Run the extraction script:
python main.py
-
The script will extract and display structured data from your invoice
from main import setup_API_client, upload_file, generate_content, InvoiceData
import os
# Set your API key
api_key = os.getenv('GOOGLE_API_KEY')
# Initialize the client
client = setup_API_client(api_key)
# Upload your invoice PDF
file = upload_file(client, "path/to/your/invoice.pdf")
# Extract the structured data
prompt = "Extract the structured data from the following PDF file."
model_id = "gemini-2.0-flash"
invoice_data = generate_content(client, model_id, prompt, file, InvoiceData)
# Use the extracted data
print(f"Invoice #{invoice_data.invoice_number}")
print(f"Date: {invoice_data.invoice_date}")
print(f"Total: ${invoice_data.total_gross_worth}")
for item in invoice_data.line_items:
print(f"- {item.item_quantity}x {item.item_description}: ${item.item_gross_worth}")When running the script with a sample invoice, you can expect output similar to:
Extracted Invoice: INV-2023-0042 on 2023-05-15 with total worth 1245.75
Item: Professional Consulting Services with quantity: 10 and worth: 1000.0
Item: Travel Expenses with quantity: 1 and worth: 150.0
Item: Office Supplies with quantity: 5 and worth: 95.75
Extract data from vendor invoices automatically to populate accounting systems, reducing manual data entry and errors.
Process employee expense reports by extracting line items from receipts and invoices for faster reimbursement and better tracking.
Extract data from delivery notes, bills of lading, and other supply chain documents to maintain real-time inventory and shipment tracking.
Extract key terms, dates, and conditions from contracts and agreements to maintain compliance and track obligations.
Google DeepMind's Gemini 2.0 offers significant advantages for document processing:
- Support for up to 1 million input tokens
- Multimodal capabilities (text, images, audio)
- Function calling and structured output support
- Affordable pricing ($0.1 per 1M input tokens)
- Various models to fit different needs:
- Gemini 2.0 Flash (General Available)
- Gemini 2.0 Flash-Lite (Cost-efficient)
- Gemini 2.0 Pro (Experimental)
This makes it an excellent choice for transforming PDFs from documents into structured, actionable data.