Invoice-Gemini-Extracter: Python tool to extract structured invoice data (fields and line items) from PDFs/images using OCR, preprocessing, and Google Gemini-powered extraction/normalization.

AmmarAhm3d/invoice-gemini-extracter

Invoice Gemini Extracter

A Python tool that leverages Google's Gemini 2.0 API to extract structured data from invoices and other PDF documents. This project demonstrates how to transform unstructured PDF documents into structured, machine-readable data using Gemini's multimodal capabilities.

Overview

Converting PDFs into structured, machine-readable data has traditionally been difficult. Gemini 2.0 combines multimodal input with structured output support, so this tool can read a PDF directly and return its contents in a predefined schema, replacing manual or semi-automated data extraction.

Features

  • Extract key invoice data including invoice numbers, dates, and total amounts
  • Parse line items with descriptions, quantities, and costs
  • Convert unstructured PDF content into structured JSON data
  • Easily adaptable to different invoice formats
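As a rough illustration of what "structured JSON data" can look like here, the sketch below parses a JSON string into typed objects. The field names are the ones read in the Example Code section of this README; the types, the use of standard-library dataclasses, and the `parse_invoice` helper are assumptions for illustration, and the actual `InvoiceData` in `main.py` may be defined differently.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class LineItem:
    item_description: str
    item_quantity: float
    item_gross_worth: float


@dataclass
class InvoiceData:
    invoice_number: str
    invoice_date: str
    total_gross_worth: float
    line_items: List[LineItem]


def parse_invoice(raw_json: str) -> InvoiceData:
    """Hypothetical helper: build typed objects from a JSON string.

    Missing keys raise KeyError and unexpected line-item keys raise TypeError,
    so malformed model output is noticed early.
    """
    data = json.loads(raw_json)
    items = [LineItem(**item) for item in data["line_items"]]
    return InvoiceData(
        invoice_number=data["invoice_number"],
        invoice_date=data["invoice_date"],
        total_gross_worth=float(data["total_gross_worth"]),
        line_items=items,
    )


# Made-up sample payload, shaped like the fields above.
sample = """
{
  "invoice_number": "INV-2023-0042",
  "invoice_date": "2023-05-15",
  "total_gross_worth": 150.0,
  "line_items": [
    {"item_description": "Travel Expenses", "item_quantity": 1, "item_gross_worth": 150.0}
  ]
}
"""
invoice = parse_invoice(sample)
print(invoice.invoice_number, len(invoice.line_items))
```

Adapting to a different invoice format would then mostly mean adding or renaming fields in these classes.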

Prerequisites

  • Python 3.8+
  • Google API key with access to Gemini 2.0 models

Installation

  1. Clone the repository:

    git clone https://github.com/AmmarAhmedl200961/invoice-gemini-extracter.git
    cd invoice-gemini-extracter
  2. Set up a virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Create a .env file with your Google API key:

    GOOGLE_API_KEY=your_api_key_here
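
A `.env` file is not read by Python automatically; something has to copy its values into the environment. Whether `main.py` already does this is not shown here. The sketch below is a minimal standard-library loader under that assumption (`load_env_file` is a hypothetical helper name); the `python-dotenv` package's `load_dotenv()` is a common alternative.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Variables that are already set in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


load_env_file()
api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    print("GOOGLE_API_KEY is not set; add it to .env or export it in your shell.")
```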
    

Usage

  1. Place your invoice PDF in the project directory (default filename: invoice.pdf)

  2. Run the extraction script:

    python main.py
  3. The script will extract and display structured data from your invoice

Example Code

from main import setup_API_client, upload_file, generate_content, InvoiceData
import os

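# Read the API key from the environment (export it, or load your .env file first)
api_key = os.getenv('GOOGLE_API_KEY')
if not api_key:
    raise SystemExit("GOOGLE_API_KEY is not set")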

# Initialize the client
client = setup_API_client(api_key)

# Upload your invoice PDF
file = upload_file(client, "path/to/your/invoice.pdf")

# Extract the structured data
prompt = "Extract the structured data from the following PDF file."
model_id = "gemini-2.0-flash"
invoice_data = generate_content(client, model_id, prompt, file, InvoiceData)

# Use the extracted data
print(f"Invoice #{invoice_data.invoice_number}")
print(f"Date: {invoice_data.invoice_date}")
print(f"Total: ${invoice_data.total_gross_worth}")
for item in invoice_data.line_items:
    print(f"- {item.item_quantity}x {item.item_description}: ${item.item_gross_worth}")
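
The helper functions imported from `main.py` are not shown in this README. As a hedged sketch of what the same upload-and-extract flow could look like when written directly against the `google-genai` SDK (an assumption about which SDK is used), the code below passes a JSON schema as the response schema and parses the returned JSON. `INVOICE_SCHEMA` and `extract_invoice` are hypothetical names, and the schema types are guesses based on the fields printed above.

```python
import json

# Hypothetical response schema mirroring the fields used in the example above.
LINE_ITEM_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "item_description": {"type": "STRING"},
        "item_quantity": {"type": "NUMBER"},
        "item_gross_worth": {"type": "NUMBER"},
    },
    "required": ["item_description", "item_quantity", "item_gross_worth"],
}

INVOICE_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "invoice_number": {"type": "STRING"},
        "invoice_date": {"type": "STRING"},
        "total_gross_worth": {"type": "NUMBER"},
        "line_items": {"type": "ARRAY", "items": LINE_ITEM_SCHEMA},
    },
    "required": ["invoice_number", "invoice_date", "total_gross_worth", "line_items"],
}


def extract_invoice(api_key: str, pdf_path: str, model_id: str = "gemini-2.0-flash") -> dict:
    """Upload a PDF and ask the model for JSON matching INVOICE_SCHEMA."""
    # Imported lazily so this module can be loaded without the SDK installed.
    from google import genai

    client = genai.Client(api_key=api_key)
    uploaded = client.files.upload(file=pdf_path)
    response = client.models.generate_content(
        model=model_id,
        contents=["Extract the structured data from the following PDF file.", uploaded],
        config={
            "response_mime_type": "application/json",
            "response_schema": INVOICE_SCHEMA,
        },
    )
    return json.loads(response.text)
```

Calling `extract_invoice(api_key, "path/to/your/invoice.pdf")` would return a plain dictionary; a project that prefers typed objects could validate that dictionary against its own model afterwards.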

Expected Output

When running the script with a sample invoice, you can expect output similar to:

Extracted Invoice: INV-2023-0042 on 2023-05-15 with total worth 1245.75
Item: Professional Consulting Services with quantity: 10 and worth: 1000.0
Item: Travel Expenses with quantity: 1 and worth: 150.0
Item: Office Supplies with quantity: 5 and worth: 95.75
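
Extracted numbers are worth a sanity check before they are used downstream. The sketch below re-adds the line items from the sample output above and compares the sum with the reported total. It assumes each item's "worth" is the full line amount rather than a unit price, which holds for this sample; `totals_match` is a hypothetical helper.

```python
import math

# Values copied from the sample output above: (description, quantity, worth).
line_items = [
    ("Professional Consulting Services", 10, 1000.0),
    ("Travel Expenses", 1, 150.0),
    ("Office Supplies", 5, 95.75),
]
reported_total = 1245.75


def totals_match(items, total, tolerance=0.01):
    """Return True if the summed line-item worth agrees with the invoice total."""
    computed = sum(worth for _, _, worth in items)
    return math.isclose(computed, total, abs_tol=tolerance)


print(totals_match(line_items, reported_total))  # True
```

A mismatch does not always mean the extraction failed (taxes or discounts may be listed separately), but it is a cheap signal that an invoice deserves a manual look.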

Use Cases

1. Accounting & Bookkeeping Automation

Extract data from vendor invoices automatically to populate accounting systems, reducing manual data entry and errors.
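
Many accounting systems accept CSV imports, so one simple way to hand over extracted line items is to flatten them into CSV rows. This is only a sketch: the column names and the `line_items_to_csv` helper are assumptions, and a real import format depends on the target system.

```python
import csv
import io


def line_items_to_csv(invoice_number, line_items):
    """Render (description, quantity, worth) tuples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["invoice_number", "description", "quantity", "gross_worth"])
    for description, quantity, worth in line_items:
        writer.writerow([invoice_number, description, quantity, worth])
    return buffer.getvalue()


csv_text = line_items_to_csv(
    "INV-2023-0042",
    [("Travel Expenses", 1, 150.0), ("Office Supplies", 5, 95.75)],
)
print(csv_text)
```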

2. Expense Management

Process employee expense reports by extracting line items from receipts and invoices for faster reimbursement and better tracking.

3. Supply Chain Document Processing

Extract data from delivery notes, bills of lading, and other supply chain documents to maintain real-time inventory and shipment tracking.

4. Contract Analysis

Extract key terms, dates, and conditions from contracts and agreements to maintain compliance and track obligations.

Why Gemini 2.0?

Google DeepMind's Gemini 2.0 offers significant advantages for document processing:

  • Support for up to 1 million input tokens
  • Multimodal capabilities (text, images, audio)
  • Function calling and structured output support
  • Affordable pricing ($0.10 per 1M input tokens)
  • Various models to fit different needs:
    • Gemini 2.0 Flash (Generally Available)
    • Gemini 2.0 Flash-Lite (Cost-efficient)
    • Gemini 2.0 Pro (Experimental)

This makes it an excellent choice for transforming PDF documents into structured, actionable data.
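
To put the input price quoted above in perspective, here is a back-of-the-envelope estimate. The $0.10 per 1M input tokens figure comes from the list above; the 258 tokens per PDF page and the 20-token prompt are assumptions, output tokens are not counted, and actual billing may differ.

```python
PRICE_PER_MILLION_INPUT_TOKENS = 0.10  # USD, figure quoted in this README
TOKENS_PER_PDF_PAGE = 258              # assumption; check the current Gemini docs
PROMPT_TOKENS = 20                     # assumption for a short extraction prompt


def estimate_input_cost(num_invoices: int, pages_per_invoice: int) -> float:
    """Estimate the input-token cost in USD for a batch of invoices."""
    tokens_per_invoice = pages_per_invoice * TOKENS_PER_PDF_PAGE + PROMPT_TOKENS
    total_tokens = num_invoices * tokens_per_invoice
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS


cost = estimate_input_cost(num_invoices=1000, pages_per_invoice=2)
print(f"${cost:.4f}")  # $0.0536
```

Under these assumptions, 1,000 two-page invoices come to 536,000 input tokens, or roughly five cents of input cost.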

License

MIT
