An open-source tool to automate signature validation for ballot initiatives using OCR and fuzzy matching.
Note: Tool is tailored to DC petitions (example), and would need to be modified to fit other formats.
In 2024, voters sought to get "Ranked Choice Voting on the DC Ballot." To do so, volunteers and staff had to walk through the neighborhoods of DC asking people whether they were willing to sign their names in support of the measure ("They want to change how D.C. votes — one signature at a time", WaPo 2024):
After a person signs on the dotted line, each signature must then be linked by name and address to an actual D.C. voter. Though the Board of Elections will verify signatures when it receives petitions, Mintwood Strategies verifies them first.
Kris Furnish, the initiative’s field director, typed names from the signature sheets into a database that linked the signature to a voter and checked for duplicates. Only when the signature was validated in this internal system would the person who collected it be paid for it, she said.
It was slow, painstaking work. Furnish struggled with a person’s middle name that was illegible. She decided it was “Alexa.” That middle name popped up in the database linked to an address that the voter had provided.
Checking signatures for ballot initiatives is time-consuming and tedious. That time could be better spent collecting more signatures or doing higher-level political organizing around the issue that the ballot initiative represents.
The goal of the Ballot Initiative project is to reduce the manual labor involved in signature checking by automating the simplest parts of the process. This repo collects preliminary versions of code that let users upload PDF files of ballot initiative signatures and validate whether those signatures match names in a voter records file.
- Extraction: Forms in PDF format are processed through an OCR engine (using generative AI) to crop text sections and extract data.
- Identification: The engine identifies and extracts key information (tailored to DC Ballot Initiatives) related to validating signatures:
  - Names
  - Addresses
  - Wards
  - Dates
- Matching: Extracted names and addresses are passed through a fuzzy-match engine (using Levenshtein distance) that compares them against a CSV of voter records. The harmonic mean of the name score and the address score is used as the net validation score.
- Output: The system outputs a table of results containing:
  - Name (OCR and Record Match)
  - Address (OCR and Record Match)
  - Validation score
  - Validation status
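The matching and scoring steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual implementation: the function names, the `name`/`address` record keys, the length-based normalisation of the Levenshtein distance, and the 0.85 validation threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to a 0..1 score (assumed normalisation: longest length)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def harmonic_mean(x: float, y: float) -> float:
    """Harmonic mean of two scores; a low score on either side drags it down."""
    return 0.0 if x + y == 0 else 2 * x * y / (x + y)


@dataclass
class Match:
    record_name: str
    record_address: str
    score: float
    valid: bool


def best_match(ocr_name, ocr_address, voters, threshold=0.85):
    """Return the best-scoring voter record for one OCR'd signature line.

    `voters` is a non-empty list of dicts with hypothetical keys "name" and
    "address". `threshold` is an illustrative cut-off, not a project default.
    """
    def score(voter):
        return harmonic_mean(
            similarity(ocr_name, voter["name"]),
            similarity(ocr_address, voter["address"]),
        )

    best = max(voters, key=score)
    best_score = score(best)
    return Match(best["name"], best["address"], best_score, best_score >= threshold)


if __name__ == "__main__":
    # Made-up records, in the spirit of the fake sample data.
    fake_voters = [
        {"name": "Jane Q. Public", "address": "123 Main St NW"},
        {"name": "John R. Doe", "address": "456 Oak Ave SE"},
    ]
    print(best_match("Jane Q. Pubiic", "123 Main St NW", fake_voters))
```

The harmonic mean suits this kind of check because it stays high only when both the name and the address match well; a near-perfect name paired with an unrelated address still yields a low combined score.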
An alternative way to get up and running is to use GitHub Codespaces to run this project in a dev container. Instructions can be found here.
- Python 3.12+
- UV for building the project and dependency management.
- API keys for at least one of the following (see footnote 1):
- PDF files of ballot initiative signatures
  - Use the fake data in sample_data/fake_signed_petitions.pdf to test.
- Voter records file (access is limited - see note below)
  - Use the fake data in sample_data/fake_voter_records.csv to test.
- Clone the repository (see footnote 2):

      git clone https://github.com/Civic-Tech-Ballot-Inititiave/Ballot-Initiative.git
      cd Ballot-Initiative

- Create and activate a virtual environment:

      # Initialise project and install dependencies
      uv sync --all-extras --dev
      # Activate virtual environment (uv creates it in .venv by default)
      # On Windows:
      .venv\Scripts\activate
      # On macOS/Linux:
      source .venv/bin/activate

- Configure and save settings:
  - Make a copy of the settings.example.toml file and rename it to settings.toml.
  - Add your GenAI API key to the api_key field of the selected model.
  - Add the name of the model to the model field, e.g. mistral-small-latest or gpt-4o-mini.
- Start the FastAPI backend API:

      uv run fastapi dev app/api.py

- Start the frontend UI:

      cd frontend
      npm run dev

- The UI should be up and running on http://localhost:5173
- Upload your files:
  - PDF of signed petitions
  - Voter records file
  - Sample data is available in the sample_data folder for testing
To run the test suite:
- Navigate to the project root folder.
- Run the following command:

      uv run pytest

To run the backend API on its own:
- Navigate to the project root folder.
- Run the following command:

      uv run fastapi dev app/api.py

- Onboarding Notebook (Colab): Comprehensive guide covering project goals and background, OCR implementation examples, and fuzzy matching implementation examples
- Streamlit Documentation - Framework used for the application interface
- DC Initiative and Referendum Process - Official process documentation
- Washington Post Article - Context and background
This project is open-sourced under the MIT License - see the LICENSE file for details.
Footnotes
1. The free tiers for these services typically have a low rate limit that can cause issues. Many services require adding a payment method to your account to increase rate limits. Please verify your account settings and usage limits before running the application.
2. Optionally, you may want to fork this repository first.

