This companion repo demonstrates how to process PDF form elements such as checkboxes and radio buttons with LLMWhisperer, a text extraction service built specifically for large language models (LLMs).
PDF forms use checkboxes and radio buttons to collect data from the user. This repo shows how to extract these form elements with LLMWhisperer in a way that LLMs can understand.
LLMWhisperer both recognizes checkboxes and radio buttons and renders them in the extracted text in a form that LLMs can understand. This lets you use an LLM to process forms in PDFs.
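To make the rendering concrete: in the sample output further down in this README, a checked box appears as [X] and an unchecked one as [ ]. The sketch below is only an illustration of that convention, not part of the repo's code; the regular expression and the function name find_checkboxes are hypothetical, and it assumes a label runs up to the next marker or the end of the line.

```python
import re

# Assumption: "[X]" marks a checked box and "[ ]" an unchecked one, as in the
# sample output in this README. The label is taken to be everything up to the
# next marker or the end of the line.
CHECKBOX = re.compile(r"\[(X| )\]\s*(.*?)(?=\s*\[(?:X| )\]|$)")


def find_checkboxes(line: str) -> list[tuple[str, bool]]:
    """Return (label, checked) pairs for every checkbox marker in a line."""
    return [
        (match.group(2).strip(), match.group(1) == "X")
        for match in CHECKBOX.finditer(line)
    ]


line = "Housing [ ] No primary housing expense [X] Own [X] Rent ($ 2200 /month)"
for label, checked in find_checkboxes(line):
    state = "checked" if checked else "unchecked"
    print(f"{state:9}  {label}")
```

In the repo itself this interpretation is left to the LLM, which copes far better with labels that sit before a marker or wrap across lines than a line-based pattern like this one.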
We'll call the LLMWhisperer API to extract text from a PDF form that contains checkboxes and radio buttons. The extracted text contains the form elements and their values, rendered in a way LLMs can comprehend. We'll then use LangChain together with Pydantic to parse this text, extract the values of the checkboxes and radio buttons, and generate structured JSON output. We'll use OpenAI's GPT-3 as the LLM in this example.
You should be able to run this on Linux or macOS. Windows is not supported, although it should run there with minor changes.
You'll need API keys for OpenAI and LLMWhisperer, which you can get for free. Please read the blog post for more information. Once you have the keys, add them to the .env file in the root of the project.
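As a rough illustration of how keys in a .env file can be read, here is a dependency-free sketch. The variable names OPENAI_API_KEY and LLMWHISPERER_API_KEY are assumptions made for this example; check sample.env for the names the project actually expects. The helpers load_env and missing_keys are hypothetical and not part of the repo.

```python
from pathlib import Path

# Assumed key names, for illustration only; sample.env is the authority.
REQUIRED_KEYS = ("OPENAI_API_KEY", "LLMWHISPERER_API_KEY")


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values: dict[str, str] = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], names=REQUIRED_KEYS) -> list[str]:
    """Return the names that are absent or empty in the parsed values."""
    return [name for name in names if not values.get(name)]


if __name__ == "__main__":
    missing = missing_keys(load_env())
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Running a check like this before main.py makes a missing or empty key obvious up front, rather than surfacing later as an authentication error from one of the APIs.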
Clone this repo and change to the llmwhisperer-pdf-checkbox-processing directory. We suggest running the code inside a Python virtual environment, which you can create with the following command:
python3 -m venv .venv
Next, activate the virtual environment:
source .venv/bin/activate
Now, install the dependencies:
pip install -r requirements.txt
Next, copy the sample.env file to .env, then edit the .env file to add your OpenAI and LLMWhisperer keys:
cp sample.env .env
Finally, run the code:
python main.py
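We will use the following PDF form, a Uniform Residential Loan Application, to demonstrate how to process checkboxes and radio buttons with LLMWhisperer. The text below is the form as extracted by LLMWhisperer; checked boxes appear as [X] and unchecked boxes as [ ]: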
To be completed by the Lender:
Lender Loan No./Universal Loan Identifier 2101000077 Agency Case No.
Uniform Residential Loan Application
Verify and complete the information on this application. If you are applying for this loan with others, each additional Borrower must provide
information as directed by your Lender.
Section 1: Borrower Information. This section asks about your personal information and your income from
employment and other sources, such as retirement, that you want considered to qualify for this loan.
1a. Personal Information
Name (First, Middle, Last, Suffix) Social Security Number 500 - 60 2222
Amy America (or Individual Taxpayer Identification Number)
Alternate Names - List any names by which you are known or any names Date of Birth Citizenship
under which credit was previously received (First, Middle, Last, Suffix) (mm/dd/yyyy) [X] U.S. Citizen
03 30 1954 [ ] Permanent Resident Alien
[ ] Non-Permanent Resident Alien
Type of Credit List Name(s) of Other Borrower(s) Applying for this Loan
[X] I am applying for individual credit. (First, Middle, Last, Suffix) - Use a separator between names
[ ] )I am applying for joint credit. Total Number of Borrowers:
Each Borrower intends to apply for joint credit. Your initials:
Marital Status Dependents (not listed by another Borrower) Contact Information
[X] Married Number 2 Home Phone ( 1
[ ] Separated Ages Cell Phone ( 408 ) 111 - 2121
[ ] Unmarried Work Phone ( ) Ext.
(Single, Divorced, Widowed, Civil Union, Domestic Partnership, Registered
Reciprocal Beneficiary Relationship) Email [email protected]
Current Address
Street 4321 Cul de Sac ST Unit #
City Los Angeles State CA [X] ZIP 90210 Country
How Long at Current Address? 10 Years 2 Months Housing [ ] No primary housing expense [X] Own [X] Rent ($ 2200 /month)
If at Current Address for LESS than 2 years, list Former Address [X] Does not apply
Street Unit #
City State [X] ZIP Country
How Long at Former Address? Years Months Housing [ ] )No primary housing expense [ ] Own [ ] Rent ($ /month)
Mailing Address - if different from Current Address [X] Does not apply
Street Unit #
City State [X] ZIP Country
1b. Current Employment/Self-Employment and Income [ ] Does not apply
Gross Monthly Income
Employer or Business Name America Transportation Phone ( -
Unit # Base $ 7,000.00 /month
Street 12 Main Street
Overtime $ /month
City S Dennis State MA ZIP 02660 Country
Bonus $ /month
Position or Title FOUNDER Check if this statement applies: Commission $ /month
[ ] I am employed by a family member,
Start Date 1 / (mm/dd/yyyy)
property seller, real estate agent, or other Military
Entitlements $ /month
How long in this line of work? 15 Years Months party to the transaction.
Other $ /month
[X] Check if you are the Business [ ] I have an ownership share of less than 25%. Monthly Income (or Loss)
TOTAL $ 7,000.00/month
Owner or Self-Employed [X] I have an ownership share of 25% or more. $ 7,000.00
Uniform Residential Loan Application
Freddie Mac Form 65 . Fannie Mae Form 1003
<<<
The structured JSON output generated from the extracted text:
{
  "personal_details": {
    "name": "Amy America",
    "ssn": "500-60-2222",
    "dob": "1954-03-30T00:00:00Z",
    "citizenship": "U.S. Citizen"
  },
  "extra_details": {
    "type_of_credit": "Individual",
    "marital_status": "Married",
    "cell_phone": "(408) 111-2121"
  },
  "current_address": {
    "street": "4321 Cul de Sac ST",
    "city": "Los Angeles",
    "state": "CA",
    "zip_code": "90210",
    "residing_in_addr_since_years": 10,
    "residing_in_addr_since_months": 2,
    "own_house": true,
    "rented_house": true,
    "rent": 2200,
    "mailing_address_different": false
  },
  "employment_details": {
    "business_owner_or_self_employed": true,
    "ownership_of_25_pct_or_more": true
  }
}
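The repo uses Pydantic models with LangChain to produce this JSON. As a dependency-free stand-in, the sketch below loads the current_address part of the output into a standard-library dataclass and applies a basic type check. The class name CurrentAddress and the helper parse_current_address are hypothetical; only the field names are taken from the JSON above.

```python
import json
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class CurrentAddress:
    # Field names mirror the "current_address" keys in the JSON output.
    street: str
    city: str
    state: str
    zip_code: str
    residing_in_addr_since_years: int
    residing_in_addr_since_months: int
    own_house: bool
    rented_house: bool
    rent: int
    mailing_address_different: bool


def parse_current_address(payload: str) -> CurrentAddress:
    """Build a CurrentAddress from the JSON text and check each field's type."""
    address = CurrentAddress(**json.loads(payload)["current_address"])
    # Dataclasses do not validate types themselves, so check them explicitly.
    # (This check is looser than Pydantic's: bool also passes as int.)
    for name, expected in get_type_hints(CurrentAddress).items():
        value = getattr(address, name)
        if not isinstance(value, expected):
            raise TypeError(f"{name}: expected {expected.__name__}, got {value!r}")
    return address


sample = """
{"current_address": {"street": "4321 Cul de Sac ST", "city": "Los Angeles",
 "state": "CA", "zip_code": "90210", "residing_in_addr_since_years": 10,
 "residing_in_addr_since_months": 2, "own_house": true, "rented_house": true,
 "rent": 2200, "mailing_address_different": false}}
"""
address = parse_current_address(sample)
print(address.city, address.own_house, address.rented_house, address.rent)
```

Note that own_house and rented_house are both true here because both the Own and Rent boxes are marked [X] in the extracted text. The output reflects the form as filled in, without resolving the inconsistency.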