Add Form Automation Agentic Workflow #226

sitammeur · 2026-02-05T07:24:43Z

Summary by CodeRabbit

New Features
- Interactive web UI to run end-to-end form-filling workflows with live logs and PDF preview/download
- Document scanning (OCR), intelligent data transformation, and PDF form population
- CLI/programmatic entrypoint to run the workflow
- Built-in W-9 form schema and schema listing/loading support
- Tools for OCR scanning and form-filling using external OCR/form services
- Example configuration (.env) and project metadata plus detailed README documentation

coderabbitai · 2026-02-05T07:26:03Z

📝 Walkthrough

Walkthrough

Adds a new form-filling automation system: typed models, Datalab-based document scanning and PDF filling tools, a multi-stage CrewAI flow (scan → transform → fill), a Streamlit UI with event logging, project metadata, and a W-9 schema.

Changes

Cohort / File(s)	Summary
Project config & docs `form-filling-automation/.env.example`, `form-filling-automation/pyproject.toml`, `form-filling-automation/README.md`	Added environment example, project metadata and dependencies, and a detailed README describing setup and usage.
Data models & schemas `form-filling-automation/models.py`, `form-filling-automation/schemas/w9.yaml`	New Pydantic models for form fields, schema, normalized form data, and flow state; loader/list helpers; added W-9 YAML schema.
Tools (Datalab integration) `form-filling-automation/tools.py`	Added DocumentScanTool and FormFillTool with input models, Datalab client calls, validation, file handling, and error reporting.
Workflow & orchestration `form-filling-automation/flow.py`, `form-filling-automation/main.py`	Implemented FormFillingFlow (initialize → extract → transform → fill → finalize), agent/task factories, JSON extraction helpers, and run_form_flow entrypoint (with alias).
Streamlit UI & helpers `form-filling-automation/app.py`	New Streamlit app with StreamlitLogListener, PDF preview, JSON extraction helper, sidebar for schema/upload, stepwise workflow orchestration and real-time event display.

Sequence Diagram

sequenceDiagram
    participant User
    participant UI as Streamlit UI
    participant Flow as FormFillingFlow
    participant ScanAgent as Scanner Agent
    participant TransformAgent as Transformer Agent
    participant FillAgent as Filler Agent
    participant Datalab as Datalab SDK
    participant FS as Filesystem

    User->>UI: upload document, blank form, select schema
    UI->>Flow: kickoff(document, blank_form, schema)
    Flow->>ScanAgent: run scan task
    ScanAgent->>Datalab: request OCR/convert
    Datalab-->>ScanAgent: extracted_text
    ScanAgent-->>Flow: extracted_text

    Flow->>TransformAgent: run transform task (text + schema)
    TransformAgent->>TransformAgent: map to structured form_data
    TransformAgent-->>Flow: form_data (JSON)

    Flow->>FillAgent: run fill task (form_data + blank_form)
    FillAgent->>Datalab: request PDF form fill
    Datalab->>FS: write filled PDF
    Datalab-->>FillAgent: success + path
    FillAgent-->>Flow: completed_form_path

    Flow-->>UI: result (completed_form_path, form_data)
    UI-->>User: display preview + download

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through code, nibbling tasks with glee,
Scan, transform, and fill — a trifecta spree,
PDFs glow, fields lined up in a row,
Streamlit shows progress as breezes blow,
Cheers from a rabbit — forms done, ready to go! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.61% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Add Form Automation Agentic Workflow' clearly and accurately summarizes the main change: introducing a complete form-filling automation system built with CrewAI agents.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 8

🤖 Fix all issues with AI agents

In `@form-filling-automation/app.py`:
- Around line 3-7: The code currently uses the relative string "schemas" which
breaks when the CWD is different; change to resolve the schemas directory
relative to app.py by computing base = os.path.dirname(__file__) and schema_dir
= os.path.join(base, "schemas") (or use Path(__file__).resolve().parent /
"schemas"), then replace any direct "schemas" usages (including the
discovery/loading calls around the code that currently references "schemas" near
the bottom of app.py) to use schema_dir so schema discovery works regardless of
working directory.
- Around line 324-329: The check currently treats any non-empty result.raw as
valid text (in crew.kickoff -> result.raw -> extracted_text), allowing OCR error
strings like "Error: ..." to pass; update the validation to first detect tool
errors by checking for an explicit error field on result (e.g., result.error or
result.status) and, if not available, reject raw strings that match common error
patterns (e.g., start with "Error", "ERR", "Traceback", or contain "exception")
before assigning extracted_text; if an error is detected, raise a ValueError
with the tool error details instead of proceeding.
- Around line 288-295: The code currently builds file paths using
user-controlled filenames (identity_file.name and blank_form_file.name) causing
path traversal risk; update the logic that constructs identity_path and
blank_form_path to use pathlib.Path(identity_file.name).name and
pathlib.Path(blank_form_file.name).name to strip any directory components before
joining with tmp_dir, and add the necessary pathlib import; reference the
variables identity_path, identity_file, blank_form_path, blank_form_file and
tmp_dir when making this change.

In `@form-filling-automation/flow.py`:
- Around line 25-28: The LLM instantiation uses
os.environ.get("OPENROUTER_API_KEY") which returns None when the environment
variable is missing; update the flow.py code around the llm = LLM(...) creation
to validate the OPENROUTER_API_KEY first (either use
os.environ["OPENROUTER_API_KEY"] to let Python raise a clear KeyError or
explicitly check and raise a descriptive error), then pass the validated
non-None key into LLM(...) so runtime failures are deterministic and the error
message clearly identifies the missing environment variable.
- Around line 179-182: The default output path logic in the block that sets
self.state.output_path currently uses string replace on
self.state.blank_form_path which fails for uppercase or missing extensions;
replace this with pathlib.Path handling: import Path, build p =
Path(self.state.blank_form_path), compute new_name = p.stem + "_completed" +
(p.suffix if p.suffix.lower() else ".pdf"), then set self.state.output_path =
str(p.with_name(new_name)); update the code in the same conditional that
references self.state.output_path and self.state.blank_form_path to use these
Path-based symbols.

In `@form-filling-automation/main.py`:
- Around line 3-6: The code coerces a default None output_path into an empty
string and passes that into FormFillingFlow/load_schema calls (see output_path
variable and FormFillingFlow usage around lines you modified), which can produce
invalid file writes; change the call sites so you either leave output_path as
None or compute a deterministic default path (e.g., derived from the loaded
schema name) and only pass a non-empty string when valid—i.e., avoid passing ""
by: if output_path is falsy keep None (do not pass output_path) or replace it
with a clearly derived default before invoking FormFillingFlow or downstream
functions that consume output_path.

In `@form-filling-automation/schemas/w9.yaml`:
- Around line 13-83: The schema currently makes all tax classification
checkboxes (individual_sole_proprietor, c_corporation, s_corporation,
partnership, trust_estate, llc, other) and TIN parts (ssn_1, ssn_2, ssn_3,
ein_1, ein_2) optional, allowing invalid W‑9s; update w9.yaml to require that at
least one classification is present (add an anyOf/oneOf validation referencing
those checkbox fields) and enforce a complete TIN by requiring either all SSN
parts (ssn_1, ssn_2, ssn_3) or all EIN parts (ein_1, ein_2) via an anyOf rule,
or implement equivalent workflow-level validation that checks these same
conditions before form generation.

In `@form-filling-automation/tools.py`:
- Around line 77-96: In _run, add validation for the output_path parameter
before any writes: check that output_path is provided (not None or empty
string), resolve it to a Path, ensure its parent directory exists and is
writable (or create the parent dir if intended), and return a clear error like
"Error: Invalid output_path" if the check fails; update the existing validation
block in the _run method (which already checks form_path, field_data, and
api_key) to perform this output_path validation immediately after those checks
using the output_path variable.

🧹 Nitpick comments (4)

form-filling-automation/models.py (2)

100-110: Non-dict field values are silently dropped.

In from_dict, if field_info is not a dict, it is silently ignored. This could mask malformed input data. Consider logging a warning or raising a validation error for unexpected data structures to aid debugging.

💡 Suggested improvement

     `@classmethod`
     def from_dict(cls, data: Dict[str, Any]) -> "GenericFormData":
         """Create from a dictionary, converting non-string values to strings."""
         normalized = {}
         for field_name, field_info in data.items():
             if isinstance(field_info, dict):
                 normalized[field_name] = {
                     "value": cls._to_string(field_info.get("value", "")),
                     "description": cls._to_string(field_info.get("description", "")),
                 }
+            else:
+                # Handle simple value format (just a value, no description)
+                normalized[field_name] = {
+                    "value": cls._to_string(field_info),
+                    "description": "",
+                }
         return cls(field_data=normalized)

177-203: Consolidate duplicate loops and improve error visibility.

The two near-identical loops for .yaml and .yml files can be combined. Also, the bare except Exception: continue silently swallows all errors, making it difficult to diagnose schema loading issues.

♻️ Proposed refactor

-    for file in schemas_path.glob("*.yaml"):
-        try:
-            schema = FormSchema.from_yaml(str(file))
-            schemas.append(
-                {
-                    "file": file.name,
-                    "path": str(file),
-                    "form_type": schema.form_type,
-                    "form_name": schema.form_name,
-                }
-            )
-        except Exception:
-            continue
-
-    for file in schemas_path.glob("*.yml"):
-        try:
-            schema = FormSchema.from_yaml(str(file))
-            schemas.append(
-                {
-                    "file": file.name,
-                    "path": str(file),
-                    "form_type": schema.form_type,
-                    "form_name": schema.form_name,
-                }
-            )
-        except Exception:
-            continue
+    import itertools
+    import logging
+    
+    logger = logging.getLogger(__name__)
+    
+    yaml_files = itertools.chain(
+        schemas_path.glob("*.yaml"),
+        schemas_path.glob("*.yml"),
+    )
+    for file in yaml_files:
+        try:
+            schema = FormSchema.from_yaml(str(file))
+            schemas.append(
+                {
+                    "file": file.name,
+                    "path": str(file),
+                    "form_type": schema.form_type,
+                    "form_name": schema.form_name,
+                }
+            )
+        except Exception as e:
+            logger.warning(f"Failed to load schema from {file}: {e}")
+            continue

form-filling-automation/flow.py (2)

224-226: Consider defensive null check for form_schema.

While initialize() validates form_schema, accessing it directly without a null check here could cause an AttributeError if the flow state is manipulated unexpectedly. A defensive check would improve robustness.
💡 Suggested improvement
-        schema = self.state.form_schema
+        schema = self.state.form_schema
+        if not schema:
+            self.state.errors.append("Form schema not available")
+            return "failed"
         transformer_agent = create_form_transformer_agent(schema)
340-349: Greedy regex may capture too much text.

The pattern \{[\s\S]*\} is greedy and matches from the first { to the last } in the entire text. If the text contains multiple JSON objects or trailing content, this could capture invalid JSON. The json.loads call will reject it, but using a non-greedy pattern \{[\s\S]*?\} might improve first-match accuracy.
💡 Suggested improvement
         # Try to find raw JSON object
-        brace_pattern = r"\{[\s\S]*\}"
+        brace_pattern = r"\{[\s\S]*?\}"
         matches = re.findall(brace_pattern, text)

coderabbitai · 2026-02-05T07:32:30Z

form-filling-automation/app.py

+import base64
+from datetime import datetime
+import json
+import os
+import tempfile


⚠️ Potential issue | 🟠 Major

Resolve the schema directory relative to app.py to avoid missing schemas.

Using "schemas" relies on the current working directory. If the app is launched from the repo root (a common streamlit run form-filling-automation/app.py pattern), schema discovery fails and the UI cannot proceed.

🔧 Suggested fix

import base64 from datetime import datetime import json import os import tempfile +from pathlib import Path ... - available_schemas = list_available_schemas("schemas") + schemas_dir = Path(__file__).resolve().parent / "schemas" + available_schemas = list_available_schemas(str(schemas_dir))

Also applies to: 190-191

🤖 Prompt for AI Agents

In `@form-filling-automation/app.py` around lines 3 - 7, The code currently uses the relative string "schemas" which breaks when the CWD is different; change to resolve the schemas directory relative to app.py by computing base = os.path.dirname(__file__) and schema_dir = os.path.join(base, "schemas") (or use Path(__file__).resolve().parent / "schemas"), then replace any direct "schemas" usages (including the discovery/loading calls around the code that currently references "schemas" near the bottom of app.py) to use schema_dir so schema discovery works regardless of working directory.

coderabbitai · 2026-02-05T07:32:31Z

form-filling-automation/app.py

+identity_path = os.path.join(tmp_dir, identity_file.name)
+with open(identity_path, "wb") as f:
+    f.write(identity_file.getvalue())
+
+blank_form_path = os.path.join(tmp_dir, blank_form_file.name)
+with open(blank_form_path, "wb") as f:
+    f.write(blank_form_file.getvalue())
+


⚠️ Potential issue | 🟠 Major

Sanitize uploaded filenames to prevent path traversal.

Uploaded file names are user‑controlled. Joining them directly into a temp path allows path traversal (e.g., ../../...) and writes outside the temp directory. Use Path(...).name to strip directory segments. (Requires pathlib.Path import.)

🔒 Suggested fix

-identity_path = os.path.join(tmp_dir, identity_file.name) +identity_name = Path(identity_file.name).name +identity_path = os.path.join(tmp_dir, identity_name) ... -blank_form_path = os.path.join(tmp_dir, blank_form_file.name) +blank_form_name = Path(blank_form_file.name).name +blank_form_path = os.path.join(tmp_dir, blank_form_name)

🤖 Prompt for AI Agents

In `@form-filling-automation/app.py` around lines 288 - 295, The code currently builds file paths using user-controlled filenames (identity_file.name and blank_form_file.name) causing path traversal risk; update the logic that constructs identity_path and blank_form_path to use pathlib.Path(identity_file.name).name and pathlib.Path(blank_form_file.name).name to strip any directory components before joining with tmp_dir, and add the necessary pathlib import; reference the variables identity_path, identity_file, blank_form_path, blank_form_file and tmp_dir when making this change.

coderabbitai · 2026-02-05T07:32:31Z

form-filling-automation/app.py

+            result = crew.kickoff(inputs={"document_file": identity_path})
+
+            extracted_text = result.raw if result.raw else ""
+            if not extracted_text:
+                raise ValueError("No text extracted from document")
+


⚠️ Potential issue | 🟠 Major

Detect tool error strings instead of treating them as extracted text.

The OCR tool returns error messages as strings. The current check only verifies non‑empty output, so "Error: ..." passes and the workflow continues as if successful.

🧰 Suggested fix

- extracted_text = result.raw if result.raw else "" - if not extracted_text: - raise ValueError("No text extracted from document") + extracted_text = result.raw if result.raw else "" + if not extracted_text: + raise ValueError("No text extracted from document") + if extracted_text.strip().startswith("Error:"): + raise ValueError(extracted_text)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

result = crew.kickoff(inputs={"document_file": identity_path})

extracted_text = result.raw if result.raw else ""

if not extracted_text:

raise ValueError("No text extracted from document")

result = crew.kickoff(inputs={"document_file": identity_path})

extracted_text = result.raw if result.raw else ""

if not extracted_text:

raise ValueError("No text extracted from document")

if extracted_text.strip().startswith("Error:"):

raise ValueError(extracted_text)

🤖 Prompt for AI Agents

In `@form-filling-automation/app.py` around lines 324 - 329, The check currently treats any non-empty result.raw as valid text (in crew.kickoff -> result.raw -> extracted_text), allowing OCR error strings like "Error: ..." to pass; update the validation to first detect tool errors by checking for an explicit error field on result (e.g., result.error or result.status) and, if not available, reject raw strings that match common error patterns (e.g., start with "Error", "ERR", "Traceback", or contain "exception") before assigning extracted_text; if an error is detected, raise a ValueError with the tool error details instead of proceeding.

coderabbitai · 2026-02-05T07:32:31Z

form-filling-automation/flow.py

+llm = LLM(
+    model="openrouter/minimax/minimax-m2.1",
+    api_key=os.environ.get("OPENROUTER_API_KEY"),
+)


⚠️ Potential issue | 🟡 Minor

Missing API key will cause runtime failures.

os.environ.get("OPENROUTER_API_KEY") returns None if the environment variable is not set. This could lead to cryptic errors when the LLM is invoked. Consider validating or using os.environ["OPENROUTER_API_KEY"] to fail fast with a clear KeyError.

🛡️ Proposed fix for early validation

+OPENROUTER_API_KEY = os.environ.get("OPENROUTER_API_KEY") +if not OPENROUTER_API_KEY: + raise EnvironmentError("OPENROUTER_API_KEY environment variable is required") + llm = LLM( model="openrouter/minimax/minimax-m2.1", - api_key=os.environ.get("OPENROUTER_API_KEY"), + api_key=OPENROUTER_API_KEY, )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

llm = LLM(

model="openrouter/minimax/minimax-m2.1",

api_key=os.environ.get("OPENROUTER_API_KEY"),

)

OPENROUTER_API_KEY = os.environ.get("OPENROUTER_API_KEY")

if not OPENROUTER_API_KEY:

raise EnvironmentError("OPENROUTER_API_KEY environment variable is required")

llm = LLM(

model="openrouter/minimax/minimax-m2.1",

api_key=OPENROUTER_API_KEY,

)

🤖 Prompt for AI Agents

In `@form-filling-automation/flow.py` around lines 25 - 28, The LLM instantiation uses os.environ.get("OPENROUTER_API_KEY") which returns None when the environment variable is missing; update the flow.py code around the llm = LLM(...) creation to validate the OPENROUTER_API_KEY first (either use os.environ["OPENROUTER_API_KEY"] to let Python raise a clear KeyError or explicitly check and raise a descriptive error), then pass the validated non-None key into LLM(...) so runtime failures are deterministic and the error message clearly identifies the missing environment variable.

coderabbitai · 2026-02-05T07:32:31Z

form-filling-automation/flow.py

+        if not self.state.output_path:
+            self.state.output_path = self.state.blank_form_path.replace(
+                ".pdf", "_completed.pdf"
+            )


⚠️ Potential issue | 🟡 Minor

Default output path generation may fail for non-.pdf files.

The .replace(".pdf", "_completed.pdf") approach won't modify paths that don't contain .pdf (e.g., uppercase .PDF or missing extension), resulting in the output potentially overwriting the input. Consider using Path for safer path manipulation.

🛡️ Proposed fix using Path

if not self.state.output_path: - self.state.output_path = self.state.blank_form_path.replace( - ".pdf", "_completed.pdf" - ) + from pathlib import Path + blank_path = Path(self.state.blank_form_path) + self.state.output_path = str( + blank_path.with_stem(blank_path.stem + "_completed") + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if not self.state.output_path:

self.state.output_path = self.state.blank_form_path.replace(

".pdf", "_completed.pdf"

)

if not self.state.output_path:

from pathlib import Path

blank_path = Path(self.state.blank_form_path)

self.state.output_path = str(

blank_path.with_stem(blank_path.stem + "_completed")

)

🤖 Prompt for AI Agents

In `@form-filling-automation/flow.py` around lines 179 - 182, The default output path logic in the block that sets self.state.output_path currently uses string replace on self.state.blank_form_path which fails for uppercase or missing extensions; replace this with pathlib.Path handling: import Path, build p = Path(self.state.blank_form_path), compute new_name = p.stem + "_completed" + (p.suffix if p.suffix.lower() else ".pdf"), then set self.state.output_path = str(p.with_name(new_name)); update the code in the same conditional that references self.state.output_path and self.state.blank_form_path to use these Path-based symbols.

coderabbitai · 2026-02-05T07:32:31Z

form-filling-automation/main.py

+from typing import Any, Dict, Optional
+
+from flow import FormFillingFlow
+from models import load_schema


⚠️ Potential issue | 🟠 Major

Avoid passing an empty output_path when the optional argument is omitted.

output_path defaults to None but is coerced to "", which is not a valid file path and can cause downstream fill to fail or write in unexpected locations. Prefer a deterministic default or keep None and let downstream handle it—just don’t pass an empty string.

✅ Suggested fix (derive a default path)

-from typing import Any, Dict, Optional +from pathlib import Path +from typing import Any, Dict, Optional ... - flow.state.output_path = output_path or "" + if output_path is None: + output_path = str( + Path(blank_form_path).with_name(f"{schema.form_type}_completed.pdf") + ) + flow.state.output_path = output_path

Also applies to: 37-41

🤖 Prompt for AI Agents

In `@form-filling-automation/main.py` around lines 3 - 6, The code coerces a default None output_path into an empty string and passes that into FormFillingFlow/load_schema calls (see output_path variable and FormFillingFlow usage around lines you modified), which can produce invalid file writes; change the call sites so you either leave output_path as None or compute a deterministic default path (e.g., derived from the loaded schema name) and only pass a non-empty string when valid—i.e., avoid passing "" by: if output_path is falsy keep None (do not pass output_path) or replace it with a clearly derived default before invoking FormFillingFlow or downstream functions that consume output_path.

coderabbitai · 2026-02-05T07:32:31Z

form-filling-automation/schemas/w9.yaml

+  - name: individual_sole_proprietor
+    description: "Individual/sole proprietor or single-member LLC checkbox for federal tax classification (Line 3a)"
+    required: false
+
+  - name: c_corporation
+    description: "C Corporation checkbox for federal tax classification (Line 3b)"
+    required: false
+
+  - name: s_corporation
+    description: "S Corporation checkbox for federal tax classification (Line 3c)"
+    required: false
+
+  - name: partnership
+    description: "Partnership checkbox for federal tax classification (Line 3d)"
+    required: false
+
+  - name: trust_estate
+    description: "Trust/estate checkbox for federal tax classification (Line 3e)"
+    required: false
+
+  - name: llc
+    description: "LLC checkbox with tax classification code (Line 3f)"
+    required: false
+
+  - name: other
+    description: "Other federal tax classification (Line 3g)"
+    required: false
+
+  - name: exempt_payee_code
+    description: "Exempt payee code if applicable (Line 4)"
+    required: false
+
+  - name: fatca_exemption_code
+    description: "Exemption from FATCA reporting code if applicable (Line 4)"
+    required: false
+
+  - name: address
+    description: "Address (number, street, and apt. or suite no.) (Line 5)"
+    required: true
+
+  - name: city_state_zip
+    description: "City, state, and ZIP code (Line 6)"
+    required: true
+
+  - name: requester_name_address
+    description: "Requester's name and address (optional) (Line 7)"
+    required: false
+
+  - name: account_numbers
+    description: "List account number(s) here (optional) (Line 8)"
+    required: false
+
+  - name: ssn_1
+    description: "Social security number first 3 digits (Part I - box 1)"
+    required: false
+
+  - name: ssn_2
+    description: "Social security number middle 2 digits (Part I - box 2)"
+    required: false
+
+  - name: ssn_3
+    description: "Social security number last 4 digits (Part I - box 3)"
+    required: false
+
+  - name: ein_1
+    description: "Employer identification number first 2 digits (Part I - box 1)"
+    required: false
+
+  - name: ein_2
+    description: "Employer identification number last 7 digits (Part I - box 2)"
+    required: false


⚠️ Potential issue | 🟠 Major

Require tax classification and TIN presence to avoid invalid W‑9 outputs.

W‑9 forms require a tax classification and a TIN (SSN or EIN). All classification checkboxes and SSN/EIN parts are optional here, so the workflow can complete without mandatory data and generate an invalid form. Add schema constraints or workflow validation to enforce at least one classification and a complete SSN/EIN before filling.

🤖 Prompt for AI Agents

In `@form-filling-automation/schemas/w9.yaml` around lines 13 - 83, The schema currently makes all tax classification checkboxes (individual_sole_proprietor, c_corporation, s_corporation, partnership, trust_estate, llc, other) and TIN parts (ssn_1, ssn_2, ssn_3, ein_1, ein_2) optional, allowing invalid W‑9s; update w9.yaml to require that at least one classification is present (add an anyOf/oneOf validation referencing those checkbox fields) and enforce a complete TIN by requiring either all SSN parts (ssn_1, ssn_2, ssn_3) or all EIN parts (ein_1, ein_2) via an anyOf rule, or implement equivalent workflow-level validation that checks these same conditions before form generation.

coderabbitai · 2026-02-05T07:32:31Z

form-filling-automation/tools.py

+    def _run(
+        self,
+        form_path: str,
+        field_data: Dict[str, Any],
+        output_path: str,
+        context: str = "Generic form",
+    ) -> str:
+        # Validate form file exists
+        if not Path(form_path).exists():
+            return f"Error: Form file not found: {form_path}"
+
+        # Validate field_data is not empty
+        if not field_data:
+            return "Error: No field data provided to fill the form"
+
+        # Validate API key
+        api_key = os.environ.get("DATALAB_API_KEY")
+        if not api_key:
+            return "Error: DATALAB_API_KEY environment variable not set"
+


⚠️ Potential issue | 🟠 Major

Validate output_path before attempting to save.

output_path can be empty (e.g., when callers pass None upstream). This results in invalid or unpredictable writes. Guard early with a clear error.

✅ Suggested fix

def _run( self, form_path: str, field_data: Dict[str, Any], output_path: str, context: str = "Generic form", ) -> str: + # Validate output_path + if not output_path or not str(output_path).strip(): + return "Error: output_path must be provided and non-empty" # Validate form file exists if not Path(form_path).exists(): return f"Error: Form file not found: {form_path}"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _run(

self,

form_path: str,

field_data: Dict[str, Any],

output_path: str,

context: str = "Generic form",

) -> str:

# Validate form file exists

if not Path(form_path).exists():

return f"Error: Form file not found: {form_path}"

# Validate field_data is not empty

if not field_data:

return "Error: No field data provided to fill the form"

# Validate API key

api_key = os.environ.get("DATALAB_API_KEY")

if not api_key:

return "Error: DATALAB_API_KEY environment variable not set"

def _run(

self,

form_path: str,

field_data: Dict[str, Any],

output_path: str,

context: str = "Generic form",

) -> str:

# Validate output_path

if not output_path or not str(output_path).strip():

return "Error: output_path must be provided and non-empty"

# Validate form file exists

if not Path(form_path).exists():

return f"Error: Form file not found: {form_path}"

# Validate field_data is not empty

if not field_data:

return "Error: No field data provided to fill the form"

# Validate API key

api_key = os.environ.get("DATALAB_API_KEY")

if not api_key:

return "Error: DATALAB_API_KEY environment variable not set"

🤖 Prompt for AI Agents

In `@form-filling-automation/tools.py` around lines 77 - 96, In _run, add validation for the output_path parameter before any writes: check that output_path is provided (not None or empty string), resolve it to a Path, ensure its parent directory exists and is writable (or create the parent dir if intended), and return a clear error like "Error: Invalid output_path" if the check fails; update the existing validation block in the _run method (which already checks form_path, field_data, and api_key) to perform this output_path validation immediately after those checks using the output_path variable.

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@form-filling-automation/app.py`:
- Around line 408-414: The call to crew.kickoff currently discards its return
value so any error string from FormFillTool is ignored; change the code around
crew.kickoff(inputs={...}) to capture the result (e.g., result =
crew.kickoff(...)), then validate that result indicates success (check for an
error string or falsy/exception payload) and handle failures by
logging/raising/returning an error to the UI instead of proceeding as success;
update any surrounding logic that assumes success (references: crew.kickoff,
FormFillTool, form_data, blank_form_path, output_path) to short-circuit on error
and propagate a clear error message to the caller.

In `@form-filling-automation/flow.py`:
- Around line 279-291: The result of crew.kickoff (the FormFillTool invocation)
is currently ignored, so capture its return value (e.g., result =
crew.kickoff(...)) and validate it: if the tool returns an error string (like
starting with "Error:" or otherwise indicating failure), set self.state.status
to "failed" (or appropriate error state), do not set
self.state.completed_form_path, and return or raise the error message; only on a
successful result should you set self.state.completed_form_path =
self.state.output_path, set self.state.status = "completed", and return "ok".
Ensure you reference and update the existing symbols crew.kickoff,
self.state.completed_form_path, self.state.status, and self.state.output_path
when implementing this check.

🧹 Nitpick comments (3)

form-filling-automation/flow.py (1)

299-313: Consider using structured logging instead of print statements.

The debug output uses print() which doesn't integrate with logging frameworks. For production use, consider logging.info() or similar.
form-filling-automation/app.py (2)
155-183: Duplicated JSON extraction logic.

This function is nearly identical to FormFillingFlow._extract_json in flow.py. Consider extracting to a shared utility module to avoid duplication and ensure consistent behavior.
♻️ Suggested approach

Create a shared utility, e.g., utils.py:
# utils.py
import json
import re

def extract_json_from_text(text: str) -> dict:
    """Extract JSON from text that may contain markdown code blocks."""
    # ... shared implementation
Then import in both flow.py and app.py:
from utils import extract_json_from_text
286-286: Temporary directory is never cleaned up.

tempfile.mkdtemp() creates a directory that persists after the script ends. Consider using a context manager or registering cleanup for long-running deployments.
♻️ Alternative using TemporaryDirectory context manager
# If restructuring is feasible:
with tempfile.TemporaryDirectory(prefix="form_agent_") as tmp_dir:
    # ... workflow code ...
Or register cleanup with atexit if context manager doesn't fit the Streamlit flow.

coderabbitai · 2026-02-05T07:54:08Z

form-filling-automation/app.py

+                crew.kickoff(
+                    inputs={
+                        "form_data": json.dumps(form_data.to_tool_format(), indent=2),
+                        "blank_form_path": blank_form_path,
+                        "output_path": output_path,
+                    }
+                )


⚠️ Potential issue | 🟠 Major

Missing result validation after form filling.

Similar to the issue in flow.py, the crew.kickoff result is discarded. If FormFillTool returns an error string, the UI will incorrectly report success.

🐛 Proposed fix to validate the fill result

- crew.kickoff( + result = crew.kickoff( inputs={ "form_data": json.dumps(form_data.to_tool_format(), indent=2), "blank_form_path": blank_form_path, "output_path": output_path, } ) + + # Check for tool errors + raw_result = result.raw if result.raw else "" + if raw_result.startswith("Error:"): + raise ValueError(raw_result)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

crew.kickoff(

inputs={

"form_data": json.dumps(form_data.to_tool_format(), indent=2),

"blank_form_path": blank_form_path,

"output_path": output_path,

}

)

result = crew.kickoff(

inputs={

"form_data": json.dumps(form_data.to_tool_format(), indent=2),

"blank_form_path": blank_form_path,

"output_path": output_path,

}

)

# Check for tool errors

raw_result = result.raw if result.raw else ""

if raw_result.startswith("Error:"):

raise ValueError(raw_result)

🤖 Prompt for AI Agents

In `@form-filling-automation/app.py` around lines 408 - 414, The call to crew.kickoff currently discards its return value so any error string from FormFillTool is ignored; change the code around crew.kickoff(inputs={...}) to capture the result (e.g., result = crew.kickoff(...)), then validate that result indicates success (check for an error string or falsy/exception payload) and handle failures by logging/raising/returning an error to the UI instead of proceeding as success; update any surrounding logic that assumes success (references: crew.kickoff, FormFillTool, form_data, blank_form_path, output_path) to short-circuit on error and propagate a clear error message to the caller.

coderabbitai · 2026-02-05T07:54:08Z

form-filling-automation/flow.py

+        crew.kickoff(
+            inputs={
+                "form_data": json.dumps(
+                    self.state.form_data.to_tool_format(), indent=2
+                ),
+                "blank_form_path": self.state.blank_form_path,
+                "output_path": self.state.output_path,
+            }
+        )
+
+        self.state.completed_form_path = self.state.output_path
+        self.state.status = "completed"
+        return "ok"


⚠️ Potential issue | 🟠 Major

Missing result validation after form filling.

The crew.kickoff result is discarded, so tool errors (returned as "Error: ..." strings by FormFillTool) go undetected. The flow will report success even if form filling failed.

🐛 Proposed fix to validate the fill result

- crew.kickoff( + result = crew.kickoff( inputs={ "form_data": json.dumps( self.state.form_data.to_tool_format(), indent=2 ), "blank_form_path": self.state.blank_form_path, "output_path": self.state.output_path, } ) + # Check for tool errors in the result + raw_result = result.raw if result.raw else "" + if raw_result.startswith("Error:"): + self.state.errors.append(raw_result) + return "failed" + self.state.completed_form_path = self.state.output_path self.state.status = "completed" return "ok"

🤖 Prompt for AI Agents

In `@form-filling-automation/flow.py` around lines 279 - 291, The result of crew.kickoff (the FormFillTool invocation) is currently ignored, so capture its return value (e.g., result = crew.kickoff(...)) and validate it: if the tool returns an error string (like starting with "Error:" or otherwise indicating failure), set self.state.status to "failed" (or appropriate error state), do not set self.state.completed_form_path, and return or raise the error message; only on a successful result should you set self.state.completed_form_path = self.state.output_path, set self.state.status = "completed", and return "ok". Ensure you reference and update the existing symbols crew.kickoff, self.state.completed_form_path, self.state.status, and self.state.output_path when implementing this check.

add form automation agentic workflow

85327d7

coderabbitai bot reviewed Feb 5, 2026

View reviewed changes

fix review issues

caa8480

coderabbitai bot reviewed Feb 5, 2026

View reviewed changes

Add Form Automation Agentic Workflow #226

Are you sure you want to change the base?

Add Form Automation Agentic Workflow #226

Uh oh!

Conversation

sitammeur commented Feb 5, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sitammeur commented Feb 5, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 5, 2026 •

edited

Loading