-
Notifications
You must be signed in to change notification settings - Fork 4.7k
Add Form Automation Agentic Workflow #226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds a new form-filling automation system: typed models, Datalab-based document scanning and PDF filling tools, a multi-stage CrewAI flow (scan → transform → fill), a Streamlit UI with event logging, project metadata, and a W-9 schema. Changes
Sequence DiagramsequenceDiagram
participant User
participant UI as Streamlit UI
participant Flow as FormFillingFlow
participant ScanAgent as Scanner Agent
participant TransformAgent as Transformer Agent
participant FillAgent as Filler Agent
participant Datalab as Datalab SDK
participant FS as Filesystem
User->>UI: upload document, blank form, select schema
UI->>Flow: kickoff(document, blank_form, schema)
Flow->>ScanAgent: run scan task
ScanAgent->>Datalab: request OCR/convert
Datalab-->>ScanAgent: extracted_text
ScanAgent-->>Flow: extracted_text
Flow->>TransformAgent: run transform task (text + schema)
TransformAgent->>TransformAgent: map to structured form_data
TransformAgent-->>Flow: form_data (JSON)
Flow->>FillAgent: run fill task (form_data + blank_form)
FillAgent->>Datalab: request PDF form fill
Datalab->>FS: write filled PDF
Datalab-->>FillAgent: success + path
FillAgent-->>Flow: completed_form_path
Flow-->>UI: result (completed_form_path, form_data)
UI-->>User: display preview + download
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@form-filling-automation/app.py`:
- Around line 3-7: The code currently uses the relative string "schemas" which
breaks when the CWD is different; change to resolve the schemas directory
relative to app.py by computing base = os.path.dirname(__file__) and schema_dir
= os.path.join(base, "schemas") (or use Path(__file__).resolve().parent /
"schemas"), then replace any direct "schemas" usages (including the
discovery/loading calls around the code that currently references "schemas" near
the bottom of app.py) to use schema_dir so schema discovery works regardless of
working directory.
- Around line 324-329: The check currently treats any non-empty result.raw as
valid text (in crew.kickoff -> result.raw -> extracted_text), allowing OCR error
strings like "Error: ..." to pass; update the validation to first detect tool
errors by checking for an explicit error field on result (e.g., result.error or
result.status) and, if not available, reject raw strings that match common error
patterns (e.g., start with "Error", "ERR", "Traceback", or contain "exception")
before assigning extracted_text; if an error is detected, raise a ValueError
with the tool error details instead of proceeding.
- Around line 288-295: The code currently builds file paths using
user-controlled filenames (identity_file.name and blank_form_file.name) causing
path traversal risk; update the logic that constructs identity_path and
blank_form_path to use pathlib.Path(identity_file.name).name and
pathlib.Path(blank_form_file.name).name to strip any directory components before
joining with tmp_dir, and add the necessary pathlib import; reference the
variables identity_path, identity_file, blank_form_path, blank_form_file and
tmp_dir when making this change.
In `@form-filling-automation/flow.py`:
- Around line 25-28: The LLM instantiation uses
os.environ.get("OPENROUTER_API_KEY") which returns None when the environment
variable is missing; update the flow.py code around the llm = LLM(...) creation
to validate the OPENROUTER_API_KEY first (either use
os.environ["OPENROUTER_API_KEY"] to let Python raise a clear KeyError or
explicitly check and raise a descriptive error), then pass the validated
non-None key into LLM(...) so runtime failures are deterministic and the error
message clearly identifies the missing environment variable.
- Around line 179-182: The default output path logic in the block that sets
self.state.output_path currently uses string replace on
self.state.blank_form_path which fails for uppercase or missing extensions;
replace this with pathlib.Path handling: import Path, build p =
Path(self.state.blank_form_path), compute new_name = p.stem + "_completed" +
(p.suffix if p.suffix.lower() else ".pdf"), then set self.state.output_path =
str(p.with_name(new_name)); update the code in the same conditional that
references self.state.output_path and self.state.blank_form_path to use these
Path-based symbols.
In `@form-filling-automation/main.py`:
- Around line 3-6: The code coerces a default None output_path into an empty
string and passes that into FormFillingFlow/load_schema calls (see output_path
variable and FormFillingFlow usage around lines you modified), which can produce
invalid file writes; change the call sites so you either leave output_path as
None or compute a deterministic default path (e.g., derived from the loaded
schema name) and only pass a non-empty string when valid—i.e., avoid passing ""
by: if output_path is falsy keep None (do not pass output_path) or replace it
with a clearly derived default before invoking FormFillingFlow or downstream
functions that consume output_path.
In `@form-filling-automation/schemas/w9.yaml`:
- Around line 13-83: The schema currently makes all tax classification
checkboxes (individual_sole_proprietor, c_corporation, s_corporation,
partnership, trust_estate, llc, other) and TIN parts (ssn_1, ssn_2, ssn_3,
ein_1, ein_2) optional, allowing invalid W‑9s; update w9.yaml to require that at
least one classification is present (add an anyOf/oneOf validation referencing
those checkbox fields) and enforce a complete TIN by requiring either all SSN
parts (ssn_1, ssn_2, ssn_3) or all EIN parts (ein_1, ein_2) via an anyOf rule,
or implement equivalent workflow-level validation that checks these same
conditions before form generation.
In `@form-filling-automation/tools.py`:
- Around line 77-96: In _run, add validation for the output_path parameter
before any writes: check that output_path is provided (not None or empty
string), resolve it to a Path, ensure its parent directory exists and is
writable (or create the parent dir if intended), and return a clear error like
"Error: Invalid output_path" if the check fails; update the existing validation
block in the _run method (which already checks form_path, field_data, and
api_key) to perform this output_path validation immediately after those checks
using the output_path variable.
🧹 Nitpick comments (4)
form-filling-automation/models.py (2)
100-110: Non-dict field values are silently dropped.In
from_dict, iffield_infois not adict, it is silently ignored. This could mask malformed input data. Consider logging a warning or raising a validation error for unexpected data structures to aid debugging.💡 Suggested improvement
`@classmethod` def from_dict(cls, data: Dict[str, Any]) -> "GenericFormData": """Create from a dictionary, converting non-string values to strings.""" normalized = {} for field_name, field_info in data.items(): if isinstance(field_info, dict): normalized[field_name] = { "value": cls._to_string(field_info.get("value", "")), "description": cls._to_string(field_info.get("description", "")), } + else: + # Handle simple value format (just a value, no description) + normalized[field_name] = { + "value": cls._to_string(field_info), + "description": "", + } return cls(field_data=normalized)
177-203: Consolidate duplicate loops and improve error visibility.The two near-identical loops for
.yamland.ymlfiles can be combined. Also, the bareexcept Exception: continuesilently swallows all errors, making it difficult to diagnose schema loading issues.♻️ Proposed refactor
- for file in schemas_path.glob("*.yaml"): - try: - schema = FormSchema.from_yaml(str(file)) - schemas.append( - { - "file": file.name, - "path": str(file), - "form_type": schema.form_type, - "form_name": schema.form_name, - } - ) - except Exception: - continue - - for file in schemas_path.glob("*.yml"): - try: - schema = FormSchema.from_yaml(str(file)) - schemas.append( - { - "file": file.name, - "path": str(file), - "form_type": schema.form_type, - "form_name": schema.form_name, - } - ) - except Exception: - continue + import itertools + import logging + + logger = logging.getLogger(__name__) + + yaml_files = itertools.chain( + schemas_path.glob("*.yaml"), + schemas_path.glob("*.yml"), + ) + for file in yaml_files: + try: + schema = FormSchema.from_yaml(str(file)) + schemas.append( + { + "file": file.name, + "path": str(file), + "form_type": schema.form_type, + "form_name": schema.form_name, + } + ) + except Exception as e: + logger.warning(f"Failed to load schema from {file}: {e}") + continueform-filling-automation/flow.py (2)
224-226: Consider defensive null check forform_schema.While
initialize()validatesform_schema, accessing it directly without a null check here could cause anAttributeErrorif the flow state is manipulated unexpectedly. A defensive check would improve robustness.💡 Suggested improvement
- schema = self.state.form_schema + schema = self.state.form_schema + if not schema: + self.state.errors.append("Form schema not available") + return "failed" transformer_agent = create_form_transformer_agent(schema)
340-349: Greedy regex may capture too much text.The pattern
\{[\s\S]*\}is greedy and matches from the first{to the last}in the entire text. If the text contains multiple JSON objects or trailing content, this could capture invalid JSON. Thejson.loadscall will reject it, but using a non-greedy pattern\{[\s\S]*?\}might improve first-match accuracy.💡 Suggested improvement
# Try to find raw JSON object - brace_pattern = r"\{[\s\S]*\}" + brace_pattern = r"\{[\s\S]*?\}" matches = re.findall(brace_pattern, text)
| import base64 | ||
| from datetime import datetime | ||
| import json | ||
| import os | ||
| import tempfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolve the schema directory relative to app.py to avoid missing schemas.
Using "schemas" relies on the current working directory. If the app is launched from the repo root (a common streamlit run form-filling-automation/app.py pattern), schema discovery fails and the UI cannot proceed.
🔧 Suggested fix
import base64
from datetime import datetime
import json
import os
import tempfile
+from pathlib import Path
...
- available_schemas = list_available_schemas("schemas")
+ schemas_dir = Path(__file__).resolve().parent / "schemas"
+ available_schemas = list_available_schemas(str(schemas_dir))Also applies to: 190-191
🤖 Prompt for AI Agents
In `@form-filling-automation/app.py` around lines 3 - 7, The code currently uses
the relative string "schemas" which breaks when the CWD is different; change to
resolve the schemas directory relative to app.py by computing base =
os.path.dirname(__file__) and schema_dir = os.path.join(base, "schemas") (or use
Path(__file__).resolve().parent / "schemas"), then replace any direct "schemas"
usages (including the discovery/loading calls around the code that currently
references "schemas" near the bottom of app.py) to use schema_dir so schema
discovery works regardless of working directory.
| identity_path = os.path.join(tmp_dir, identity_file.name) | ||
| with open(identity_path, "wb") as f: | ||
| f.write(identity_file.getvalue()) | ||
|
|
||
| blank_form_path = os.path.join(tmp_dir, blank_form_file.name) | ||
| with open(blank_form_path, "wb") as f: | ||
| f.write(blank_form_file.getvalue()) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sanitize uploaded filenames to prevent path traversal.
Uploaded file names are user‑controlled. Joining them directly into a temp path allows path traversal (e.g., ../../...) and writes outside the temp directory. Use Path(...).name to strip directory segments. (Requires pathlib.Path import.)
🔒 Suggested fix
-identity_path = os.path.join(tmp_dir, identity_file.name)
+identity_name = Path(identity_file.name).name
+identity_path = os.path.join(tmp_dir, identity_name)
...
-blank_form_path = os.path.join(tmp_dir, blank_form_file.name)
+blank_form_name = Path(blank_form_file.name).name
+blank_form_path = os.path.join(tmp_dir, blank_form_name)🤖 Prompt for AI Agents
In `@form-filling-automation/app.py` around lines 288 - 295, The code currently
builds file paths using user-controlled filenames (identity_file.name and
blank_form_file.name) causing path traversal risk; update the logic that
constructs identity_path and blank_form_path to use
pathlib.Path(identity_file.name).name and
pathlib.Path(blank_form_file.name).name to strip any directory components before
joining with tmp_dir, and add the necessary pathlib import; reference the
variables identity_path, identity_file, blank_form_path, blank_form_file and
tmp_dir when making this change.
| result = crew.kickoff(inputs={"document_file": identity_path}) | ||
|
|
||
| extracted_text = result.raw if result.raw else "" | ||
| if not extracted_text: | ||
| raise ValueError("No text extracted from document") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detect tool error strings instead of treating them as extracted text.
The OCR tool returns error messages as strings. The current check only verifies non‑empty output, so "Error: ..." passes and the workflow continues as if successful.
🧰 Suggested fix
- extracted_text = result.raw if result.raw else ""
- if not extracted_text:
- raise ValueError("No text extracted from document")
+ extracted_text = result.raw if result.raw else ""
+ if not extracted_text:
+ raise ValueError("No text extracted from document")
+ if extracted_text.strip().startswith("Error:"):
+ raise ValueError(extracted_text)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| result = crew.kickoff(inputs={"document_file": identity_path}) | |
| extracted_text = result.raw if result.raw else "" | |
| if not extracted_text: | |
| raise ValueError("No text extracted from document") | |
| result = crew.kickoff(inputs={"document_file": identity_path}) | |
| extracted_text = result.raw if result.raw else "" | |
| if not extracted_text: | |
| raise ValueError("No text extracted from document") | |
| if extracted_text.strip().startswith("Error:"): | |
| raise ValueError(extracted_text) | |
🤖 Prompt for AI Agents
In `@form-filling-automation/app.py` around lines 324 - 329, The check currently
treats any non-empty result.raw as valid text (in crew.kickoff -> result.raw ->
extracted_text), allowing OCR error strings like "Error: ..." to pass; update
the validation to first detect tool errors by checking for an explicit error
field on result (e.g., result.error or result.status) and, if not available,
reject raw strings that match common error patterns (e.g., start with "Error",
"ERR", "Traceback", or contain "exception") before assigning extracted_text; if
an error is detected, raise a ValueError with the tool error details instead of
proceeding.
| llm = LLM( | ||
| model="openrouter/minimax/minimax-m2.1", | ||
| api_key=os.environ.get("OPENROUTER_API_KEY"), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing API key will cause runtime failures.
os.environ.get("OPENROUTER_API_KEY") returns None if the environment variable is not set. This could lead to cryptic errors when the LLM is invoked. Consider validating or using os.environ["OPENROUTER_API_KEY"] to fail fast with a clear KeyError.
🛡️ Proposed fix for early validation
+OPENROUTER_API_KEY = os.environ.get("OPENROUTER_API_KEY")
+if not OPENROUTER_API_KEY:
+ raise EnvironmentError("OPENROUTER_API_KEY environment variable is required")
+
llm = LLM(
model="openrouter/minimax/minimax-m2.1",
- api_key=os.environ.get("OPENROUTER_API_KEY"),
+ api_key=OPENROUTER_API_KEY,
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| llm = LLM( | |
| model="openrouter/minimax/minimax-m2.1", | |
| api_key=os.environ.get("OPENROUTER_API_KEY"), | |
| ) | |
| OPENROUTER_API_KEY = os.environ.get("OPENROUTER_API_KEY") | |
| if not OPENROUTER_API_KEY: | |
| raise EnvironmentError("OPENROUTER_API_KEY environment variable is required") | |
| llm = LLM( | |
| model="openrouter/minimax/minimax-m2.1", | |
| api_key=OPENROUTER_API_KEY, | |
| ) |
🤖 Prompt for AI Agents
In `@form-filling-automation/flow.py` around lines 25 - 28, The LLM instantiation
uses os.environ.get("OPENROUTER_API_KEY") which returns None when the
environment variable is missing; update the flow.py code around the llm =
LLM(...) creation to validate the OPENROUTER_API_KEY first (either use
os.environ["OPENROUTER_API_KEY"] to let Python raise a clear KeyError or
explicitly check and raise a descriptive error), then pass the validated
non-None key into LLM(...) so runtime failures are deterministic and the error
message clearly identifies the missing environment variable.
| if not self.state.output_path: | ||
| self.state.output_path = self.state.blank_form_path.replace( | ||
| ".pdf", "_completed.pdf" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default output path generation may fail for non-.pdf files.
The .replace(".pdf", "_completed.pdf") approach won't modify paths that don't contain .pdf (e.g., uppercase .PDF or missing extension), resulting in the output potentially overwriting the input. Consider using Path for safer path manipulation.
🛡️ Proposed fix using Path
if not self.state.output_path:
- self.state.output_path = self.state.blank_form_path.replace(
- ".pdf", "_completed.pdf"
- )
+ from pathlib import Path
+ blank_path = Path(self.state.blank_form_path)
+ self.state.output_path = str(
+ blank_path.with_stem(blank_path.stem + "_completed")
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if not self.state.output_path: | |
| self.state.output_path = self.state.blank_form_path.replace( | |
| ".pdf", "_completed.pdf" | |
| ) | |
| if not self.state.output_path: | |
| from pathlib import Path | |
| blank_path = Path(self.state.blank_form_path) | |
| self.state.output_path = str( | |
| blank_path.with_stem(blank_path.stem + "_completed") | |
| ) |
🤖 Prompt for AI Agents
In `@form-filling-automation/flow.py` around lines 179 - 182, The default output
path logic in the block that sets self.state.output_path currently uses string
replace on self.state.blank_form_path which fails for uppercase or missing
extensions; replace this with pathlib.Path handling: import Path, build p =
Path(self.state.blank_form_path), compute new_name = p.stem + "_completed" +
(p.suffix if p.suffix.lower() else ".pdf"), then set self.state.output_path =
str(p.with_name(new_name)); update the code in the same conditional that
references self.state.output_path and self.state.blank_form_path to use these
Path-based symbols.
| from typing import Any, Dict, Optional | ||
|
|
||
| from flow import FormFillingFlow | ||
| from models import load_schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid passing an empty output_path when the optional argument is omitted.
output_path defaults to None but is coerced to "", which is not a valid file path and can cause downstream fill to fail or write in unexpected locations. Prefer a deterministic default or keep None and let downstream handle it—just don’t pass an empty string.
✅ Suggested fix (derive a default path)
-from typing import Any, Dict, Optional
+from pathlib import Path
+from typing import Any, Dict, Optional
...
- flow.state.output_path = output_path or ""
+ if output_path is None:
+ output_path = str(
+ Path(blank_form_path).with_name(f"{schema.form_type}_completed.pdf")
+ )
+ flow.state.output_path = output_pathAlso applies to: 37-41
🤖 Prompt for AI Agents
In `@form-filling-automation/main.py` around lines 3 - 6, The code coerces a
default None output_path into an empty string and passes that into
FormFillingFlow/load_schema calls (see output_path variable and FormFillingFlow
usage around lines you modified), which can produce invalid file writes; change
the call sites so you either leave output_path as None or compute a
deterministic default path (e.g., derived from the loaded schema name) and only
pass a non-empty string when valid—i.e., avoid passing "" by: if output_path is
falsy keep None (do not pass output_path) or replace it with a clearly derived
default before invoking FormFillingFlow or downstream functions that consume
output_path.
| - name: individual_sole_proprietor | ||
| description: "Individual/sole proprietor or single-member LLC checkbox for federal tax classification (Line 3a)" | ||
| required: false | ||
|
|
||
| - name: c_corporation | ||
| description: "C Corporation checkbox for federal tax classification (Line 3b)" | ||
| required: false | ||
|
|
||
| - name: s_corporation | ||
| description: "S Corporation checkbox for federal tax classification (Line 3c)" | ||
| required: false | ||
|
|
||
| - name: partnership | ||
| description: "Partnership checkbox for federal tax classification (Line 3d)" | ||
| required: false | ||
|
|
||
| - name: trust_estate | ||
| description: "Trust/estate checkbox for federal tax classification (Line 3e)" | ||
| required: false | ||
|
|
||
| - name: llc | ||
| description: "LLC checkbox with tax classification code (Line 3f)" | ||
| required: false | ||
|
|
||
| - name: other | ||
| description: "Other federal tax classification (Line 3g)" | ||
| required: false | ||
|
|
||
| - name: exempt_payee_code | ||
| description: "Exempt payee code if applicable (Line 4)" | ||
| required: false | ||
|
|
||
| - name: fatca_exemption_code | ||
| description: "Exemption from FATCA reporting code if applicable (Line 4)" | ||
| required: false | ||
|
|
||
| - name: address | ||
| description: "Address (number, street, and apt. or suite no.) (Line 5)" | ||
| required: true | ||
|
|
||
| - name: city_state_zip | ||
| description: "City, state, and ZIP code (Line 6)" | ||
| required: true | ||
|
|
||
| - name: requester_name_address | ||
| description: "Requester's name and address (optional) (Line 7)" | ||
| required: false | ||
|
|
||
| - name: account_numbers | ||
| description: "List account number(s) here (optional) (Line 8)" | ||
| required: false | ||
|
|
||
| - name: ssn_1 | ||
| description: "Social security number first 3 digits (Part I - box 1)" | ||
| required: false | ||
|
|
||
| - name: ssn_2 | ||
| description: "Social security number middle 2 digits (Part I - box 2)" | ||
| required: false | ||
|
|
||
| - name: ssn_3 | ||
| description: "Social security number last 4 digits (Part I - box 3)" | ||
| required: false | ||
|
|
||
| - name: ein_1 | ||
| description: "Employer identification number first 2 digits (Part I - box 1)" | ||
| required: false | ||
|
|
||
| - name: ein_2 | ||
| description: "Employer identification number last 7 digits (Part I - box 2)" | ||
| required: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Require tax classification and TIN presence to avoid invalid W‑9 outputs.
W‑9 forms require a tax classification and a TIN (SSN or EIN). All classification checkboxes and SSN/EIN parts are optional here, so the workflow can complete without mandatory data and generate an invalid form. Add schema constraints or workflow validation to enforce at least one classification and a complete SSN/EIN before filling.
🤖 Prompt for AI Agents
In `@form-filling-automation/schemas/w9.yaml` around lines 13 - 83, The schema
currently makes all tax classification checkboxes (individual_sole_proprietor,
c_corporation, s_corporation, partnership, trust_estate, llc, other) and TIN
parts (ssn_1, ssn_2, ssn_3, ein_1, ein_2) optional, allowing invalid W‑9s;
update w9.yaml to require that at least one classification is present (add an
anyOf/oneOf validation referencing those checkbox fields) and enforce a complete
TIN by requiring either all SSN parts (ssn_1, ssn_2, ssn_3) or all EIN parts
(ein_1, ein_2) via an anyOf rule, or implement equivalent workflow-level
validation that checks these same conditions before form generation.
| def _run( | ||
| self, | ||
| form_path: str, | ||
| field_data: Dict[str, Any], | ||
| output_path: str, | ||
| context: str = "Generic form", | ||
| ) -> str: | ||
| # Validate form file exists | ||
| if not Path(form_path).exists(): | ||
| return f"Error: Form file not found: {form_path}" | ||
|
|
||
| # Validate field_data is not empty | ||
| if not field_data: | ||
| return "Error: No field data provided to fill the form" | ||
|
|
||
| # Validate API key | ||
| api_key = os.environ.get("DATALAB_API_KEY") | ||
| if not api_key: | ||
| return "Error: DATALAB_API_KEY environment variable not set" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate output_path before attempting to save.
output_path can be empty (e.g., when callers pass None upstream). This results in invalid or unpredictable writes. Guard early with a clear error.
✅ Suggested fix
def _run(
self,
form_path: str,
field_data: Dict[str, Any],
output_path: str,
context: str = "Generic form",
) -> str:
+ # Validate output_path
+ if not output_path or not str(output_path).strip():
+ return "Error: output_path must be provided and non-empty"
# Validate form file exists
if not Path(form_path).exists():
return f"Error: Form file not found: {form_path}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _run( | |
| self, | |
| form_path: str, | |
| field_data: Dict[str, Any], | |
| output_path: str, | |
| context: str = "Generic form", | |
| ) -> str: | |
| # Validate form file exists | |
| if not Path(form_path).exists(): | |
| return f"Error: Form file not found: {form_path}" | |
| # Validate field_data is not empty | |
| if not field_data: | |
| return "Error: No field data provided to fill the form" | |
| # Validate API key | |
| api_key = os.environ.get("DATALAB_API_KEY") | |
| if not api_key: | |
| return "Error: DATALAB_API_KEY environment variable not set" | |
| def _run( | |
| self, | |
| form_path: str, | |
| field_data: Dict[str, Any], | |
| output_path: str, | |
| context: str = "Generic form", | |
| ) -> str: | |
| # Validate output_path | |
| if not output_path or not str(output_path).strip(): | |
| return "Error: output_path must be provided and non-empty" | |
| # Validate form file exists | |
| if not Path(form_path).exists(): | |
| return f"Error: Form file not found: {form_path}" | |
| # Validate field_data is not empty | |
| if not field_data: | |
| return "Error: No field data provided to fill the form" | |
| # Validate API key | |
| api_key = os.environ.get("DATALAB_API_KEY") | |
| if not api_key: | |
| return "Error: DATALAB_API_KEY environment variable not set" |
🤖 Prompt for AI Agents
In `@form-filling-automation/tools.py` around lines 77 - 96, In _run, add
validation for the output_path parameter before any writes: check that
output_path is provided (not None or empty string), resolve it to a Path, ensure
its parent directory exists and is writable (or create the parent dir if
intended), and return a clear error like "Error: Invalid output_path" if the
check fails; update the existing validation block in the _run method (which
already checks form_path, field_data, and api_key) to perform this output_path
validation immediately after those checks using the output_path variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@form-filling-automation/app.py`:
- Around line 408-414: The call to crew.kickoff currently discards its return
value so any error string from FormFillTool is ignored; change the code around
crew.kickoff(inputs={...}) to capture the result (e.g., result =
crew.kickoff(...)), then validate that result indicates success (check for an
error string or falsy/exception payload) and handle failures by
logging/raising/returning an error to the UI instead of proceeding as success;
update any surrounding logic that assumes success (references: crew.kickoff,
FormFillTool, form_data, blank_form_path, output_path) to short-circuit on error
and propagate a clear error message to the caller.
In `@form-filling-automation/flow.py`:
- Around line 279-291: The result of crew.kickoff (the FormFillTool invocation)
is currently ignored, so capture its return value (e.g., result =
crew.kickoff(...)) and validate it: if the tool returns an error string (like
starting with "Error:" or otherwise indicating failure), set self.state.status
to "failed" (or appropriate error state), do not set
self.state.completed_form_path, and return or raise the error message; only on a
successful result should you set self.state.completed_form_path =
self.state.output_path, set self.state.status = "completed", and return "ok".
Ensure you reference and update the existing symbols crew.kickoff,
self.state.completed_form_path, self.state.status, and self.state.output_path
when implementing this check.
🧹 Nitpick comments (3)
form-filling-automation/flow.py (1)
299-313: Consider using structured logging instead of print statements.The debug output uses
print()which doesn't integrate with logging frameworks. For production use, considerlogging.info()or similar.form-filling-automation/app.py (2)
155-183: Duplicated JSON extraction logic.This function is nearly identical to
FormFillingFlow._extract_jsoninflow.py. Consider extracting to a shared utility module to avoid duplication and ensure consistent behavior.♻️ Suggested approach
Create a shared utility, e.g.,
utils.py:# utils.py import json import re def extract_json_from_text(text: str) -> dict: """Extract JSON from text that may contain markdown code blocks.""" # ... shared implementationThen import in both
flow.pyandapp.py:from utils import extract_json_from_text
286-286: Temporary directory is never cleaned up.
tempfile.mkdtemp()creates a directory that persists after the script ends. Consider using a context manager or registering cleanup for long-running deployments.♻️ Alternative using TemporaryDirectory context manager
# If restructuring is feasible: with tempfile.TemporaryDirectory(prefix="form_agent_") as tmp_dir: # ... workflow code ...Or register cleanup with
atexitif context manager doesn't fit the Streamlit flow.
| crew.kickoff( | ||
| inputs={ | ||
| "form_data": json.dumps(form_data.to_tool_format(), indent=2), | ||
| "blank_form_path": blank_form_path, | ||
| "output_path": output_path, | ||
| } | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing result validation after form filling.
Similar to the issue in flow.py, the crew.kickoff result is discarded. If FormFillTool returns an error string, the UI will incorrectly report success.
🐛 Proposed fix to validate the fill result
- crew.kickoff(
+ result = crew.kickoff(
inputs={
"form_data": json.dumps(form_data.to_tool_format(), indent=2),
"blank_form_path": blank_form_path,
"output_path": output_path,
}
)
+
+ # Check for tool errors
+ raw_result = result.raw if result.raw else ""
+ if raw_result.startswith("Error:"):
+ raise ValueError(raw_result)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| crew.kickoff( | |
| inputs={ | |
| "form_data": json.dumps(form_data.to_tool_format(), indent=2), | |
| "blank_form_path": blank_form_path, | |
| "output_path": output_path, | |
| } | |
| ) | |
| result = crew.kickoff( | |
| inputs={ | |
| "form_data": json.dumps(form_data.to_tool_format(), indent=2), | |
| "blank_form_path": blank_form_path, | |
| "output_path": output_path, | |
| } | |
| ) | |
| # Check for tool errors | |
| raw_result = result.raw if result.raw else "" | |
| if raw_result.startswith("Error:"): | |
| raise ValueError(raw_result) |
🤖 Prompt for AI Agents
In `@form-filling-automation/app.py` around lines 408 - 414, The call to
crew.kickoff currently discards its return value so any error string from
FormFillTool is ignored; change the code around crew.kickoff(inputs={...}) to
capture the result (e.g., result = crew.kickoff(...)), then validate that result
indicates success (check for an error string or falsy/exception payload) and
handle failures by logging/raising/returning an error to the UI instead of
proceeding as success; update any surrounding logic that assumes success
(references: crew.kickoff, FormFillTool, form_data, blank_form_path,
output_path) to short-circuit on error and propagate a clear error message to
the caller.
| crew.kickoff( | ||
| inputs={ | ||
| "form_data": json.dumps( | ||
| self.state.form_data.to_tool_format(), indent=2 | ||
| ), | ||
| "blank_form_path": self.state.blank_form_path, | ||
| "output_path": self.state.output_path, | ||
| } | ||
| ) | ||
|
|
||
| self.state.completed_form_path = self.state.output_path | ||
| self.state.status = "completed" | ||
| return "ok" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing result validation after form filling.
The crew.kickoff result is discarded, so tool errors (returned as "Error: ..." strings by FormFillTool) go undetected. The flow will report success even if form filling failed.
🐛 Proposed fix to validate the fill result
- crew.kickoff(
+ result = crew.kickoff(
inputs={
"form_data": json.dumps(
self.state.form_data.to_tool_format(), indent=2
),
"blank_form_path": self.state.blank_form_path,
"output_path": self.state.output_path,
}
)
+ # Check for tool errors in the result
+ raw_result = result.raw if result.raw else ""
+ if raw_result.startswith("Error:"):
+ self.state.errors.append(raw_result)
+ return "failed"
+
self.state.completed_form_path = self.state.output_path
self.state.status = "completed"
return "ok"🤖 Prompt for AI Agents
In `@form-filling-automation/flow.py` around lines 279 - 291, The result of
crew.kickoff (the FormFillTool invocation) is currently ignored, so capture its
return value (e.g., result = crew.kickoff(...)) and validate it: if the tool
returns an error string (like starting with "Error:" or otherwise indicating
failure), set self.state.status to "failed" (or appropriate error state), do not
set self.state.completed_form_path, and return or raise the error message; only
on a successful result should you set self.state.completed_form_path =
self.state.output_path, set self.state.status = "completed", and return "ok".
Ensure you reference and update the existing symbols crew.kickoff,
self.state.completed_form_path, self.state.status, and self.state.output_path
when implementing this check.
Summary by CodeRabbit