UN-1824 [FIX] Return HTTP 409 when tool image not found in container registry#1757
UN-1824 [FIX] Return HTTP 409 when tool image not found in container registry#1757hari-kuriakose merged 16 commits intomainfrom
Conversation
…calls inside the try block in runner.py
for more information, see https://pre-commit.ci
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughEnd-to-end detection and propagation of "tool image not found in registry" across API, runner, tool-sandbox, and workflow layers; new exception types and error_code propagation added; API maps tool-not-found to HTTP 500 and adjusts execution/status error handling and conditional metadata/metrics pruning. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API as API Deployment
participant Sandbox as Tool-Sandbox
participant Runner as Runner/Docker
participant Registry as Container Registry
Client->>API: POST execution request
activate API
API->>Sandbox: request tool run
activate Sandbox
Sandbox->>Runner: HTTP request to runner
activate Runner
Runner->>Registry: pull tool image
activate Registry
Registry-->>Runner: ImageNotFound / 404 / pull stream error
deactivate Registry
Runner->>Runner: raise ToolImageNotFoundError and return structured JSON (error_code)
Runner-->>Sandbox: HTTP error response (JSON with error_code)
deactivate Runner
Sandbox->>API: raise ToolNotFoundInRegistryError / propagate error
deactivate Sandbox
API->>API: contains_tool_not_found_error() detects pattern
API-->>Client: 500 Internal Server Error with status/result
deactivate API
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@runner/src/unstract/runner/clients/docker_client.py`:
- Around line 177-205: The pull-stream error handling in the client.api.pull
loop currently raises ToolImageNotFoundError for any stream "error"; update the
logic in the loop that inspects line.get("error") so it discriminates based on
errorDetail and message: extract error_msg = line.get("error") and err_detail =
line.get("errorDetail", {}), then if err_detail.get("code") == 404 or "manifest
unknown" in error_msg.lower() or "not found" in error_msg.lower() raise
ToolImageNotFoundError(repository, image_tag); otherwise log the full error
(include error_msg and err_detail) and re-raise or propagate a generic exception
(so auth 401, rate-limit 429, network errors are not misclassified as
not-found); keep using image_name_with_tag, repository, image_tag and
ToolImageNotFoundError to locate the code to change.
🧹 Nitpick comments (1)
backend/api_v2/api_deployment_views.py (1)
54-88: Prefer expliciterror_codechecks over string matching.
Since downstream results now carryerror_code, checking it directly avoids brittle text matching and future message changes.♻️ Suggested refinement
- if isinstance(response, dict): - error = response.get("error") - result = response.get("result", []) - else: - error = getattr(response, "error", None) - result = getattr(response, "result", []) or [] + if isinstance(response, dict): + error = response.get("error") + error_code = response.get("error_code") + result = response.get("result", []) + else: + error = getattr(response, "error", None) + error_code = getattr(response, "error_code", None) + result = getattr(response, "result", []) or [] + + if error_code == ToolNotFoundInRegistry.ERROR_CODE: + return True ... - if isinstance(item, dict): - file_error = item.get("error", "") + if isinstance(item, dict): + if item.get("error_code") == ToolNotFoundInRegistry.ERROR_CODE: + return True + file_error = item.get("error", "")
harini-venkataraman
left a comment
There was a problem hiding this comment.
Added some comments, otherwise, LGTM
…t from ToolRunException
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
runner/src/unstract/runner/runner.py (2)
532-556:⚠️ Potential issue | 🟠 MajorToolImageNotFoundError is shadowed by ToolRunException.
ToolImageNotFoundErrorinherits fromToolRunException, so the current order means the specific handler never runs anderror_codeis dropped. Swap the order to preserve the 409 mapping.✅ Fix by reordering exception handlers
- except ToolRunException as te: - self.logger.error( - f"Error while running docker container {container_name}: {te}", - stack_info=True, - exc_info=True, - ) - result = { - "type": "RESULT", - "result": None, - "error": str(te.message), - "status": "ERROR", - } - except ToolImageNotFoundError as e: + except ToolImageNotFoundError as e: self.logger.error( f"Tool image not found in container registry: {e.image_name}:{e.image_tag}", stack_info=True, exc_info=True, ) result = { "type": "RESULT", "result": None, "error": str(e.message), "error_code": ToolImageNotFoundError.ERROR_CODE, "status": "ERROR", } + except ToolRunException as te: + self.logger.error( + f"Error while running docker container {container_name}: {te}", + stack_info=True, + exc_info=True, + ) + result = { + "type": "RESULT", + "result": None, + "error": str(te.message), + "status": "ERROR", + }
483-512:⚠️ Potential issue | 🟠 MajorCapture sidecar-run container and sidecar handles—cleanup is currently a no-op.
The
run_container_with_sidecar()method returnstuple[ContainerInterface, ContainerInterface | None], but the current code ignores this return value. Thecontainerandsidecarvariables, initialized asNoneat the start of the function, are never updated in the sidecar path. This causes the cleanup block at the end (if container: container.cleanup(...)andif sidecar: sidecar.cleanup(...)) to skip execution for sidecar-based runs, potentially leaking containers if the client doesn't handle cleanup internally. Update the sidecar call to capture both handles:Suggested fix
if sidecar_config: - self.client.run_container_with_sidecar(container_config, sidecar_config) + container, sidecar = self.client.run_container_with_sidecar( + container_config, sidecar_config + )
🤖 Fix all issues with AI agents
In `@backend/api_v2/exceptions.py`:
- Around line 95-124: The ToolNotFoundInRegistry exception currently sets
status_code = 500 but the API contract requires a 409 Conflict for missing tool
images; change the class's status_code from 500 to 409 (leave ERROR_CODE,
default_detail and the __init__ logic intact) so ToolNotFoundInRegistry returns
HTTP 409 Conflict to clients consistently with the stated behavior.
🧹 Nitpick comments (1)
backend/api_v2/exceptions.py (1)
127-170: Field name is correct but consider defensive handling for non-dict items in ExecutionResponse.result.The
ExecutionResponse.resultfield (from dto.py) is correctly accessed in the code. However, sinceresultis typed asAny | None, items in the result list could theoretically be non-dict objects. The current code at lines 165–170 only processes dict items, which aligns with existing patterns inremove_result_metadata_keys()andremove_result_metrics()methods. Consider extracting error via attribute access for defensive compatibility with potential DTO objects in the result array:Suggested defensive update
- if isinstance(result, list): - for item in result: - if isinstance(item, dict): - if _check_error_code(item.get("error")): - return True + if isinstance(result, list): + for item in result: + if isinstance(item, dict): + item_error = item.get("error") + else: + item_error = getattr(item, "error", None) + if _check_error_code(item_error): + return True
…oolRunException ToolImageNotFoundError is a subclass of ToolRunException, so it must be caught first to avoid being unreachable dead code (SonarCloud python:S1045). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
runner/src/unstract/runner/runner.py (1)
465-509:⚠️ Potential issue | 🔴 CriticalCapture the return value from
run_container_with_sidecar()to enable sidecar cleanup.Line 508 calls
self.client.run_container_with_sidecar(...)but discards the return tuple, leaving thesidecarvariable asNone. This makes the cleanup at line 569 (if sidecar: sidecar.cleanup(...)) unreachable for the sidecar branch, orphaning the sidecar container lifecycle. Assign the return value:container, sidecar = self.client.run_container_with_sidecar(container_config, sidecar_config).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@runner/src/unstract/runner/runner.py` around lines 465 - 509, The sidecar return value from self.client.run_container_with_sidecar is being ignored so the local sidecar variable remains None and sidecar.cleanup never runs; change the call in the branch where sidecar_config is truthy to capture the returned tuple (e.g., container, sidecar = self.client.run_container_with_sidecar(container_config, sidecar_config)) so the existing cleanup logic that calls sidecar.cleanup(...) can execute; update any subsequent uses that expect the container variable as needed (references: run_container_with_sidecar, sidecar, container, and the later sidecar.cleanup call).
🧹 Nitpick comments (1)
runner/src/unstract/runner/runner.py (1)
498-502: Narrow the label-parser exception scope.Line 501 catches
Exception, which can hide unrelated runtime problems in this block. Restrict to parsing errors fromast.literal_eval.🔧 Proposed fix
- except Exception as e: + except (SyntaxError, ValueError) as e: self.logger.info(f"Invalid labels for logging: {e}")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@runner/src/unstract/runner/runner.py` around lines 498 - 502, The current broad except in the label-parsing block hides unrelated errors; narrow it to only handle parsing errors from ast.literal_eval by catching ValueError and SyntaxError instead of Exception for the block that reads Env.TOOL_CONTAINER_LABELS and sets container_config["labels"]; keep the same self.logger.info(...) call for the parse failure so unrelated runtime exceptions will propagate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@runner/src/unstract/runner/runner.py`:
- Around line 299-300: The shell command currently builds settings JSON into a
single-quoted string using .replace("'", "\\'") which fails for JSON containing
apostrophes; update the code that constructs the command string (the line that
builds f"python main.py --command RUN --settings '{settings_json}' --log-level
DEBUG") to safely escape the settings by importing shlex and using
shlex.quote(settings_json) instead of manual replace, i.e., call
shlex.quote(settings_json) when interpolating settings_json into the command;
ensure you add the shlex import at the top of runner.py and remove the brittle
.replace usage.
---
Outside diff comments:
In `@runner/src/unstract/runner/runner.py`:
- Around line 465-509: The sidecar return value from
self.client.run_container_with_sidecar is being ignored so the local sidecar
variable remains None and sidecar.cleanup never runs; change the call in the
branch where sidecar_config is truthy to capture the returned tuple (e.g.,
container, sidecar = self.client.run_container_with_sidecar(container_config,
sidecar_config)) so the existing cleanup logic that calls sidecar.cleanup(...)
can execute; update any subsequent uses that expect the container variable as
needed (references: run_container_with_sidecar, sidecar, container, and the
later sidecar.cleanup call).
---
Nitpick comments:
In `@runner/src/unstract/runner/runner.py`:
- Around line 498-502: The current broad except in the label-parsing block hides
unrelated errors; narrow it to only handle parsing errors from ast.literal_eval
by catching ValueError and SyntaxError instead of Exception for the block that
reads Env.TOOL_CONTAINER_LABELS and sets container_config["labels"]; keep the
same self.logger.info(...) call for the parse failure so unrelated runtime
exceptions will propagate.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
runner/src/unstract/runner/runner.py
Replace brittle manual .replace("'", "\\'") with shlex.quote() which
correctly handles all shell-special characters in the settings JSON string.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test ResultsSummary
Runner Tests - Full Report
SDK1 Tests - Full Report
|
|



What
Why
execution_status: "ERROR"in the response body, or HTTP 422 with a generic error messageHow
ToolImageNotFoundErrorexception in runner to catch Docker image pull failuresrunner.pyto properly catch the exception during image pullerror_codefield toRunnerContainerRunResponseDTO to propagate specific error typesToolNotFoundInRegistryErrorexception in tool-sandbox that's raised when error_code matchesToolNotFoundInRegistryAPI exception (HTTP 409) in backendapi_deployment_views.pyto detect "not found in container registry" pattern in both top-level and file-level errors_process_final_outputto preserve original error message when secondary exceptions occur during finalizationCan this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
.env:Screenshots
Checklist
I have read and understood the Contribution Guidelines.