LLM as a Judge checking Debugger/Executor tool calls #41

@Grigorij-Dudnik

Sometimes the Executor or Debugger agents provide wrong line numbers when calling the "replace_code" or "insert_code" tools. We can use fast, low-cost LLMs (such as 3.5-haiku or gpt-4o-mini) to check that the code about to be inserted will not break existing code.

Currently we have syntax checker functions (src/utilities/syntax_checker_functions.py) that check whether a change is going to break the syntax of the code. They create a copy of the file being changed, introduce the change, check the syntax of that temporary file, and, if the syntax is OK, allow the change to be introduced to the original file.
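For reference, the flow described above could look roughly like this for Python files. This is a minimal sketch, not the actual code in syntax_checker_functions.py: the names `apply_replace`, `python_syntax_ok` and `try_replace` are hypothetical, line numbers are assumed to be 1-indexed and inclusive, and the "copy" is kept in memory instead of in a temporary file.

```python
import ast
from typing import Optional


def apply_replace(source: str, start_line: int, end_line: int, new_code: str) -> str:
    """Return a copy of source with lines start_line..end_line (1-indexed, inclusive) replaced."""
    lines = source.splitlines()
    new_lines = new_code.splitlines()
    return "\n".join(lines[: start_line - 1] + new_lines + lines[end_line:]) + "\n"


def python_syntax_ok(source: str) -> bool:
    """The 'dumb' check: does the text still parse as Python?"""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def try_replace(source: str, start_line: int, end_line: int, new_code: str) -> Optional[str]:
    """Apply the change to a copy; return the new text only if its syntax is OK."""
    candidate = apply_replace(source, start_line, end_line, new_code)
    return candidate if python_syntax_ok(candidate) else None
```

Here `try_replace` returns `None` for a change that breaks syntax, so the original file content is left untouched in that case.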

Such dumb syntax checking can find most of the bad changes, but it will not find bad changes that do not break syntax.
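A small made-up illustration of a change that slips through a syntax-only check: the agent intends to replace only line 3, but passes lines 3-4, which silently deletes the `print` call. The helper `replace_lines` is hypothetical and only mimics what a "replace_code" call does.

```python
import ast

before = (
    "total = 0\n"
    "for x in [1, 2, 3]:\n"
    "    total += x\n"
    "print(total)\n"
)


def replace_lines(source: str, start: int, end: int, new_code: str) -> str:
    """Replace lines start..end (1-indexed, inclusive) with new_code."""
    lines = source.splitlines()
    return "\n".join(lines[: start - 1] + new_code.splitlines() + lines[end:]) + "\n"


# Intended range was 3-3; the agent wrongly selected 3-4.
after = replace_lines(before, 3, 4, "    total += x * 2")

ast.parse(after)  # parses without SyntaxError, so the syntax checker accepts it
lost_print = "print(total)" not in after  # but an unrelated line is gone
```

The resulting file is valid Python, yet the change removed code that the agent never meant to touch.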

We need an LLM as a Judge that will see the file before and after the change, will see what the agent actually wants to change (knowing the last agent message or the plan, for example), and will be able to evaluate whether the lines to change have been selected correctly.
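One possible shape for such a judge, as a sketch only. Everything here is an assumption for discussion: the names `Verdict`, `build_judge_prompt` and `judge_change`, the APPROVE/REJECT reply format, and the `llm` parameter, which stands for any function that sends a prompt to a cheap model and returns its text reply (no specific provider API is implied).

```python
import difflib
from typing import Callable, NamedTuple


class Verdict(NamedTuple):
    approved: bool
    reason: str


def build_judge_prompt(before: str, after: str, intent: str) -> str:
    """Show the judge the agent's intent, both file versions and a diff."""
    diff = "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile="before",
            tofile="after",
        )
    )
    return (
        "An agent is editing a file. Decide whether the changed line range "
        "matches what the agent wants to do and does not damage other code.\n\n"
        f"Agent intent:\n{intent}\n\n"
        f"File before:\n{before}\n\n"
        f"File after:\n{after}\n\n"
        f"Diff:\n{diff}\n\n"
        "Reply with APPROVE or REJECT on the first line, then a short reason."
    )


def judge_change(before: str, after: str, intent: str, llm: Callable[[str], str]) -> Verdict:
    """Ask the model for a verdict; anything that is not a clear APPROVE counts as a rejection."""
    reply = llm(build_judge_prompt(before, after, intent)).strip()
    first_line, _, rest = reply.partition("\n")
    approved = first_line.strip().upper().startswith("APPROVE")
    return Verdict(approved, rest.strip())
```

Treating every unclear reply as a rejection keeps the check on the safe side: a confused judge blocks the change instead of letting it through. The rejection reason could be sent back to the agent so it can retry with corrected line numbers.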

Such a "smart" check should be done after the "dumb" check by the syntax checkers.
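The ordering could be wired up roughly as below, so the paid LLM call is skipped whenever the free syntax check already fails. Again a sketch under assumptions: `gated_change` is a hypothetical name, and `syntax_ok` and `judge_ok` stand for whatever checker and judge functions end up being used.

```python
from typing import Callable, Tuple


def gated_change(
    candidate: str,
    syntax_ok: Callable[[str], bool],
    judge_ok: Callable[[str], bool],
) -> Tuple[bool, str]:
    """Run the cheap check first; call the judge only if the syntax check passes."""
    if not syntax_ok(candidate):
        return False, "rejected by syntax checker"
    if not judge_ok(candidate):
        return False, "rejected by LLM judge"
    return True, "accepted"
```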
