LLM as a Judge checking Debugger/Executor tool calls #41

Sometimes the Executor or Debugger agent supplies the wrong line numbers when calling the "replace_code" or "insert_code" tools. We can use a fast, low-cost LLM (such as 3.5-haiku or gpt-4o-mini) to check that the code about to be inserted will not break existing code.

Currently we have syntax checker functions (src/utilities/syntax_checker_functions.py) that check whether a change would break the syntax of the code. They create a copy of the file being changed, apply the change to the copy, and check the syntax of that temporary file. Only if the syntax is OK is the change applied to the original file.
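The copy-apply-check flow described above could look roughly like the sketch below. It is not the code in src/utilities/syntax_checker_functions.py; the function names `replace_lines` and `change_keeps_syntax_valid` are hypothetical, and it assumes the target file is Python so that `ast.parse` can serve as the syntax check.

```python
import ast
import shutil
import tempfile
from pathlib import Path


def replace_lines(text: str, start: int, end: int, new_code: str) -> str:
    """Replace 1-indexed, inclusive lines start..end with new_code."""
    lines = text.splitlines()
    lines[start - 1:end] = new_code.splitlines()
    return "\n".join(lines) + "\n"


def change_keeps_syntax_valid(path: Path, start: int, end: int, new_code: str) -> bool:
    """Apply the change to a temporary copy and report whether it still parses.

    The original file is never modified here (hypothetical helper).
    """
    with tempfile.TemporaryDirectory() as tmp:
        candidate = Path(tmp) / path.name
        shutil.copy(path, candidate)
        changed = replace_lines(candidate.read_text(), start, end, new_code)
        candidate.write_text(changed)
        try:
            ast.parse(candidate.read_text())
        except SyntaxError:
            return False
    return True
```

The caller would apply the change to the original file only when this returns `True`.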

This "dumb" syntax check catches most bad changes, but it will not catch bad changes that leave the syntax valid.
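As a small illustration of a bad change that a syntax-only check lets through, the sketch below simulates an agent that meant to replace line 4 but targeted line 5 instead. The sample function and the helper `replace_line` are made up for this example.

```python
import ast

ORIGINAL = (
    "def total(items):\n"
    "    result = 0\n"
    "    for item in items:\n"
    "        result += item\n"
    "    return result\n"
)


def replace_line(text: str, line_no: int, new_line: str) -> str:
    """Replace a single 1-indexed line (hypothetical helper)."""
    lines = text.splitlines()
    lines[line_no - 1] = new_line
    return "\n".join(lines) + "\n"


# The agent intended to change line 4, but passed line 5 (off by one),
# so the `return` statement is overwritten.
broken = replace_line(ORIGINAL, 5, "        result += item.price")

# The syntax check still passes: no SyntaxError is raised here,
# even though the function no longer returns its result.
ast.parse(broken)
```

The modified function is valid Python, so only a check that understands the intent of the change can flag it.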

We need an LLM as a Judge that sees the file before and after the change, sees what the agent actually intends to change (for example, from the agent's last message or its plan), and can evaluate whether the lines to change were selected correctly.
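One possible shape for such a judge is sketched below. Everything here is an assumption for illustration: the prompt wording, the APPROVE/REJECT reply format, and the names `build_judge_prompt` and `judge_change`. The model call is passed in as a plain callable (`ask_llm`) so the sketch does not depend on any particular provider's client.

```python
import difflib
from typing import Callable

# Hypothetical instructions; the real prompt would need tuning.
JUDGE_INSTRUCTIONS = (
    "You review a code edit made by an agent. Decide whether the edited lines "
    "match what the agent intended. Reply with APPROVE or REJECT on the first "
    "line and a short reason on the second line."
)


def build_judge_prompt(before: str, after: str, agent_message: str) -> str:
    """Combine the agent's stated intent with a unified diff of the file."""
    diff = "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile="before",
            tofile="after",
        )
    )
    return (
        f"{JUDGE_INSTRUCTIONS}\n\n"
        f"Agent message:\n{agent_message}\n\n"
        f"File before the change:\n{before}\n"
        f"Diff:\n{diff}"
    )


def judge_change(
    before: str,
    after: str,
    agent_message: str,
    ask_llm: Callable[[str], str],
) -> tuple[bool, str]:
    """Return (approved, reason) parsed from the judge model's reply."""
    reply = ask_llm(build_judge_prompt(before, after, agent_message)).strip()
    verdict, _, reason = reply.partition("\n")
    return verdict.strip().upper() == "APPROVE", reason.strip()
```

In use, `ask_llm` would wrap a call to a low-cost model, and a rejection reason could be returned to the agent so it can retry with corrected line numbers.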

This "smart" check should be done after the "dumb" check by the syntax checkers.
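The ordering could be wired up as in the sketch below, so that the LLM is only called for changes that already pass the cheap check. The function `review_change` and its callable parameters are hypothetical; `syntax_ok` stands in for the existing syntax checkers and `judge` for the LLM judge.

```python
from typing import Callable


def review_change(
    before: str,
    after: str,
    agent_message: str,
    syntax_ok: Callable[[str], bool],
    judge: Callable[[str, str, str], tuple[bool, str]],
) -> tuple[bool, str]:
    """Run the cheap syntax check first; consult the judge only if it passes."""
    if not syntax_ok(after):
        # No LLM call is made for changes that fail the syntax check.
        return False, "syntax check failed"
    return judge(before, after, agent_message)
```

Running the checks in this order keeps cost and latency down, since syntactically broken changes are rejected without a model call.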
