Description
Sometimes the Executor or Debugger agents provide wrong line numbers when calling the "replace_code" or "insert_code" tools. We can use fast, low-cost LLMs (such as 3.5-haiku or gpt-4o-mini) to check that the code about to be inserted will not break existing code.
Currently we have syntax checker functions (src/utilities/syntax_checker_functions.py) that check whether a change will break the syntax of the code. This check creates a copy of the file being changed, introduces the change, checks the syntax of that temporary file, and, if the syntax is OK, allows the change to be applied to the original file.
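For illustration, the copy, apply, check, commit flow described above could look roughly like the sketch below. It is not the code from src/utilities/syntax_checker_functions.py: the names `replace_lines`, `python_syntax_ok` and `safe_replace` are hypothetical, and it assumes Python source files checked with the standard `ast` module.

```python
import ast
import os
import shutil
import tempfile


def replace_lines(text: str, start: int, end: int, new_code: str) -> str:
    """Replace the 1-indexed, inclusive line range start..end with new_code."""
    lines = text.splitlines(keepends=True)
    if new_code and not new_code.endswith("\n"):
        new_code += "\n"
    return "".join(lines[: start - 1]) + new_code + "".join(lines[end:])


def python_syntax_ok(source: str) -> bool:
    """Return True if the source parses as Python (assumption: Python files only)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def safe_replace(path: str, start: int, end: int, new_code: str) -> bool:
    """Apply the change to a temporary copy first; touch the original only if it parses."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    candidate = replace_lines(original, start, end, new_code)

    fd, tmp_path = tempfile.mkstemp(suffix=".py")
    try:
        # Write the changed version to the temporary copy.
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(candidate)
        # Check the syntax of the temporary file, not of the original.
        with open(tmp_path, encoding="utf-8") as tmp:
            ok = python_syntax_ok(tmp.read())
        if ok:
            shutil.copyfile(tmp_path, path)
        return ok
    finally:
        os.remove(tmp_path)
```

With this shape, a rejected change leaves the original file byte-for-byte untouched, because every write before the check goes to the temporary copy.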
Such dumb syntax checking can find most of the bad changes, but it will not find bad changes that do not break syntax.
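A small made-up example of such a change: the agent means to rewrite line 4 but targets line 5 instead. Both results parse, so a syntax-only check accepts the mistaken one even though the `return` statement is gone. The `replace_line` helper and the `total` function are hypothetical and exist only for this illustration.

```python
import ast

original = (
    "def total(items):\n"
    "    result = 0\n"
    "    for item in items:\n"
    "        result += item\n"
    "    return result\n"
)


def replace_line(text: str, line_no: int, new_line: str) -> str:
    """Replace a single 1-indexed line of text."""
    lines = text.splitlines()
    lines[line_no - 1] = new_line
    return "\n".join(lines) + "\n"


# Intended edit: line 4. Mistaken edit: same code, but off by one line.
intended = replace_line(original, 4, "        result += item * 2")
mistaken = replace_line(original, 5, "        result += item * 2")

# Neither call raises SyntaxError, so a syntax checker passes both versions.
ast.parse(intended)
ast.parse(mistaken)
```

The mistaken version overwrites `return result`, so `total` silently returns `None`. Only a check that compares the change against the agent's intent can notice this.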
We need an LLM as a Judge that sees the file before and after the change, sees what the agent actually wants to change (for example, from the last agent message or the plan), and can evaluate whether the lines to change were selected correctly.
Such a "smart" check should be done after the "dumb" check by the syntax checkers.
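One possible shape for the two-stage check, as a sketch only. All names here (`build_judge_prompt`, `parse_verdict`, `check_change`) are hypothetical, the prompt wording and the APPROVE/REJECT reply convention are assumptions, and the model call is passed in as a plain `ask_llm` callable so that no particular LLM client API is implied.

```python
import difflib
from typing import Callable, Tuple


def build_judge_prompt(before: str, after: str, intent: str) -> str:
    """Show the judge the agent's intent, the file before the change, and the diff."""
    diff = "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile="before",
            tofile="after",
        )
    )
    return (
        "An agent wants to make the following change:\n"
        f"{intent}\n\n"
        "File before the change:\n"
        f"{before}\n\n"
        "Diff introduced by the change:\n"
        f"{diff}\n\n"
        "Were the right lines selected? Start your answer with APPROVE or REJECT, "
        "followed by a short reason."
    )


def parse_verdict(reply: str) -> bool:
    """Assumed reply convention: the first word is APPROVE or REJECT."""
    return reply.strip().upper().startswith("APPROVE")


def check_change(
    before: str,
    after: str,
    intent: str,
    syntax_ok: Callable[[str], bool],
    ask_llm: Callable[[str], str],
) -> Tuple[bool, str]:
    """Run the cheap syntax check first; call the LLM judge only if it passes."""
    if not syntax_ok(after):
        return False, "syntax check failed"
    reply = ask_llm(build_judge_prompt(before, after, intent))
    return parse_verdict(reply), reply
```

Running the syntax check first means the model is never called for changes that are already known to be broken, which keeps the extra cost of the judge low.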