Feedback for `replace_regex` tool #145

chknd1nner · 2025-06-02T12:11:11Z

TLDR: It's amazing! But the new prompt that tells the LLM to trust the tool and not verify can cause problems.

Excellent work on the new tool. I now understand why you went down the regex path and how it's way better than my line-based, token-efficient editing tool.

It's fast too: it sped through my tests thanks to the efficiency of regex.

The only issue I found is that it will accept anything as input for the replacement lines, even bad code. And the new instructions tell the LLM not to verify, but to trust the tool's "OK" output. Once again, I propose my git diff verification strategy as a token-lightweight way for the LLM to verify that no syntax or indentation errors were inadvertently introduced.

I had Claude write up a memory for my testing session. Here it is:

ReplaceRegexTool Testing Results and Validation Strategy

Overview

The new replace_regex tool represents a major advancement in token-efficient code editing, providing surgical editing capabilities that can achieve 80-90% token savings compared to traditional symbol-based approaches.

Tool Architecture

Primary Class: ReplaceRegexTool (src/serena/agent.py:1404)
Support Class: EditedFileContext - robust context manager for atomic file operations
Key Features:
- Full Python regex support with DOTALL and MULTILINE flags
- Multiple occurrence protection (configurable)
- Atomic operations (all-or-nothing)
- Comprehensive error feedback
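
A minimal sketch of what the occurrence guard listed above might look like, assuming a plain `re`-based implementation. The function name `replace_unique` and its error messages are hypothetical and not taken from Serena's source; only the DOTALL/MULTILINE flags and the `allow_multiple_occurrences` option come from this write-up.

```python
import re


def replace_unique(text: str, regex: str, repl: str,
                   allow_multiple_occurrences: bool = False) -> str:
    """Replace matches of `regex` in `text`, refusing ambiguous edits.

    Hypothetical helper for illustration; not Serena's actual code.
    """
    pattern = re.compile(regex, re.DOTALL | re.MULTILINE)
    matches = list(pattern.finditer(text))
    if not matches:
        raise ValueError("no matches found")
    if len(matches) > 1 and not allow_multiple_occurrences:
        raise ValueError(f"{len(matches)} matches found; pattern is ambiguous")
    # A function replacement inserts `repl` literally (no group/backslash expansion).
    return pattern.sub(lambda _m: repl, text)
```

Note that nothing in this sketch inspects `repl` itself, which is the gap discussed further down.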

Test Results Summary

Performed 5 comprehensive tests following developer guidance:

1. Simple Unique Replacement ✅ - Modified single print statement with direct escaped regex
2. Context-Aware Targeting ✅ - Used surrounding context to target specific occurrence among similar ones
3. Multiple Occurrence Replacement ✅ - Bulk updated all method docstrings with allow_multiple_occurrences=True
4. Large Chunk with Wildcards ✅ - Replaced 12-line error handling section using .*? wildcards
5. Targeted Edit Within Method ✅ - Modified specific return statement using context boundaries

Token Efficiency: Achieved ~90% reduction compared to reading/replacing entire symbols.

Critical Discovery: The Replacement Content Validation Gap

The Problem

While the tool excellently validates pattern matching (no matches, multiple matches, file errors), it performs NO validation of replacement content quality:

❌ No syntax checking
❌ No indentation validation
❌ No semantic coherence checks
❌ Blindly accepts any replacement text

Demonstrated Risk

Successfully inserted completely broken code (wrong indentation + syntax errors) that returned "OK" from the tool, creating a false sense of success.

The Token Efficiency Paradox

Use replace_regex: Save 80% tokens ✅
Skip verification: Save 20% more tokens ✅
Later discover errors: Spend 150% tokens debugging ❌
Net result: Worse than traditional approach!

Universal Mitigation Strategy: The Git Diff Approach

Primary Validation: `git diff`

git diff filename.py

Advantages:

- Shows exactly what changed with context
- Language agnostic (works for any file type)
- Visual verification catches obvious errors instantly
- Minimal token cost (~10-15 lines of diff output)
- Universal availability (git is everywhere)

Secondary Validation: Syntax Checking

python -m py_compile filename.py  # For Python
# Other languages have similar tools

Advantages:

- Validates actual compilation
- Specific error messages
- Minimal token cost (~1-2 lines output)
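
For Python, the same kind of check can also be run in-process on a string before anything is written to disk. The sketch below is an assumption about how that might look, using the built-in `compile()`, which parses and compiles without executing the code; the helper name `syntax_error_of` is hypothetical.

```python
from typing import Optional


def syntax_error_of(source: str, filename: str = "<edited>") -> Optional[str]:
    """Return a one-line error description, or None if `source` compiles."""
    try:
        compile(source, filename, "exec")  # compiles only; nothing is executed
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return f"{filename}:{exc.lineno}: {type(exc).__name__}: {exc.msg}"
    return None
```

A `None` result costs no output at all, and a failure is a single line, which matches the "1-2 lines" estimate above.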

Recommended Validation Workflow

1. Execute the replace_regex operation
2. Run git diff filename for visual verification
3. Optionally run a syntax check for language-specific validation

Total validation cost: ~15 lines of output vs 100+ lines for a full file read
Efficiency retained: ~85% instead of 90%
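
To make the cost comparison above concrete, here is a rough sketch that uses Python's `difflib` as a stand-in for `git diff` (a substitution made so the example runs without a repository; the workflow itself uses git). The file contents and the helper name `edit_diff` are made up for illustration.

```python
import difflib


def edit_diff(before: str, after: str, path: str = "file.py", context: int = 3) -> str:
    """Unified diff of one edit, similar in shape to `git diff` output."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
        n=context,
    )
    return "".join(lines)


# Illustrative 200-line file with a single changed line.
before = "".join(f"value_{i} = {i}\n" for i in range(200))
after = before.replace("value_100 = 100\n", "value_100 = -1\n")
diff = edit_diff(before, after)
print(len(diff.splitlines()), "diff lines vs", len(after.splitlines()), "file lines")
```

The diff grows with the size of the edit and the amount of context requested, not with the size of the file, which is where the saving over a full re-read comes from.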

Best Practices for Regex Patterns

End Pattern Uniqueness Strategies

Next Method Boundary (Most Reliable):

def target_method.*?(?=    def next_method)

Specific Content End:

def method.*?specific_unique_line.*?return result

Context + Generic Return:

(unique_preceding_context.*?)def method.*?return result

Wildcard Usage

- Use .*? (non-greedy) for spanning large sections
- Anchor with unique start/end patterns
- Size doesn't matter - wildcards handle 1 character to 1000+ lines equally
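
As an illustration of the "next method boundary" strategy, the sketch below applies a non-greedy wildcard with a lookahead to a small made-up class. The `Calculator` source and the method names are assumptions for the example; the pattern shape is the one described above, and DOTALL is what lets `.*?` cross line breaks.

```python
import re

SOURCE = '''class Calculator:
    def add(self, a, b):
        result = a + b
        return result

    def subtract(self, a, b):
        result = a - b
        return result
'''

# Match from `def add` up to, but not including, the next method definition.
pattern = re.compile(r"def add\(.*?(?=    def subtract)", re.DOTALL)

# The replacement carries its own indentation for every line after the first,
# because the match starts after the leading spaces of the `def add` line.
replacement = '''def add(self, a, b):
        return a + b

'''

matches = pattern.findall(SOURCE)
updated = pattern.sub(lambda _m: replacement, SOURCE)
print(updated)
```

Because the lookahead does not consume the `def subtract` line, the following method is left untouched, including its indentation.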

Strategic Guidelines

- Pattern Crafting: Invest time in precise regex patterns to avoid multiple matches
- Trust Pattern Matching: The tool's pattern validation is excellent
- Don't Trust Replacement Content: Always validate using git diff approach
- Progressive Refinement: Use tool error feedback to refine patterns
- Fail Fast: Better to get "multiple matches" error than to guess

Impact Assessment

This tool + validation strategy combination creates a robust, token-efficient editing pipeline that:

- Achieves massive token savings (80-85% reduction)
- Maintains high confidence through visual validation
- Works across all programming languages
- Provides safety nets against common LLM editing errors

The git diff validation approach solves the "replacement content blind spot" and makes the replace_regex tool suitable for production use in token-constrained environments.

MischaPanch · 2025-06-02T12:45:11Z

Maintainer

Thanks for the feedback! I'd rather address the verification problems with automated linting of the new code; the language servers should be able to do that. Some info on the generated code, possibly the diff, should be returned only if the linting fails. What do you think?

MischaPanch
Jun 2, 2025
Maintainer

Could you pls provide the examples where it generated wrong syntax?

2 replies

chknd1nner Jun 2, 2025
Author

Indentation is probably the main danger. I've seen Claude break my code when working on my tool by using the wrong indentation. For example, class class_name was indented by 8 spaces instead of 4 when using the old replace_lines tool.

I think syntax errors would be pretty rare, but they CAN happen. I have seen wrong syntax generated by an LLM, but only rarely.

replace_regex Tool - Syntax Validation Gap Issue

Summary

The replace_regex tool successfully validates pattern matching but does not validate the syntax or indentation correctness of replacement content. This can lead to broken code being silently introduced.

Issue Demonstration

Original Code (Valid)

def get_result(self):
    """Get the current result."""
    return self.current_result

Regex Replacement Executed

replace_regex(
    regex=r"def get_result\(self\):.*?return self\.current_result",
    repl="""def get_result(self):
        \"\"\"Get the current result.\"\"\"
    return self.current_result  # WRONG INDENTATION!
        extra_random_line_with_bad_syntax === 123 ???""",
    relative_path="test_calculator.py"
)

Tool Response

"OK"

✅ The tool reported success despite introducing syntax errors.

Resulting Code (Invalid)

def get_result(self):
    """Get the current result."""
return self.current_result  # WRONG INDENTATION! (4 spaces instead of 8)
    extra_random_line_with_bad_syntax === 123 ???  # SYNTAX ERROR

Problems Introduced

Indentation Error: The return statement uses 4 spaces instead of 8, breaking Python's indentation rules
Syntax Error: The line extra_random_line_with_bad_syntax === 123 ??? contains invalid Python syntax
Mixed Indentation: Inconsistent spacing throughout the method

Verification of Breakage

Running Python's syntax checker on the resulting file:

$ python -m py_compile test_calculator.py
  File "test_calculator.py", line 31
    return self.current_result  # WRONG INDENTATION!
    ^
IndentationError: unindent does not match any outer indentation level

Root Cause

The replace_regex tool validates:

✅ Pattern matching correctness
✅ Single vs multiple occurrence handling
✅ File accessibility

But does not validate:

❌ Syntax correctness of replacement content
❌ Proper indentation alignment
❌ Language-specific semantic validity

opcode81 Jun 2, 2025
Maintainer

Did an issue such as the one you describe above actually occur, or is this theoretical?

MischaPanch · 2025-06-02T21:15:54Z

Maintainer

Yeah, I get it. It's probably because I talk so much about NOT adding indentation for the symbolic editing tools in the same prompt that the LLM got confused. I pushed an update of the prompt about an hour ago, specifically instructing the model to use indentation in the regex tool. Could you pls try again with the same example?

The model is smart enough to not screw up indentation. If problems still persist, we should be able to solve them with prompting.

3 replies

MischaPanch Jun 2, 2025
Maintainer

@opcode81 and I discussed the diff approach but it would blow up the tokens really fast. If we really fail to get reliable performance with prompting, we can consider some kind of linting, though it's debatable if it will work (language and language server version mismatch, frameworks and so on make it hard)

chknd1nner Jun 2, 2025
Author

Ah. It's not going to work with my example because when testing, I deliberately prompted it to introduce an indentation and syntax error. The goal of the test was to determine if the tool would accept a bad input (which it did).

Testing whether the prompt fix actually stops the LLM from making these indentation mistakes would require a LOT of real-world beta testing over time to see if it happens again. But to be honest, I don't think LLMs will ever generate perfect code 100% of the time.

That's why some form of verification is needed. Let's not get stuck on technical implementation details like git diff, and instead focus on the high-level concept: after a tool-assisted edit, the LLM needs a way of verifying that the edited file is still good. How should this occur in a token-efficient manner?

MischaPanch Jun 2, 2025
Maintainer

That's a question for the future, after we see more examples of such failures. My current answer is: lint the generated code and return the diff or some info only if the linter gives an error. In such a case the tool will not accept the edit and will roll it back. Serena is well equipped for doing that since language servers have built-in support for linting.

But we would implement this or something else only once the problem is apparent, otherwise it's overengineering.
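
For reference, the lint-then-roll-back idea could be sketched roughly as follows. This is only an illustration under stated assumptions: Python's built-in `compile()` stands in for language-server linting, `difflib` stands in for a real diff, and the function name `edit_with_rollback` is hypothetical rather than anything in Serena.

```python
import difflib
import re
from typing import Optional, Tuple


def edit_with_rollback(source: str, regex: str, repl: str) -> Tuple[str, Optional[str]]:
    """Apply a regex edit to Python `source`; keep it only if it still compiles.

    Returns (resulting_source, report). `report` is None on success, so a
    successful edit costs no extra output; on failure it holds the error
    plus the rejected diff, and the original source is returned unchanged.
    """
    candidate = re.sub(regex, lambda _m: repl, source, flags=re.DOTALL | re.MULTILINE)
    try:
        compile(candidate, "<candidate>", "exec")  # syntax check only; nothing is executed
    except SyntaxError as exc:
        diff = "".join(difflib.unified_diff(
            source.splitlines(keepends=True),
            candidate.splitlines(keepends=True),
            fromfile="before", tofile="rejected",
        ))
        return source, f"{type(exc).__name__}: {exc.msg} (line {exc.lineno})\n{diff}"
    return candidate, None
```

The design point is that diff tokens are spent only on the failure path, which addresses the concern that returning a diff after every edit would blow up token usage.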

The next item on the roadmap is a large evaluation on SWE Bench tasks. There we'll see if this problem or other editing problems actually appear. Btw, if you wanna help out with the experiments there, let me know! We will do them manually in the first stage.
