
linediff number-prefixed diff format #2174

Open
wants to merge 1 commit into base: main

Conversation

@dceluis commented Oct 28, 2024

Hi there,

Would a new diff format be of interest to the project?

I wanted to write a simple format that would let me simplify the line-matching logic. It took me a while to get it to work reliably on gpt-4o-mini (it's what I was able to test with extensively). More advanced models should be able to comply without such a huge system prompt.

Pros:

  • Simple rules that leverage LLMs' recitation capabilities.
  • Client-side: simpler line-matching logic.

Cons:

  • More verbose, although much of that is the big system reminder I had to add so that 4o-mini would follow the specification.

You can also check out https://github.com/dceluis/ln-diff for a deeper look into the decisions behind the format.

Benchmark for context: (benchmark results screenshot)

I understand the implementation is nowhere near mergeable, but I figured I'd show it anyway to make it easier to consider.

Cheers!

@CLAassistant commented Oct 28, 2024

CLA assistant check
All committers have signed the CLA.

@Finndersen

Hi @dceluis,
When thinking about building my own AI coding agent project, I also assumed that a line-number-based editing approach would make sense. However, according to the Aider docs:

> GPT is terrible at working with source code line numbers. This is a general observation about any use of line numbers in editing formats, backed up by many quantitative benchmark experiments.

Has this not been the case in your experimentation?

@dceluis (Author) commented Jan 7, 2025

> Hi @dceluis,
> When thinking about building my own AI coding agent project, I also assumed that a line-number-based editing approach would make sense. However, according to the Aider docs:
>
> > GPT is terrible at working with source code line numbers. This is a general observation about any use of line numbers in editing formats, backed up by many quantitative benchmark experiments.
>
> Has this not been the case in your experimentation?

Yes & no.

By far the biggest problem I found was convincing the LLM to construct diffs that don't assume that patch hunks will be applied sequentially.
(so the second hunk would have to reference the source code lines +/- the number of lines the previous hunk added/deleted)

This is straightforward to implement in software, but LLMs get confused pretty quickly.
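
For illustration, the applier side only needs to track a running offset. A minimal sketch in Python (not the actual linediff implementation):

```python
# Minimal sketch (not the actual linediff/aider code): apply hunks that all
# reference ORIGINAL line numbers, tracking a running offset as edits land.
def apply_hunks(source_lines, hunks):
    """hunks: list of (start, end, new_lines), with 1-based, inclusive
    line numbers into the ORIGINAL file."""
    result = list(source_lines)
    offset = 0
    # Apply in order of original position; the offset absorbs earlier edits.
    for start, end, new_lines in sorted(hunks, key=lambda h: h[0]):
        result[start - 1 + offset : end + offset] = new_lines
        offset += len(new_lines) - (end - start + 1)
    return result
```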

My hypothesis was that a coding LLM would generate more correct patches if instructed to reference the original line numbers no matter what.

It was surprisingly hard to convince 4o-mini to do this, though, since there must be conflicting references in the models' training data (e.g. https://github.com/google/diff-match-patch/wiki/Unidiff#3-rolling-context). It took a huge prompt, but it worked, and the logic for parsing the patches is arguably much simpler.

Stronger models have fewer issues, so you might try those and get better results.

So I think the docs are mostly true but could be updated; that's partly why I made this PR, to contribute my findings. And I still use linediff on a day-to-day basis, although not through aider :)

https://x.com/dceluis/status/1854601543963525576?t=OuuEDHsjo9Bsr10LBft3og&s=19

https://github.com/dceluis/kznllm.nvim/tree/main

@Finndersen

> By far the biggest problem I found was convincing the LLM to construct diffs that don't assume that patch hunks will be applied sequentially.

Do you mean that the LLM would always assume that changes would be applied sequentially, so that later changes would have line numbers that don't match the original file?

> My hypothesis was that a coding LLM would generate more correct patches if instructed to reference the original line numbers no matter what.

Yes, I would've thought so too... but if it seems to want to account for prior changes, what if you just let it do that? Does it do it accurately?

@Finndersen

I wonder if a tool-based approach using line numbers would work well... I read that it's generally less effective because the code content has to be JSON-escaped, but I think that only reduced performance by a small amount. Could be worth trying.
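
For example, something like this hypothetical tool definition (an OpenAI-style function schema; the name and parameters are made up for illustration, not an existing aider feature):

```python
# Hypothetical tool definition (OpenAI-style function schema); the name and
# parameters are illustrative only, not an existing aider feature.
replace_lines_tool = {
    "type": "function",
    "function": {
        "name": "replace_lines",
        "description": "Replace a range of lines in a file with new content.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "start_line": {"type": "integer", "description": "1-based, inclusive"},
                "end_line": {"type": "integer", "description": "1-based, inclusive"},
                "new_content": {
                    "type": "string",
                    "description": "Replacement text; code has to be JSON-escaped here.",
                },
            },
            "required": ["path", "start_line", "end_line", "new_content"],
        },
    },
}
```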

@Finndersen

Also, regarding your diff format:

> The REMOVE section line numbers and contents must match the SOURCE file exactly

Isn't it kind of redundant to require both line numbers AND content to match? Wouldn't it be best to have just one or the other? You could just use a number range and not repeat the original content. Or does the LLM struggle with this?

@dceluis (Author) commented Jan 7, 2025

> > By far the biggest problem I found was convincing the LLM to construct diffs that don't assume that patch hunks will be applied sequentially.

> Do you mean that the LLM would always assume that changes would be applied sequentially, so that later changes would have line numbers that don't match the original file?

Yes, it tends to do that. And this also hurts its ability to produce code, since a wrong line could be interpreted as a source or destination line.

> > My hypothesis was that a coding LLM would generate more correct patches if instructed to reference the original line numbers no matter what.

> Yes, I would've thought so too... but if it seems to want to account for prior changes, what if you just let it do that? Does it do it accurately?

Not very accurately, hence the notice in the Aider docs.

> I wonder if a tool-based approach using line numbers would work well... I read that it's generally less effective because the code content has to be JSON-escaped, but I think that only reduced performance by a small amount. Could be worth trying.

I wouldn't discard it, but my prior is that, since there isn't much source code represented as JSON in the training data, the models would have more difficulty producing quality code.

@dceluis (Author) commented Jan 7, 2025

> Also, regarding your diff format:

> > The REMOVE section line numbers and contents must match the SOURCE file exactly

> Isn't it kind of redundant to require both line numbers AND content to match? Wouldn't it be best to have just one or the other? You could just use a number range and not repeat the original content. Or does the LLM struggle with this?

From what I could see in my tests, having both the line numbers AND the source line helps the models reason; more context is good. Also, there are empty lines in the source, so having both helps clear up many ambiguities.

It's for the same reason that I set the destination lines to NOT have line numbers, so that the models do not get confused.
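
For illustration, the redundancy also gives the applier a cheap sanity check before touching the file. A minimal sketch in Python, assuming the REMOVE side has been parsed into (line number, content) pairs (not the actual linediff implementation):

```python
# Minimal sketch (not the actual linediff code): verify that the REMOVE side's
# (line number, content) pairs match the SOURCE file exactly before applying.
def check_remove_section(source_lines, remove_pairs):
    for line_no, content in remove_pairs:
        actual = source_lines[line_no - 1]
        if actual != content:
            raise ValueError(
                f"line {line_no}: expected {content!r}, file has {actual!r}"
            )
```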

I predict that with better models you would need fewer of these prompting tricks.

@Finndersen commented Jan 7, 2025

But if it is putting in the exact source line content then the numbers aren't even required at all, right? (Assuming the content is unique in the file.) Then it basically just becomes like the existing diff method used by aider.

Have you tried an approach of having the LLM only provide the line numbers to replace? I'm tempted to try it out.

@dceluis (Author) commented Jan 8, 2025

> But if it is putting in the exact source line content then the numbers aren't even required at all, right? (Assuming the content is unique in the file.) Then it basically just becomes like the existing diff method used by aider.

> Have you tried an approach of having the LLM only provide the line numbers to replace? I'm tempted to try it out.

I have, although I wish I had documented more of what did and didn't work. IIRC it performs a bit worse, because printing out the number + code turns source-referencing into a recitation exercise: the simplest rule I could enforce to reduce confusion.

But don't refrain from running the benchmark yourself, with any tweaks you think will improve results. I'm most interested in learning what you find!
