
Implement new agent using AutoCodeRover's approach #942

Closed

foragerr opened this issue Apr 9, 2024 · 7 comments
Labels: agent framework (Strategies for prompting, agent, etc) · enhancement (New feature or request) · severity:medium (Affecting multiple users)
Milestone: 2024-05

Comments

foragerr (Collaborator) commented Apr 9, 2024

AutoCodeRover from NUS claims a 22% resolve rate on SWE-bench lite.
Their approach constructs an AST from a repo's codebase to identify where in the code a patch needs to be applied.

Implement an agent based on ACR's approach (a rough sketch of the localization idea follows the links below).

https://arxiv.org/abs/2404.05427
https://github.com/nus-apr/auto-code-rover
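
The following is a minimal sketch of the localization idea, assuming a plain name-match over an AST index built with Python's standard `ast` module; it is illustrative only and not AutoCodeRover's actual implementation. The repo path and keyword in the usage comment are hypothetical placeholders.

```python
# Illustrative sketch only -- not AutoCodeRover's actual code.
# Index a repo's classes/functions via Python's `ast` module, then
# look up candidate patch locations by names mentioned in an issue.
import ast
import pathlib

def build_index(repo_root: str) -> dict[str, list[tuple[str, int]]]:
    """Map each class/function name to its (file, line) definitions."""
    index: dict[str, list[tuple[str, int]]] = {}
    for path in pathlib.Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that don't parse
        for node in ast.walk(tree):
            if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                index.setdefault(node.name, []).append((str(path), node.lineno))
    return index

def locate(index: dict[str, list[tuple[str, int]]],
           keywords: list[str]) -> list[tuple[str, int]]:
    """Return definitions whose names match terms from the issue text."""
    return [loc
            for name, locs in index.items()
            if any(kw.lower() in name.lower() for kw in keywords)
            for loc in locs]

# Hypothetical usage -- the path and keyword are placeholders:
# index = build_index("/path/to/repo")
# print(locate(index, ["separability_matrix"]))
```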

@foragerr foragerr added the enhancement New feature or request label Apr 9, 2024
@rbren rbren added agent framework Strategies for prompting, agent, etc severity:medium Affecting multiple users labels Apr 9, 2024
@foragerr foragerr mentioned this issue Apr 10, 2024
@neubig neubig added this to the 2024-05 milestone Apr 22, 2024
ghost commented Apr 29, 2024

AutoCodeRover now supports running on GitHub issues and local issues!

neubig (Contributor) commented May 9, 2024

I don't think implementing AutoCodeRover is high priority, given that we already have better performance! https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/

@neubig neubig closed this as not planned May 9, 2024
dsemba commented May 11, 2024

AutoCodeRover's authors actually claim to resolve ~22% of issues on SWE-bench lite. Why does the blog post https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/ say AutoCodeRover achieves just 16%?

neubig (Contributor) commented May 11, 2024

@dsemba see here: #1693 (comment)

yuntongzhang commented

Sorry for commenting on this closed issue, and thank you for your interest in AutoCodeRover!

I would like to give an update on the pass@1 and pass@3 scores in the original AutoCodeRover paper. It turns out that the SWE-bench evaluation environment used in our original experiments gave underestimated scores due to missing system-level dependencies: some correct patches were deemed wrong after running the SWE-bench acceptance tests in that environment.

Thanks to the SWE-bench-docker project, our original patches were re-evaluated, and the actual pass@1 score is 19% (instead of 16%), while the pass@3 score is 26% (instead of 22%). More details can be found here.

The 19% pass@1 score is also reflected on the SWE-bench leaderboard.

0xdevalias commented Jun 25, 2024

> I don't think implementing AutoCodeRover is high priority, given that we already have better performance! xwang.dev/blog/2024/opendevin-codeact-1.0-swebench

@neubig I wouldn't necessarily claim that having a higher overall score means that OpenDevin couldn't benefit even more from techniques used in AutoCodeRover (or some other tool).

IMO, to 'properly' make that assessment, you would need to be able to isolate/test how well their methods (e.g. AST construction/search) compare against OpenDevin's equivalent methods. It may be that OpenDevin currently does better because of other parts of its pipeline, but could still benefit from the new technique used here.

Though perhaps you have already looked deeper than the above comment suggests, and so have a more 'evidenced' view as to why you don't think there would be improvements to be gained.

  • https://arxiv.org/abs/2404.05427
  • [..snip..] In our approach called AutoCodeRover, LLMs are combined with sophisticated code search capabilities, ultimately leading to a program modification or patch. In contrast to recent LLM agent approaches from AI researchers and practitioners, our outlook is more software engineering oriented. We work on a program representation (abstract syntax tree) as opposed to viewing a software project as a mere collection of files. Our code search exploits the program structure in the form of classes/methods to enhance LLM's understanding of the issue's root cause, and effectively retrieve a context via iterative search.[..snip..]
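
The excerpt above describes an iterative, structure-aware retrieval loop. Below is a hedged sketch of that loop, assuming the AST index from the earlier comment and a hypothetical `ask_llm` callable standing in for the prompted model; ACR's real stratified search APIs are richer than this.

```python
# Hedged sketch of the iterative context-retrieval loop described in the
# abstract; not ACR's actual interface. `ask_llm` is a hypothetical
# callable that returns the next search request, or None once the model
# has enough context to draft a patch.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SearchCall:
    api: str   # e.g. "search_class" or "search_method" (illustrative names)
    name: str  # symbol to look up

def run_search(index: dict[str, list[tuple[str, int]]],
               call: SearchCall) -> list[tuple[str, int]]:
    # Reduced here to a plain name lookup over an AST-derived index.
    return index.get(call.name, [])

def gather_context(ask_llm: Callable[[str, list], Optional[SearchCall]],
                   issue_text: str,
                   index: dict[str, list[tuple[str, int]]],
                   max_rounds: int = 5) -> list:
    """Iteratively retrieve code context until the model is satisfied."""
    context: list = []
    for _ in range(max_rounds):
        call = ask_llm(issue_text, context)  # model picks the next query
        if call is None:                     # enough context gathered
            break
        context.append((call, run_search(index, call)))
    return context
```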


In the space of AST parsing / better code 'repomaps', see also:

0xdevalias commented Jun 25, 2024

Also, looking at the repo, it seems AutoCodeRover is now much higher than OpenDevin on SWE-bench lite (at least based on the 22% reported in the linked blog post).


Edit: Maybe these results are more relevant/up to date than that OpenDevin blog post though?

Those seem to show OpenDevin CodeActAgent (v1.3) + gpt-4o-2024-05-13 getting 26.67% on SWE-bench lite, so it is still being beaten by the latest AutoCodeRover.
