Implement new agent using AutoCodeRover's approach #942
Comments
It now supports running on GitHub and local issues!
I don't think implementing AutoCodeRover is high-priority given that we have better performance! https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/
The AutoCodeRover authors actually claim to resolve ~22% of SWE-bench lite issues. Why does the blog post https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/ say AutoCodeRover achieves just 16%?
@dsemba see here: #1693 (comment)
Sorry for commenting on this closed issue, and thank you for your interest in AutoCodeRover! I would like to give an update on the pass@1 and pass@3 scores in the original AutoCodeRover paper. It turns out that the SWE-bench evaluation environment used in our original experiments gave underestimated scores due to missing system-level dependencies. Some correct patches were deemed wrong after running the SWE-bench acceptance tests in that environment. Thanks to the SWE-bench-docker project, our original patches were re-evaluated, and the actual pass@1 score is 19% (instead of 16%), while the pass@3 score is 26% (instead of 22%). More details can be found here. The 19% pass@1 score is also reflected on the SWE-bench leaderboard.
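As a side note on the metrics being discussed: pass@1 counts an issue as resolved only if the first generated patch passes the acceptance tests, while pass@3 counts it as resolved if any of up to three patches passes. A minimal sketch of how such scores could be computed from per-issue attempt outcomes (the function name `pass_at_k` and the data layout are my own illustration, not from the paper):

```python
def pass_at_k(results, k):
    """Fraction of issues resolved within the first k attempts.

    results: one list of boolean attempt outcomes per issue,
    e.g. [[False, True], [True], ...]. Illustrative only; the
    actual scores come from running SWE-bench acceptance tests.
    """
    resolved = sum(1 for attempts in results if any(attempts[:k]))
    return resolved / len(results)

# e.g. three issues, up to three attempts each
outcomes = [[True], [False, False, True], [False, False, False]]
pass1 = pass_at_k(outcomes, 1)  # 1/3: only the first issue passes on attempt 1
pass3 = pass_at_k(outcomes, 3)  # 2/3: the second issue passes on attempt 3
```

This also shows why pass@3 can exceed pass@1 by a wide margin: later attempts get extra chances on issues the first patch missed.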
@neubig I wouldn't necessarily claim that having a higher overall score means OpenDevin couldn't benefit even more from techniques used in AutoCodeRover (or some other tool). IMO, to 'properly' make that assessment you would need to isolate and test how well their methods (e.g. AST construction/search) compare against OpenDevin's equivalent methods. It may be that OpenDevin currently does better because of other components, but could still benefit from the new technique used here. Though perhaps you have already looked deeper than the above comment suggests, and so have a more evidenced view of why you don't expect improvements to be gained.
In the space of AST parsing / better code 'repomaps', see also:
Also, looking at the repo, it seems AutoCodeRover is now much higher than OpenDevin on SWE-bench lite (at least based on the 22% reported in the linked blog post):
Edit: Maybe these results are more relevant/up to date than that OpenDevin blog post though? Which seems like OpenDevin CodeActAgent (v1.3) +
AutoCodeRover from NUS claims 22% on SWE-bench lite.
Their approach constructs an AST from the repo codebase to identify where in the code a patch needs to be applied.
Implement an agent based on ACR's approach.
https://arxiv.org/abs/2404.05427
https://github.com/nus-apr/auto-code-rover
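To make the AST-based localization idea concrete: AutoCodeRover exposes code-search operations (e.g. searching for classes and methods by name) backed by a parse of the repository, which the agent uses to narrow down where a patch belongs. A minimal sketch of that indexing step using Python's stdlib `ast` module (the function names `index_repo` and `search_symbol` are my own; the real tool's search API is richer):

```python
import ast
from pathlib import Path

def index_repo(repo_root):
    """Parse every .py file and map class/function names to (file, line).

    Illustrative sketch of ACR-style AST indexing, not the actual
    implementation from nus-apr/auto-code-rover.
    """
    index = {}
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that don't parse cleanly
        for node in ast.walk(tree):
            if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                index.setdefault(node.name, []).append((str(path), node.lineno))
    return index

def search_symbol(index, name):
    """Return candidate patch locations for a symbol mentioned in an issue."""
    return index.get(name, [])
```

An agent could then take symbol names extracted from the issue text, call `search_symbol`, and feed the matching file/line contexts back to the LLM when drafting a patch.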