All notable changes to this project will be documented in this file.
Introducing the initial release of SWE-bench, a novel benchmark that frames software engineering as a task: given a codebase and an issue, a model is tasked with writing a .patch file that implements the desired changes.
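For illustration, a model's prediction takes the form of a standard unified diff against the repository. The file path and code below are hypothetical, not drawn from a real SWE-bench task instance:

```diff
--- a/src/utils.py
+++ b/src/utils.py
@@ -10,4 +10,4 @@ def normalize(values):
     if not values:
-        return values
+        return []
     total = sum(values)
     return [v / total for v in values]
```

A patch in this format can be applied to the target codebase and then checked against the repository's tests during evaluation.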
Please see the README.md for instructions on how to set up and run the benchmark, and check out our paper, "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?", for full details on the project.
We will maintain a leaderboard on the SWE-bench public website. Details on how to submit your generations for evaluation and inclusion on the leaderboard will be released soon.