How can the result be reformed into the one for SWE-bench evaluation? #75

27yw · 2024-11-11T12:35:29Z

Wonderful work!
I notice that SWE-bench evaluation requires several files, including:

eval.sh: The evaluation script
patch.diff: The model's generated prediction
report.json: Summary of evaluation outcomes for this instance
run_instance.log: A log of SWE-bench evaluation steps
test_output.txt: The output of running eval.sh on patch.diff

With AutoCodeRover, however, we only get the json and patch.diff.
How can we get test_output.txt?

Thanks a lot!


crhf · 2024-11-11T14:29:19Z

Hi! You would need to first transform the json into jsonl (with a simple python script for example), then evaluate the jsonl with SWE-bench's containerized evaluation. Then in SWE-bench/logs/ you will find these files.
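The "simple python script" mentioned above could look roughly like the sketch below. It assumes the predictions file is either a JSON array of prediction objects or a JSON object whose values are the prediction objects; that layout is an assumption and should be checked against the actual file. The function name `json_to_jsonl` is hypothetical, chosen only for illustration.

```python
import json
from pathlib import Path


def json_to_jsonl(src, dst):
    """Write each prediction from a JSON file as one line of a JSONL file.

    Assumption: `src` holds either a list of prediction dicts or a dict
    whose values are prediction dicts. Returns the number of records written.
    """
    data = json.loads(Path(src).read_text(encoding="utf-8"))
    # Normalize both assumed layouts to a plain list of records.
    records = list(data.values()) if isinstance(data, dict) else list(data)
    with open(dst, "w", encoding="utf-8") as out:
        for record in records:
            # One compact JSON object per line, as the JSONL format expects.
            out.write(json.dumps(record) + "\n")
    return len(records)


# Example call (file names taken from this thread):
# json_to_jsonl("predictions_for_swebench.json", "predictions_for_swebench.jsonl")
```

The resulting .jsonl file can then be passed to the SWE-bench evaluation as the predictions file.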

minhnhatle104 · 2025-01-24T00:53:24Z

Hi @crhf, when I run AutoCodeRover on SWE-bench Lite (using the Docker image), I receive a file named predictions_for_swebench.json.

Do you mean: take this file --> transform it to jsonl --> evaluate it with SWE-bench's containerized evaluation?
For example:

python -m swebench.harness.run_evaluation \
    --dataset_name princeton-nlp/SWE-bench_Lite \
    --predictions_path predictions_for_swebench.jsonl \
    --max_workers 1 \
    --run_id evaluation

So the --predictions_path argument would be predictions_for_swebench.jsonl. Is that correct?

27yw changed the title ~~How can the result form into the one for SWE-bench evaluation?~~ How can the result be reformed into the one for SWE-bench evaluation? Nov 11, 2024
