
Add ldv regression benchmarks to incremental benchmarks #35

Draft · wants to merge 13 commits into master

Conversation

@jerhard (Member) commented Aug 11, 2022

This PR imports benchmarks taken from https://gitlab.com/sosy-lab/software/regression-verification-tasks/-/tree/new-svcomp-spec; further information on these benchmarks can be found there.

The benchmarks were processed with incremental-ldv.py to generate the patch files and to create the benchmark sets in index/sets/ldv/. The file index/sets/ldv/combined.yaml contains all imported benchmarks; the other YAML files in index/sets/ldv/ contain the benchmarks per directory.

Problems

There are some significant problems with this benchmark set. In particular:

  • Most of the benchmarks have syntax errors that prevent Goblint from parsing them.
  • The benchmarks that Goblint does run on have very little live code. I checked ten benchmarks; the reported amount of live code on these varies between 20 and 230 lines. @michael-schwarz and I inspected one benchmark manually and concluded that indeed very little code is live.
  • Because so little code is live, run times on many of the working benchmarks are within ~3 seconds.
  • The patches tend to be large, and they are patches of CIL files. While there are some patches for which Goblint detects only 1–2 changed functions, a considerable number of the patches Goblint can run on result in more than 50 changed functions.

@sim642 marked this pull request as draft on August 11, 2022, 11:27
@sim642 added the "new benchmark" label on August 13, 2022
@sim642 (Member) commented Aug 13, 2022

  • Most of the benchmarks have syntax errors that prevent Goblint from parsing them.

It would be useful to have a CIL issue about this parsing error.

@michael-schwarz (Member) commented

IIRC they were actual errors (e.g. the type of the implementation not matching the one given in the prototype in the same file) that GCC also flags. If we want to do something here, I don't think we should patch CIL to accept invalid programs, but rather fix the benchmarks.
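
For illustration, here is a hypothetical minimal example (not taken from the benchmark set) of the kind of error described above: a prototype and a definition in the same file disagree on the function's type, which GCC rejects as a hard error rather than a warning.

```c
/* Hypothetical sketch of the kind of mismatch described above. */
int f(int x);   /* prototype declares f as int (int) */

long f(long x)  /* definition declares f as long (long);
                   GCC reports "error: conflicting types for 'f'" */
{
    return x;
}
```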
