Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ldv-benchmarks races benchmarking #618

Merged
merged 9 commits into from
Apr 1, 2022
Merged

ldv-benchmarks races benchmarking #618

merged 9 commits into from
Apr 1, 2022

Conversation

sim642
Copy link
Member

@sim642 sim642 commented Mar 1, 2022

Goblint changes for benchmarking goblint/bench#3.

Changes

  1. Fix ValueDomain.is_safe_cast to correctly handle casts between signed and unsigned. This caused unsoundness in some invariant (test added).

Problems

  • Some benchmarks would require make sizes of primitive types configurable with current machine as default #54, because CIL inserts builtin function signatures (based on Machdep) which are incompatible with those preprocessed into ldv-benchmarks:
    old type = TFun(TInt(short, ), : TInt(short, ), )
    new type = TFun(TInt(unsigned short, ), : TInt(unsigned short, ), )
    /home/debian/klever-inst/klever-work/native-scheduler/scheduler/jobs/b8b212dd331653bfe7f8cbf43b3041a7/klever-core-work-dir/job/vtg/drivers/net/wireless/intersil/orinoco/orinoco_usb.ko/linux:concurrency safety/emg/environment_model.c:19: Error: Declaration of __builtin_bswap16 does not match previous declaration from /mnt/goblint-svcomp/sv-comp/goblint/includes/sv-comp.c:35 (different integer types).
    
    Our sv-comp.c stub doesn't explicitly define or include any __builtin_bswap16, so it must be coming from CIL's builtins.
  • ldv-commits-races/tasks/f1a8a3f-1.cil.i unsound. Uses custom pthread_create_N.
  • ldv-commits-races/tasks/4036523-1.i unsound. Uses custom pthread_create_N.
  • linux-4.2.6-races/Unsafes/u__linux-concurrency_safety__drivers---net---irda---ksdazzle-sir.ko.cil.i unsound. Model appears to be wrong for usb_endpoint_dir_in (bitwise op replaced with tautological unsigned comparison).
  • linux-4.18-races/Unsafes/u__linux-concurrency_safety__drivers---net---ethernet---8390---ax88796.ko.cil.i unsound. Model appears to be wrong for ax_reset_8390 (bitwise op replaced with tautological unsigned comparison).

@sim642 sim642 added sv-comp SV-COMP (analyses, results), witnesses benchmarking labels Mar 1, 2022
@sim642 sim642 self-assigned this Mar 1, 2022
@sim642
Copy link
Member Author

sim642 commented Mar 2, 2022

Ok, I think it's hopeless to get any meaningful results on these benchmarks for now for two reasons (more info here: goblint/bench#3 (comment)):

  1. They assume the analyzer handles many LDV-specific model functions, e.g. pthread_create_N, ldv_mutex_model_lock, ldv_spin_model_lock, etc. These don't contain definitions in terms of standard pthread functions in the model.
  2. They have somehow been combined with Frama-C, which has incorrectly replaced some bitwise checks on unsigned integers with negative value checks that Goblint immediately detects as impossible.

For these reasons I think almost all of the results I got from running these are invalid since they are unsound in one way or another, leaving most code dead, even if the SV-COMP result happens to come out right.

Nevertheless, we should merge this PR since it contains one base unsoundness fix and an unsoundness test for var_eq.

@sim642
Copy link
Member Author

sim642 commented Mar 3, 2022

Besides the region analysis causing unsoundness (#107), which doesn't show in the verdicts, it's also completely unsuitable for interactive benchmarking since the region partitioning is stored under a single global unknown and it's not clear whether this could be any different.

@sim642
Copy link
Member Author

sim642 commented Mar 9, 2022

The unsoundness fix contained here is actually relevant on sv-benchmarks as well. From the same kernel module there's a benchmark ldv-challenges/linux-3.14_linux-kernel-locking-spinlock_drivers-net-ethernet-amd-pcnet32.cil, which without intervals has 60s timeout, but with intervals gives immediate true result (which just happens to be correct).

@sim642
Copy link
Member Author

sim642 commented Apr 1, 2022

Since this is small and contains a possibly important unsoundness fix, I'll just merge it.

@sim642 sim642 merged commit c9bb7b4 into master Apr 1, 2022
@sim642 sim642 deleted the ldv-races branch April 1, 2022 07:09
sim642 added a commit that referenced this pull request Apr 7, 2022
sim642 added a commit that referenced this pull request Apr 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmarking bug sv-comp SV-COMP (analyses, results), witnesses unsound
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant