Violation witness clarification #38
For point 1): A witness is represented as an automaton, and it describes several possible program paths (see here); i.e., the automaton gives transitions for a program. The level of granularity for transitions is currently chosen to match program statements, because of an agreement between tool developers and validators. I am not sure where this is documented, but the overall understanding should be clear.

For point 2): Violation witnesses for concurrent programs of course rely on scheduling information, which is represented as a sequence of transitions in the witness, i.e., which thread applies which operation at which position in the trace. This is the most basic source of information for developers and shows the exact control flow that leads to a property violation. For data races, too, the trace towards the property violation should be available, and it can be represented as a sequence of program operations (currently neglecting read and write operations). An SMT-based encoding of program traces is also a good idea, but you need to give more details.
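To make the transition-based view concrete, here is a minimal sketch (not CPAchecker's implementation) that models a concurrent violation witness as an automaton whose edges carry a thread id and a source line, and replays a program trace of `(thread, line)` steps against it. It assumes, as a simplification, that the automaton is deterministic and that a step matching no outgoing edge leaves the witness in its current state. `Edge`, `follows_witness`, the state names and the line numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    thread: int  # id of the thread that executes the statement
    line: int    # source line of the statement (statement granularity)


def follows_witness(edges, entry, violation, trace):
    """Replay a trace of (thread, line) steps against a witness automaton.

    Simplifying assumptions: the automaton is deterministic, and a step that
    matches no outgoing edge leaves the automaton in its current state.
    Returns True if the violation state is reached.
    """
    state = entry
    for thread, line in trace:
        for edge in edges:
            if edge.source == state and (edge.thread, edge.line) == (thread, line):
                state = edge.target
                break
        if state == violation:
            return True
    return state == violation


# Hypothetical witness: thread 1 runs line 10, thread 2 runs line 20,
# then thread 1 runs line 11.
WITNESS = [
    Edge("q0", "q1", thread=1, line=10),
    Edge("q1", "q2", thread=2, line=20),
    Edge("q2", "qv", thread=1, line=11),
]
```

With this model, the trace `[(1, 10), (2, 20), (1, 11)]` reaches the violation state, an unrelated step such as `(3, 30)` in between is simply skipped, and the reordered trace `[(1, 10), (1, 11), (2, 20)]` does not match the witness.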
Of course, there is the possibility to extend the specification and, e.g., enrich the automaton with additional information based on SMT. For example, some invariants currently cannot be annotated as pure C expressions, because C expressions lack quantifiers.
Which standard are you referring to? Do you mean this repository? In general, this repository tries to give a formal description of the witness format; for the semantics, I would look into the corresponding papers. For violation witnesses, the witness automaton contains state-space guards and sink states, whose semantics is to restrict the state space; cf. http://research.stahlbauer.net/FSE-2015-Testification.pdf.
The only thing that is really underspecified is how the CFA for a program is constructed.
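As an illustration of how sink states restrict the state space, the sketch below discards every candidate program trace whose replay against the witness enters a sink state, and keeps the rest. It is a simplification under stated assumptions: step labels such as `"line7:true"` are hypothetical stand-ins for matching CFA edges, and a step with no witness transition leaves the witness state unchanged. It does not model state-space guards (assumptions) at all.

```python
def prune_with_sinks(edges, sinks, entry, traces):
    """Keep only the program traces whose replay never enters a sink state.

    edges maps (witness_state, step_label) -> next witness state; a step
    without an entry in edges leaves the witness state unchanged (assumption).
    """
    kept = []
    for trace in traces:
        state = entry
        for step in trace:
            state = edges.get((state, step), state)
            if state in sinks:
                break  # the witness rules this path out: stop exploring it
        else:
            kept.append(trace)  # loop finished without entering a sink
    return kept


# Hypothetical witness: at line 7 the branch condition must evaluate to true;
# the false branch leads into a sink state.
EDGES = {
    ("q0", "line7:true"): "q1",
    ("q0", "line7:false"): "sink",
}
TRACES = [
    ("line5", "line7:true", "line8"),
    ("line5", "line7:false", "line12"),
]
```

Here `prune_with_sinks(EDGES, {"sink"}, "q0", TRACES)` keeps only the trace through the true branch.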
We usually verify against the C standard (and GCC when it comes to implementation-defined behavior). The C standard itself does not state whether these are two memory accesses. In general, for a program without data races, the C standard usually guarantees that the outcome is identical to that of a fixed order of statements (which might not be unique -> race conditions). But I remember having discussed this before with you, some of the
So my guess is with [...]. If you want to analyze programs with data races by assuming some memory model (x86-TSO?), then yes, an extension of the witness format would be needed, and at that point we could also be open to a completely new format (I see SMT as just one of many proposals; this should be discussed once it is clear what we actually want). To me it sounds like an SMT formula would also have the same problem, namely that you need to know about the interleaving. I think it would be most productive if you could provide an example program and an example SMT witness, so that we can figure out which requirements are not fulfilled by the current format.
I think my description was not clear enough. To clarify, I'm not proposing to change the witness format. As Martin said, this repo seems to focus mostly on the witness format (syntax). What is not 100% clear to me is what the meaning of, e.g., a graph edge must be. My goal is to convert a witness graph into an SMT formula. This formula, in conjunction with the formula representing all executions of the program and the formula representing the reachability condition, would give me a BMC-based witness validator. No changes are required from developers of other tools; I just want to understand the meaning of each graph element (I need this to faithfully convert the graph into the formula) and hopefully clarify what is assumed by current validators (for technical reasons) and what is imposed by the standard (witness semantics).
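The validation idea (program formula ∧ witness formula ∧ reachability formula) can be illustrated with a toy stand-in for the SMT solver: the three conjuncts are Python predicates over a variable assignment, and satisfiability is checked by brute-force enumeration over a bounded domain. This is only a sketch of the logical structure; the program, the variable names, the bound and the function names are hypothetical, and a real BMC-based validator would hand the conjunction to an SMT solver instead.

```python
from itertools import product


# Hypothetical program: read a nondeterministic x, compute y = 2 * x,
# and reach the error location if y == 6.
def phi_program(x, y):
    return y == 2 * x


def phi_reach(x, y):
    return y == 6


def validate(phi_witness, bound=range(-10, 11)):
    """Search for a model of phi_program && phi_witness && phi_reach.

    Brute-force enumeration over a bounded domain stands in for the SMT
    solver; returns a satisfying assignment, or None if the conjunction is
    unsatisfiable within the bound.
    """
    for x, y in product(bound, repeat=2):
        if phi_program(x, y) and phi_witness(x, y) and phi_reach(x, y):
            return {"x": x, "y": y}
    return None
```

A witness assumption such as `x == 3` yields the model `{"x": 3, "y": 6}`, so the witness is confirmed; `x == 4` makes the conjunction unsatisfiable, so the witness would be rejected; and an empty witness (`lambda x, y: True`) does not restrict the search at all.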
There are different ways of representing a program path. An automaton is one way, an SMT formula is another. My concern is that most papers seem to assume that program paths are explicitly represented as automata, which is clearly not true for all tools.
I understand the technical reasons for this agreement, but I don't think the C standard imposes it (see my answer to Martin's comment below), and I think we should be more general. One possibility would be to add a new graph tag
I don't fully agree with this. Suppose you have an assertion
I know that many tools exploit these empty witnesses (which kind of defeats the purpose of having validation in the first place). However, I think there are witnesses in the middle of the spectrum (left-hand side: empty witnesses; right-hand side: full scheduling information). For example, for data races, it could be useful if the witness could say which two instructions form the data race (not as useful as the full scheduling, but definitely more useful than an empty witness). So, can I add this information in the tag of an edge without CPAchecker trying to match it with some program transition?
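Such a "middle of the spectrum" witness could translate into a much weaker constraint than a full schedule. The sketch below, under the assumption that the witness only names two source lines, accepts any trace in which two different threads execute those lines, in either order. It only restricts the search space: whether the two accesses actually race (conflicting accesses not ordered by happens-before) would still have to be checked by the validator. The function name and line numbers are hypothetical.

```python
def may_witness_race(trace, line_a, line_b):
    """True if two different threads execute line_a and line_b in the trace.

    The order of the two steps is deliberately ignored. This is only a
    search-space restriction; the race itself is not checked here.
    """
    threads_a = {t for t, line in trace if line == line_a}
    threads_b = {t for t, line in trace if line == line_b}
    return any(ta != tb for ta in threads_a for tb in threads_b)
```

For example, `[(1, 4), (2, 9)]` and `[(2, 9), (1, 4)]` both satisfy the constraint for lines 4 and 9, while `[(1, 4), (1, 9)]` does not, because both steps come from the same thread.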
As I said, I'm not proposing to change the format to SMT, but rather to understand what is the meaning of the graph elements so it would allow me to convert such witnesses into a logical formula.
BMC techniques convert the program to the SSA form. The witness assumption would still refer to variables. It is the job of the validator developer to convert this to the corresponding SSA variable.
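A minimal sketch of that mapping for straight-line code, assuming a simple `var_n` naming scheme: each assignment creates a new version of its target variable, and a witness assumption attached to a statement is renamed using the version map in effect after that statement. The helpers `rename` and `to_ssa` are hypothetical, and the regex-based renaming is a deliberate simplification (it treats every identifier as a variable).

```python
import re

IDENT = re.compile(r"\b[A-Za-z_]\w*\b")


def rename(expr, version):
    """Replace every identifier x in expr by its current SSA name x_<n>.

    Variables that were never assigned get version 0 (their initial value).
    Simplification: every identifier is treated as a variable, so function
    names or C keywords inside expr would be renamed too.
    """
    return IDENT.sub(lambda m: f"{m.group(0)}_{version.get(m.group(0), 0)}", expr)


def to_ssa(assignments):
    """Convert straight-line (variable, expression) assignments to SSA form.

    Returns the SSA statements and, for every step, a copy of the version map
    in effect after that step.
    """
    version, statements, maps = {}, [], []
    for var, expr in assignments:
        rhs = rename(expr, version)  # uses refer to the versions before this step
        version[var] = version.get(var, 0) + 1
        statements.append(f"{var}_{version[var]} = {rhs}")
        maps.append(dict(version))
    return statements, maps


# Hypothetical straight-line code: x = 0; x = x + 1; y = x * 2;
SSA, MAPS = to_ssa([("x", "0"), ("x", "x + 1"), ("y", "x * 2")])
# A witness assumption "x == 1" attached to the second statement refers to x_2.
ASSUMPTION = rename("x == 1", MAPS[1])
```

The resulting SSA statements are `x_1 = 0`, `x_2 = x_1 + 1` and `y_1 = x_2 * 2`, and the witness assumption becomes `x_2 == 1`.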
This will definitely be a limitation of BMC-based-validation since BMC tools unroll loops. In principle we would have to do the same trick that verification tools do and thus the SMT formula generated from the witness will just represent bounded witness-executions.
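The analogy with loop unrolling can be sketched as follows: a cyclic witness automaton is only followed up to a bound `k`, so only bounded witness executions end up in the encoding. The sketch enumerates the edge-label sequences of at most `k` transitions from the entry state to the violation state; the edge representation, labels and state names are hypothetical.

```python
def bounded_witness_paths(edges, entry, violation, k):
    """Enumerate the label sequences of at most k transitions that lead from
    the entry state to the violation state of a (possibly cyclic) witness.

    As with loop unrolling in BMC, cycles are only followed up to the bound,
    so the result covers bounded witness executions only.
    """
    paths = []
    stack = [(entry, ())]
    while stack:
        state, path = stack.pop()
        if state == violation:
            paths.append(path)
        if len(path) == k:
            continue  # bound reached: do not unroll further
        for source, label, target in edges:
            if source == state:
                stack.append((target, path + (label,)))
    return paths


# Hypothetical witness with a self-loop for the loop body at q0.
LOOP_WITNESS = [("q0", "body", "q0"), ("q0", "exit", "qv")]
```

With `k = 3`, the witness above yields the paths `exit`, `body exit` and `body body exit`; any execution that needs more loop iterations is not covered by the resulting formula.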
The three questions above have the same answer. I hope it is clear by now that the witness format will not change, so non-SMT-based verification tools should interpret witnesses in the same way they do now.
The C standard does not say these are two accesses, but it does not say they are not, either. My understanding is that for the program below, a witness saying "start executing line 3, execute line 5, finish executing line 3" should be valid, and unfortunately the current validator cannot validate such a witness.
Notice that this notion of atomicity is independent of races. The program above suffers this "problem" but does not have a race.
These two programs have the same executions: the first has a race, the second does not (according to C11).
I'm working on extending Dartagnan for validation (currently with an emphasis on violation witnesses for concurrency reachability, where we only have one validator), and I have a couple of questions about (violation) witnesses.
Every paper I checked talks about (1) witness automata, (2) validation as the product of a program automaton and a witness automaton, and (3) the witness restricting the search space. I understand these decisions from a historical perspective: the first verifiers/validators that were developed represented programs as automata. However, since SV-COMP currently has many tools that are not based on automata, such decisions start to seem rather restrictive. My understanding, however, is that these restrictions (or assumptions) are not really imposed by the standard, but rather by the tools implementing validation.
The standard says that the witnesses must conform to the GraphML format, thus syntactically following (1), but AFAIK it does not say anything about which semantics to give to the XML graph.
My idea is to convert the XML witness into an SMT formula (there are many ways of doing this with different advantages, see below) which in conjunction with the normal reachability formula would reduce the search space for the SMT solver (thus fulfilling (3)).
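A first step of any such conversion is reading the GraphML graph. The sketch below uses Python's standard `xml.etree.ElementTree` to collect nodes and edges with their `<data>` entries, and then follows a linear witness from its entry node to obtain the `(threadId, startline)` pairs that an encoding could constrain. It assumes that the `<data>` keys coincide with attribute names such as `entry`, `violation`, `startline` and `threadId`, that each node has at most one outgoing edge, and that a missing `threadId` means thread 0; the sample witness and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

GRAPHML = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_witness(xml_text):
    """Return the nodes and edges of a GraphML witness as plain dictionaries."""
    graph = ET.fromstring(xml_text).find("g:graph", GRAPHML)
    nodes = {
        node.get("id"): {d.get("key"): d.text for d in node.findall("g:data", GRAPHML)}
        for node in graph.findall("g:node", GRAPHML)
    }
    edges = [
        (edge.get("source"), edge.get("target"),
         {d.get("key"): d.text for d in edge.findall("g:data", GRAPHML)})
        for edge in graph.findall("g:edge", GRAPHML)
    ]
    return nodes, edges


def linear_path(nodes, edges):
    """Follow a linear witness from its entry node and collect the
    (threadId, startline) of each edge that carries a startline."""
    out = {src: (dst, data) for src, dst, data in edges}
    state = next(n for n, d in nodes.items() if d.get("entry") == "true")
    steps = []
    for _ in range(len(edges)):  # the bound guards against cycles
        if state not in out:
            break
        state, data = out[state]
        if "startline" in data:
            steps.append((int(data.get("threadId", "0")), int(data["startline"])))
    return steps


SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="N0"><data key="entry">true</data></node>
    <node id="N1"/>
    <node id="N2"><data key="violation">true</data></node>
    <edge source="N0" target="N1"><data key="startline">3</data><data key="threadId">1</data></edge>
    <edge source="N1" target="N2"><data key="startline">5</data><data key="threadId">2</data></edge>
  </graph>
</graphml>"""
```

For the sample, `linear_path` returns `[(1, 3), (2, 5)]`. Whether the encoding then uses this order or only the set of steps is a separate design decision.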
I think encoding the witness as an SMT formula (in my mind this is having BMC-validation) would overcome some of the limitations we currently have with validation:
In general, the `x++` instruction would result (in assembly or LLVM IR) in two memory accesses: one reading the initial value from memory and one writing the value after adding 1 back to memory. A verifier that reasons at this level can generate a witness with the scheduling `3 -> 5 -> 3` (since `begin/end_atomic()` are not used, this context switch is possible). Unfortunately, CPAchecker (the only validator for concurrency currently available) assumes each C statement to be atomic and thus cannot validate the witness above.
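A small simulation shows why the granularity matters. It is plain Python over a single shared variable with sequentially consistent interleaving, not a model of C11 or LLVM IR semantics. Two threads each increment a shared `x` that starts at 0. If `x++` is split into a load and a store, the interleaving load, load, store, store loses an update and ends with `x == 1`. If each statement is atomic, as the current validator assumes, only `x == 2` is reachable, so a witness that relies on the split cannot be matched. All names are illustrative.

```python
from itertools import combinations


def interleavings(n_a, n_b):
    """Yield every merge of two threads' step sequences as a list of thread ids."""
    total = n_a + n_b
    for slots in combinations(range(total), n_a):
        yield [0 if i in slots else 1 for i in range(total)]


def run_split(schedule):
    """Both threads execute x++ as a load into a register followed by a store."""
    x, reg, pc = 0, [0, 0], [0, 0]
    for t in schedule:
        if pc[t] == 0:
            reg[t] = x        # read access
        else:
            x = reg[t] + 1    # write access
        pc[t] += 1
    return x


def run_atomic(schedule):
    """Both threads execute x++ as a single indivisible step."""
    x = 0
    for _ in schedule:
        x += 1
    return x


SPLIT_OUTCOMES = {run_split(s) for s in interleavings(2, 2)}
ATOMIC_OUTCOMES = {run_atomic(s) for s in interleavings(1, 1)}
```

With two accesses per increment there are six schedules, and the reachable final values are `{1, 2}`. With atomic statements the only reachable value is `{2}`.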
I think the SMT encoding of witnesses is very flexible. For example, if the verifier knows which statements are executed, but not in which order, we might encode that those lines should be executed but do not specify in which order. This would still reduce the search space, but not as much as interpreting also the order.
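The difference between interpreting the witness order and ignoring it can be sketched with two constraints on a trace: an ordered one, where the witness lines must appear as a subsequence, and an unordered one, where they only have to appear somewhere. Counting how many candidate traces each admits shows that both shrink the search space, the unordered one less so. The candidate set and the line numbers are hypothetical.

```python
from itertools import permutations


def ordered(trace, lines):
    """The witness lines occur in the trace in the given order (as a subsequence)."""
    remaining = iter(trace)
    return all(line in remaining for line in lines)  # `in` consumes the iterator


def unordered(trace, lines):
    """The witness lines all occur in the trace, in any order."""
    return set(lines) <= set(trace)


# Hypothetical search space: every trace of 3 distinct statements out of 4.
CANDIDATES = list(permutations([3, 5, 7, 9], 3))
WITNESS_LINES = [3, 7]
ORDERED_COUNT = sum(ordered(t, WITNESS_LINES) for t in CANDIDATES)
UNORDERED_COUNT = sum(unordered(t, WITNESS_LINES) for t in CANDIDATES)
```

Of the 24 candidate traces, the unordered constraint keeps 12 and the ordered one keeps 6.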
The XML graph induces an order but we can opt not to give it any meaning. This is where I think the difference I mentioned above between syntax and semantics of the graph needs to be clarified in the standard.
I think this flexibility would be useful too for witnesses in a (hopefully soon to come) category about weak memory models where instruction re-ordering and not-atomicity are possible.
Looking forward to seeing some discussion about the topic :D