OP8: Turning Abstract Interpretation into Symbolic Execution

Etienne Kneuss

For symbolic execution, the key property is reachability: we are looking at path constraints to see if those reaching error conditions are satisfiable. For this reason we will use as an abstract domain formulas that express the path constraints. Given the target application of Astrée, the formulas need to be in a theory that supports numerical operations, arrays, tupples and is decidable. In order to faithfully represent computations and as is common in symbolic execution engines, we will use bit-vectors. AI also requires us to define a partial order on that abstract domain (typically a complete lattice). The partial order on formulas is given by (logical implication); we thus have T(true) as top element and (false) as bottom element. The join (least upper bound) operator is the logical disjunction operator .

As for the transfer functions, they are naturally defined given the abstract domain and the semantic of each statement. For a language like C, the transfer function for each statement is given by the corresponding formula supported by the logic. We assume that the language is already transformed into the equivalent SSA form. For instance the statement x14 := y13 with pre-condition P will yield the postcondition (P ∧ x14 = y13). Array assignments can be handled conservatively as in SE. Arithmetic operations are translated to the corresponding operation on bit-vectors. This has the advantage of accounting precisely for overflows. The statement x18 := Φ(x12, x14) obtained from the SSA transformation can be naturally translated to P ∧ (x18 = x12 ∨ x18 = x14). Branching statements are handled by simply taking the conjunction of the branch condition with the pre-condition. Another interesting statement to consider is assert(C). During the actual fix-point computation, we treat them as assume(C) statements, and thus we generate P ∧ C as post-condition. When the fix-point is computed, we look at all the assert statements and invoke a solver for P ∧ ¬ C. Solving this formula and obtaining a class of inputs satisfying it will indicate that the assertion may fail using such inputs. It remains for us to exclude false positives by testing or applying additional reasoning.

Note that this lattice is of infinite height, and consequently it does not satisfy the finite ascending chains condition which is necessary to ensure the termination of the fix-point algorithm in the presence of loops. We must thus rely on a widening operator to enforce (and more generally speed up) termination. Strictly speaking, symbolic execution would not terminate in the presence of such loops as well, so implementing a strict symbolic execution engine using AI would require us to let it in this ill state. If we decide to diverge from it for the sake of termination, we can implement a widening operator ω. A simple widening operator would be to turn the reachability condition into T (true), effectively introducing many false positives.

One of the big advantages of using AI to develop such a system is that it will allow us to trade precision for (fast) termination in the presence of loops. In many applications, loops are frequent and termination of the analysis is a desired property. Also, when AI terminates and none of the verification conditions generated from the assertions could be violated, we have a guarantee that no concrete execution can violate these assertions (ie. there are no false negatives). Whether it terminates and how fast it does it is directly dependent on the widening function used, which leaves a lot of room for fine-tuning. However, one of the main disadvantages of AI is that it can generate a lot of false positives.