OP7: Environment Be Damned!

Radu Banabic

KLEE models the environment so that it can symbolically execute on all possible return values from system calls. This is approach represents an approximation to avoid symbolically executing the entire kernel, which is prohibitively expensive. However, due to the complex nature of system calls and their side effects, the models end up being quite expensive and incomplete.

S2E handles the problem of system or library calls differently, by executing them concretely with a set of parameters instantiated from the path constraint. S2E adds the selected values as constraints in future paths. If the chosen parameters and their associated constraint cause a path to be missed in later, S2E backtracks and chooses a different value for the parameters, based on the constraint causing the backtrack.

However, the chosen concrete parameters affect future execution through the return value. The library call is made with concrete parameters, resulting in a concrete return value. If the caller code ever checks the return value in a branch, S2E explore a single path.

The current S2E approach for the problem of concrete return values is to include the libraries which are most relevant to the system under test in the symbolic execution. However, symbolically executing a library is hard and expensive. Symbolically executing the library in place would mean redoing the symbolic execution each time a library call is made.

Another solution to this problem would be to symbolically execute certain library methods or system calls a priori in order to generate a "dictionary" linking constraints on the return values to constraints of the call parameters. Static analysis on the system under test would reveal which method calls are executed (and perhaps in which order, for more complex symbex). Symbolic execution would be used to generate a dictionary: every possible set of return values (set of constraints) would be linked to a set of constraints on the input parameters. By executing the library call with a set of input parameters satisfying a set of constraints from the dictionary, we expect the return value to satisfy the respective constraints associated to that entry in the dictionary.

In S2E, this approach would be used when "backtracking" from a point in the symbolic execution to a previous library call. Now, the same approach as the one mentioned in the S2E paper for changing the parameters to a system call would be used for the return value. If choosing a concrete value λ = 5 for an input parameter of a system call would result in a return value γ = 12, and S2E encounters a branch γ < 3 further in the execution, it could check in the dictionary of the system call for the constraints on λ which would generate a return value γ smaller than 3, add those constraints to the path constraints at the time of the call and regenerate the input parameters.

This method is an approximation to find parameters of calls. Library calls are still executed concretely; therefore their side effects are not missed.