OP7: Environment Be Damned!

Cristian Zamfir

One of the limitations of the way S2E models the environment stems from the frequent transitions between the concrete and the symbolic domains. In performing the transition from concrete to symbolic, S2E converts memory from concrete to symbolic at the boundary of the domain of interest. For instance, if the domain of interest is the security module of an application, S2E marks some of the arguments passed to the functions of this library as symbolic. The question is how to appropriately mark these arguments as symbolic, so that enough coverage can be achieved in the entire security module, without generating false positives.

In the following code example:

if (title == "director") 
   account_type = DIRECTOR; 
   account_type = NORMAL; 
... ... ... 
credentials += account_type; 
string key = changeSecretKey(credentials);
the security module will manipulate a set of credentials passed by the program to the library. To explore relevant paths in the library, S2E will mark the credentials parameter as symbolic. The problem with this approach is that it could cause false positives if the changeSecretKey(...) function assumes some preconditions about the credentials parameter. A false positive means that S2E could trigger an assert that would otherwise not be triggered in practice. This approach could overwhelm developers with false warnings and decreases the benefits of using S2E, and even make S2E comparable to a static analysis tool in terms of number of false warnings. No clear solution to this aspect is given in the S2E paper.

A solution to this problem is to dynamically track all the constraints on all the memory values in the system. When performing a transition from concrete to symbolic, S2E could automatically apply the gathered constraints to the values that are about to be marked symbolic (in our case, to the credentials parameter). This means that only values that are feasible, given the execution path that executes the call to changeSecretKey, would be generated. Taking into account only one path is somehow limited, therefore, an optimization is to generate constraints on the symbolic parameters for multiple execution paths. For instance, in the code example above, S2E should take into account both paths, that set the variable account_type to either NORMAL or DIRECTOR, and create new constraints on the credentials parameter, that take both paths into account. If not enough coverage is generated using this method, S2E could fall back to the default approach, at the risk of some false postitives.

The limitation described does not appear in other approaches that have a less limited modeling of the environment, because in those cases, the symbolic data provided by the environment is marked at well defined boundaries (such as command line arguments), where the inputs are known to be completely unconstrained. S2E can obviate the need to create models for the environment and can be selective with respect to which parts of the system run symbolically, at the cost of false positives. Thus, to get rid of false positives, S2E needs to automatically infer some domain­specific constraints on the the values that are converted from concrete to symbolic.