OP10: Eliminating Redundancy

Vitaly Chipounov

Bouncer uses function summaries to speed up the computation of the set of inputs that would trigger a particular vulnerability in a program. Summaries are precomputed for frequent library functions, such as strcmp or strlen. When a program calls these functions with symbolic arguments, Bouncer automatically appends precomputed constraints to the current path instead of re-executing the functions. This saves time and resources, since there is no more forking involved, as this would happen when executing the loops contained in such functions.

There are two issues with the Bouncer’s approach: the summaries are written manually from templates and they might not capture the entire constraint set of the original function. Manual writing is impractical for large libraries and also requires keeping the summaries up to date, not to mention errors and inaccuracies plaguing such a manual task. The second issue might lead to an increased rate of false negatives if the summary is not thorough enough.

Instead of writing the summaries manually, we extract them automatically by symbolically executing the library functions independently and saving the resulting execution tree. For instance, strlen could be called with a pointer referring to a buffer of symbolic size with symbolic content. The call would generate a tree up to some fixed depth. Then, the execution tree and the constraints are saved to disk. When strlen is invoked in the context of the program, the right path in the execution tree is retrieved, sparing possibly expensive constraint solving.

Coping with infinite trees can be done using on-demand expansion. In the case of strlen with symbolic pointer, the number of execution paths is infinite. Most of the time, however, it is called with an argument having a bounded size (e.g., <50 characters). In this case, the initial symbolic execution would explore the function up to the average size of the argument, and for the exceptional case, it would complement the summary by invoking the symbolic execution engine on the paths that are missing.

The new code paths can be cached for later reuse. On-demand expansion can be done at multiple granularities. Building large execution trees beforehand might be wasteful both in terms of memory and CPU resources, especially when the concerned function is rarely called by the program. A similar waste can happen when only a few code paths are executed in general. A static analysis pass could determine how the function is usually called in order to generate a representative summary. For rare functions, the summaries can be built only in case they are needed, rather than a-priori.