OP10: Engineering Higher-Order Test Generation

Let us start with a discussion of what kinds of functions should be concretized. One large class is the functions/machine instructions that are out of scope of our theorem prover, for example floating-point functions such as log, exp, and sqrt. As these are rather easy to identify, we will skip this discussion and instead elaborate on functions which can be fully expressed in the theorem prover, but which it is nevertheless a good idea to abstract. We believe there are two major categories of such functions: a) functions with very complex expressions (e.g. hash functions), and b) functions with too many paths (functions containing unbounded loops and/or complex logic); a hypothetical example of each is sketched below. In both cases it is theoretically possible to use symbolic execution, but it is impractical because of the analysis time.
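To make the two categories concrete, here is a small illustration. Both functions are made up for this discussion rather than taken from the OP: the first builds an expression that is cheap to execute but hard for an SMT solver to reason about, the second forks a new path on every loop iteration.

```python
# Category (a): a hash-like mixer. Each xor/multiply round is cheap to
# execute, but symbolically the rounds compose into an expression that
# is very hard for the solver to invert.
def mix(x):
    x = ((x ^ (x >> 16)) * 0x45D9F3B) & 0xFFFFFFFF
    x = ((x ^ (x >> 16)) * 0x45D9F3B) & 0xFFFFFFFF
    return x ^ (x >> 16)

# Category (b): a loop bounded only by the input. With symbolic `data`,
# every iteration forks another path, so the number of paths grows with
# the input length.
def count_prefix(data):
    n = 0
    for byte in data:
        if byte == 0:
            break
        n += 1
    return n
```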

Our suggestion is to use a combination of the following techniques to decide whether a function is too complex and should therefore be concretized.

For detecting complex expressions, we provide two heuristics:

1. Solver timeouts. Each time a solver query takes too long, we decide whether the current function is the likely culprit and, if so, mark it as a candidate for concretization (see the sketch after this list). We will use statistical methods for outlier detection; a simple threshold will not work, as it would start cutting off all long paths. If our engine is EXE-like, we can track solver time for prefixes of each path and observe where the complexity starts to grow rapidly. For DART-style symbolic execution, our proposal is to fire parallel solver queries at function boundaries whenever the original query takes too long.

2. Static analysis of code. One idea here is to extract features from source-code expressions (number of multiplications, size of expressions, mixing of variables, etc.) and use machine learning to recognize hard-to-solve patterns: we collect samples from real code, run the solver on them, and learn to predict the expected solving time. An alternative is to feed the expressions to an SMT solver and measure the size of the problem it passes to the underlying SAT solver. The intuition is that SMT solvers optimize queries as much as possible before bit-blasting, so the size of the resulting SAT problem is a good estimate of hardness.
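A minimal sketch of heuristic 1, assuming a z3-based engine. The attribution of query time to the function currently being explored, and the outlier rule (median plus a multiple of the median absolute deviation), are our own illustrative choices:

```python
import statistics
import time
from collections import defaultdict

from z3 import Solver

solve_times = defaultdict(list)  # function name -> observed query times (s)

def timed_check(solver, current_function):
    """Run a solver query and attribute its wall-clock time to the
    function whose code is currently being explored."""
    start = time.monotonic()
    result = solver.check()
    solve_times[current_function].append(time.monotonic() - start)
    return result

def is_concretization_candidate(function, k=5.0):
    """Flag `function` if its typical query time is an outlier among
    all observed query times (median + k * median absolute deviation)."""
    all_times = [t for ts in solve_times.values() for t in ts]
    if len(all_times) < 10 or not solve_times[function]:
        return False  # not enough data for a meaningful statistic
    med = statistics.median(all_times)
    mad = statistics.median(abs(t - med) for t in all_times) or 1e-9
    return statistics.median(solve_times[function]) > med + k * mad

# Usage: s = Solver(); s.add(<path constraint>); timed_check(s, "mix")
```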

For path-exploding functions we suggest first computing function summaries. If, during summary generation, we observe too many paths with distinct outcomes, the function is a good candidate for concretization. In practice a single global threshold will not work, since the paths of a nested function multiply with those of its callers; we therefore suggest thresholds that depend on the nesting level of the function, as in the sketch below.
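A sketch of the resulting check; the halving schedule (the path budget shrinks by a factor of two per nesting level) is a hypothetical choice, and any schedule that shrinks with depth fits the argument above:

```python
def too_many_paths(path_count, nesting_level, base_budget=1024):
    """Decide, during summary generation, whether a function forks
    too many paths. Deeper-nested functions get a smaller budget,
    since their path counts multiply with their callers'."""
    return path_count > max(1, base_budget >> nesting_level)
```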

The second part of the OP asks for capturing all input-output pairs, but that is clearly infeasible; we therefore suggest sampling the function. One simple way is to record random inputs and their corresponding outputs. A cleverer way is to determine all paths up to some depth and then sample independently on each path. Sampling on a path corresponds to solving the path constraint combined with constraints that force new solutions. As the returned solutions can be heavily biased by the SMT solver's deterministic search, we may additionally fix a few random bits of the input/output to obtain a more uniform distribution (see the sketch below).
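A minimal sketch of per-path sampling on top of z3. The 32-bit input, the number of pinned bits, and the path constraint in the usage line are illustrative choices; when the randomly pinned bits contradict the path constraint, that attempt simply yields no sample:

```python
import random
from z3 import BitVec, Extract, Solver, sat

def sample_on_path(path_constraint, x, n_samples=10, n_fixed_bits=4):
    """Sample inputs satisfying `path_constraint`, forcing fresh
    solutions and randomly pinning a few bits for diversity."""
    samples = []
    for _ in range(n_samples):
        s = Solver()
        s.add(path_constraint)
        # Pin a few random bits so successive models are not
        # clustered by the solver's deterministic search.
        for _ in range(n_fixed_bits):
            i = random.randrange(x.size())
            s.add(Extract(i, i, x) == random.randint(0, 1))
        # Force a solution different from those already collected.
        for prev in samples:
            s.add(x != prev)
        if s.check() == sat:  # unsat/unknown: pinned bits conflicted
            samples.append(s.model().eval(x, model_completion=True))
    return samples

# Usage (illustrative): sample inputs on a hypothetical path with x > 100
x = BitVec('x', 32)
print(sample_on_path(x > 100, x))
```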

To summarize, we believe that function concretization is beneficial for functions that create too much complexity, either through complex expressions or through too many paths. For both cases we provide suitable detection techniques, which can be applied to perform an almost fully automatic analysis.