OP12: Now, Which Technology to Use?

Trevis Alleyne

Given the start-up's three main considerations—high-performance Java/C++ code, a codebase of more than 1 MLOC, and network interactions—the best technology to license would be a scalable dynamic test generation tool like SMART. First, SMART has been evaluated on programs in the C programming language, and its approach can presumably be adapted to other high-level languages such as Java and C++. Second, SMART uses function summaries to address the scalability problems usually associated with dynamic test generation, which allows it to test programs approaching and exceeding 1 MLOC. For example, the authors previously evaluated DART on a library of roughly 30,000 lines of C code, which serves as a lower bound for SMART, since they show SMART to be significantly more efficient than DART. Finally, because SMART is a dynamic method, it can analyze the start-up's software running in its native environment, which would make it possible to test network interactions and I/O.
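To make the idea concrete, here is a minimal toy sketch of dynamic (concolic) test generation in the DART/SMART style: run the program on a concrete input while recording the branch predicates taken, then negate a predicate and solve for an input that forces the other path. Everything here—the instrumented `program`, the hand-rolled `solve`—is hypothetical illustrative code; a real tool instruments compiled programs and calls a constraint solver.

```python
# Toy concolic test generation: explore paths by negating recorded branches.
# All names are hypothetical, for illustration only.

def program(x, trace):
    """Instrumented program under test: records each branch predicate taken."""
    if x > 10:                       # branch 1
        trace.append(("x > 10", True))
        if x * 2 == 42:              # branch 2 (linear in x, so solvable)
            trace.append(("2*x == 42", True))
            return "bug"
        trace.append(("2*x == 42", False))
    else:
        trace.append(("x > 10", False))
    return "ok"

def solve(cond, want):
    """Tiny stand-in for a theorem prover, hard-coded to the two linear
    constraints above. A real tool would call a constraint solver here."""
    if cond == "x > 10":
        return 11 if want else 0     # any x > 10, or any x <= 10
    if cond == "2*x == 42":
        return 21 if want else 11    # x = 21 satisfies it; x = 11 does not
    return 0

def explore(start):
    """Run on a concrete input, then negate each recorded branch in turn
    to derive new inputs, until no unseen inputs remain."""
    results, worklist, seen = {}, [start], set()
    while worklist:
        x = worklist.pop()
        if x in seen:
            continue
        seen.add(x)
        trace = []
        results[x] = program(x, trace)
        for cond, taken in reversed(trace):
            new_x = solve(cond, not taken)   # force the other side of the branch
            if new_x not in seen:
                worklist.append(new_x)
    return results
```

Starting from the single input `0`, the loop discovers `11` (flipping the first branch) and then `21` (flipping the second), which triggers the buggy path—the same systematic path exploration that DART performs, minus summaries and a real solver.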

None of the tools covered in the course, however, is perfect, and SMART is no exception. The need to adapt SMART to analyze C++ and Java programs, and to improve its impressive-though-not-perfect scalability, are two disadvantages. Aside from these potentially non-trivial changes, the main limitation of SMART is that its theorem prover handles only linear integer constraints; although this allows for graceful degradation of symbolic execution, SMART devolves to random testing whenever an expression falls outside the theories its prover can decide. Despite these drawbacks, the wealth of methods that use dynamic test generation or a similar approach (e.g., EXE, SAGE, KLEE) demonstrates that it is an effective and scalable technique for automated testing.
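The degradation policy can be sketched in a few lines: if a recorded constraint lies inside the decidable theory (linear integer arithmetic), solve it exactly; otherwise fall back to a random concrete value. This is a hypothetical illustration of the behavior described above, not SMART's actual solver interface, and the constraint encoding is invented for the example.

```python
import random

# Hypothetical constraint encoding: ("linear", a, b) means a*x == b,
# anything else (e.g. ("nonlinear", 1, b) for x*x == b) is outside the theory.

def solve_or_randomize(constraint, rng=random.Random(0)):
    kind, coeff, target = constraint
    if kind == "linear" and coeff != 0 and target % coeff == 0:
        return target // coeff       # decidable: solve the constraint exactly
    # Outside the decidable theory (or no integer solution): degrade
    # gracefully by picking a random concrete value, i.e. random testing.
    return rng.randrange(-1000, 1000)
```

Under this scheme a linear constraint like `2*x == 42` is solved precisely (`x = 21`), while a nonlinear one still yields *some* concrete input, so exploration continues rather than aborting—the "graceful" part of the degradation.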

Considering the other candidates, static analysis comes close but in the end is not well-suited to the task. Static analyzers can detect bugs in programs written in real-world programming languages, and therefore could certainly be adapted to Java and C++. For example, SATURN targets C programs, and CALYSTO, because it accepts the compiler's intermediate representation, is in theory language-independent. Static analyzers, moreover, are known to scale to large programs: the SATURN paper describes an evaluation of the Linux kernel (5 MLOC), and CALYSTO was evaluated on hundreds of thousands of lines of production open-source applications. The problem with static analysis, however, is that it would be very difficult or impossible to test the program's interaction with its environment, i.e., the network. SATURN, CALYSTO, and even C2BP, SDV, and ASTRÉE are therefore unsuitable.

Some of the remaining technologies discussed in the course are not as appropriate as dynamic test generation, yet still possess features that would complement a tool like SMART; two of these are concurrency testing and performance analysis. Concurrency testing tools like CHESS and FUSION might not be well-suited on their own, since they require the user to write a test suite by hand, which can be time-consuming, and they were not validated on programs as large as the SMART benchmarks. Concurrency testing, however, could complement the final internal tool by also finding bugs due to multiple threads—a situation likely in a high-performance program. Likewise, performance analysis tools like SPECTROSCOPE and MACEPC are narrowly targeted at bugs that increase a program's latency and so would not be a good choice for the tool to be built. Nevertheless, a performance analysis during some phase of the testing process could certainly help improve our world-changing Java/C++ application.