OP1: A Smarter Quality Metric For Testing

Andrew Becker

Over the years, many different test suite quality metrics have been proposed. Line and basic-block coverage are two of the perennial favorites, largely due to the ease with which they can be measured. However, these metrics have serious fundamental flaws that at best limit their applicability and at worst contribute to software bugs by luring developers into a false sense of security.

The most serious problem with line or basic-block coverage as a metric for test suite quality is that most bugs are entirely or partially path-sensitive, i.e., the bug is only observed when a specific path is taken through the program's control flow graph. A test suite might provide 100% block/line coverage of the following two examples, yet never encounter the obvious bugs:

 Ex. 1				Ex. 2				
----------------------------------------------------------------
 void bar_1(){		    |   #define S_MAX 256		 
     lock(&something);      |   #define SMAX 128		 
 }			    |				 
			    |   void foo(int s, int data){	 
 void bar_2(){		    |       char buf[SMAX];		 
     lock(&something);      |	    //...			 
 }			    |	    if(s < S_MAX){		 
			    | 	        buf[s] = data;	 
 void foo(){		    | 	    }			 
     if(a) bar_1();	    |	    //...			 
     if(b) bar_2();	    |   }				 
 }			    | 				 

An improved metric might measure the coverage of the feasible paths through the control flow graph, with loops being considered covered if executed 0, 1, and m times for some m>1. A test suite with 100% feasible-path coverage would encounter the deadlock in Ex. 1 if the bodies of both conditionals lie along a feasible path. However, a test suite with 100% feasible-path coverage may well miss the bug in Ex. 2 (for example, if the test suite only assigns values less than 128 to s).

If it is not practical to determine if a particular path is feasible or not (imagine a condition like md5(str) == 12, or a condition which depends on a value calculated in a loop), the safest action is to consider the path feasible and not covered, even if one or more test cases cover the body of that conditional.

A further improvement would incorporate the knowledge that many bugs lurk around border cases of non-boolean constraints. A robust feasible path coverage metric would only count a feasible path as covered if every (non-boolean) constraint along that path has been tested with values which lie near the middle and at both the minimum and maximum of the range of the constraint function.

Still, this metric would not guarantee finding some trivial bugs that are not path-sensitive. The problem might be mitigated by adding more requirements for considering a feasible path as being covered – perhaps one for each type of "unsafe" operation. For example, one might be that any array access must be guaranteed by the path constraint to be within bounds. If it is not practical to determine if each array access is guaranteed to be within bounds (imagine buf[ md5(str) ]), then give up and consider the path as feasible and not covered.

Acknowledgement: I discussed the reading material with Jonas Wagner.