Was Dijkstra Wrong or Right?

Giuliano Losa

I am convinced Dijkstra was right.

By testing we can only ensure that a finite set of executions is bug-free, even using the latest techniques. Given the specification of a program, testing traditionally consists of running the program on a finite set of inputs and verifying that the observed behaviour conforms to the specification. A deviation from the specification is a bug.

More generally, one could view testing as exercising different executions of a program (depending on input, scheduling,...). Symbolic representations of the state space efficiently reduce its size, but it's still only possible to explore a finite part of the space, limited by the available memory and time. However, proving a program means assuring that it's behaviour is correct for all possible executions.

One could rephrase the question as "is it possible to ensure that all bugs are caught (i.e., all executions match the specification) by examining only a finite set of executions"? Two major problems arise: some real systems have orders of magnitude more executions than what we can test, while other real systems may even have an infinite set of possible executions (even when represented symbolically). For example, decisions in BFT protocols can be delayed forever by an attacker. In this case, is it possible to test a finite number of executions and prove the absence of bugs in the protocol? By doing this, we are likely to miss the critical situation where some counters used in the protocol simultaneously overflow because this happens in a particular set of executions which are very long. Is it possible to find a finite set of executions such that all remaining ones are isomorphic to one tested execution ? Certainly not, or else we could solve the halting problem.

One could argue that a real system will not exercise all of its possible executions, so we need only check those that will indeed occur. This approach suffers from important problems: First, how does one determine the set of executions that should be tested? Usage of a system cannot be predicted in general, as customer's needs are specified only informally and bound to change. Second, software engineering heavily relies on abstraction and, in order to manage complexity, it is crucial to be able to trust our building blocks. We therefore must test all the layers underneath our program, in order to prove that the program will work correctly.