OP6: Data Races - To Tolerate Or Not To Tolerate?

Baris Kasikci

Data races are at the heart of some of the worst concurrency bugs, and multithreaded software is full of them. As both hardware and software become increasingly parallel, the number of races in multithreaded software is expected to grow. For example, loading a single web page in a recent version of Firefox, on a modern dual-socket 8-core Intel machine, can flag up to 1,000 races when a state-of-the-art race detector is used [Serebryany, WBIA’09]. Industrial practitioners report that debugging and fixing even one such race can take weeks, or even up to a month [Godefroid et al., NDSS’08]. Studies show that up to around 90% of these races are harmless and are present in the code for performance reasons. The remaining 10% are typically harmful races that violate some specification of correct program operation. Eliminating races introduces performance penalties to such an extent that, in a recent version of memcached, developers declined to fix a potentially harmful data race, even though it could lead to lost updates, simply because the performance penalty would be unbearable [memcached issue 127].
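To make the lost-update scenario concrete, here is a minimal sketch (not memcached's actual code; the function and parameter names are hypothetical). Several threads increment a shared counter; with a plain `long` and `counter++`, the read-modify-write is a data race and increments can be lost, whereas `std::atomic` makes it race-free at some cost in performance:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical sketch: increment a shared counter from several threads.
// Replacing std::atomic<long> with a plain long and counter++ turns the
// read-modify-write into a data race, and increments can be lost.
long parallel_count(int num_threads, int iters_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters_per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```

With the atomic counter the result is always `num_threads * iters_per_thread`; the non-atomic variant can return less, which is precisely the lost-update behavior the memcached developers chose to tolerate rather than pay for synchronization.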

However, a recent study points out that the new C++ memory model classifies all races at the source level as bugs [Boehm, HotPar’11]. In other words, according to this memory model, all source-level races are harmful. It is fair to expect that most languages will soon follow the convention set by C++. Does this mean we should not tolerate races at all? The following two facts suggest quite the opposite:
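A common example of a race that many programmers consider benign is flag-based publication: one thread writes some data and then sets a plain bool flag, while another spins on the flag before reading the data. Under the C++11 model, the unsynchronized flag accesses are a data race, so the program's behavior is undefined regardless of intent. A minimal sketch of the race-free version (the names are hypothetical), using an atomic flag with release/acquire ordering:

```cpp
#include <atomic>
#include <thread>

// Hypothetical sketch: race-free publication under the C++11 memory model.
// With a plain bool instead of std::atomic<bool>, this would be a data race
// and hence undefined behavior, however "benign" it may look in practice.
int publish_and_read() {
    std::atomic<bool> ready{false};
    int payload = 0;
    std::thread producer([&] {
        payload = 42;                                  // plain write...
        ready.store(true, std::memory_order_release);  // ...then publish
    });
    while (!ready.load(std::memory_order_acquire)) {}  // spin until published
    int value = payload;  // safe: acquire pairs with the release store
    producer.join();
    return value;
}
```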

First, races can still be present at the assembly level, because compilers must typically be allowed to introduce them in order to perform classical optimizations (code motion, instruction reordering, etc.). Races at the assembly level can become especially problematic under weak memory consistency models: reads and writes to different memory locations can be reordered, and cache latencies combined with write buffering make updates of shared variables appear asynchronously among different cores. Even if all source-level races are eliminated, the generated assembly code may therefore contain races and behave unexpectedly, especially under weak memory models.
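The classic "store buffering" litmus test illustrates the kind of reordering described above. With plain or relaxed accesses, hardware write buffers can let both threads read 0, an outcome no interleaving of the source statements allows; with sequentially consistent atomics (the C++ default), that outcome is forbidden. A minimal sketch under that assumption (the function name is hypothetical):

```cpp
#include <atomic>
#include <thread>

// Store-buffering litmus test. Each trial: thread A does x=1, r1=y;
// thread B does y=1, r2=x. With seq_cst atomics (the default ordering),
// the outcome r1==0 && r2==0 is forbidden; with plain or relaxed
// accesses, write buffering on weak hardware can produce it.
bool saw_both_zero(int trials) {
    for (int t = 0; t < trials; ++t) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread a([&] { x.store(1); r1 = y.load(); });  // seq_cst default
        std::thread b([&] { y.store(1); r2 = x.load(); });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) return true;  // impossible under seq_cst
    }
    return false;
}
```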

Second, since the new C++ memory model was only recently introduced, most programmers are oblivious to it; not to mention the plethora of software written before this specification existed. Therefore, for practical purposes, races must be tolerated even when this is undesirable, or forbidden by the specification.

In an ideal world, it might have been possible to treat all data races as bugs and eliminate them. In the situation just described, however, it is not realistic to treat them as bugs regardless of whether they are benign or harmful. It is therefore necessary to efficiently identify the harmful races in a program and fix only those. In this way, it may be possible to improve the reliability of multithreaded programs without sacrificing too much performance.