3.1.7 Software Testing Protocol Comparisons

James Yen, David Banks, Charles Hagwood, Raghu Kacker
Statistical Engineering Division, ITLPaul Black, Len Gallagher, Lynne Rosenthal
Software Diagnostics and Conformance Testing Division, ITL

Increasing computerization leads to an ever increasing dependence on software and the testing of software. An issue that arises repeatedly in software testing is whether to conclude testing and release the software or to continue testing the software. Such a decision depends on a variety of factors. Safety-critical software, such as for air traffic control, should merit more painstaking testing than video game programs. There may be economic factors, such as a shrinking market window due to competition from other software vendors.

It would thus be useful to have some sort of decision-making rule or protocol that takes into account the above factors. There are several such protocols, including the classical Starr's stopping rule, which is asymptotically optimal given a rather restrictive set of conditions. Starr's rule continues searching for bugs until the average cost of searching for the next bug found exceeds the average payoff from finding it. Most such stopping protocols are dependent on estimating how many total bugs exist in the software, given that a certain number of bugs have been uncovered by a given time t. Such estimates are often based on mathematical models of the distribution of the bug discovery times and usually involve maximum likelihood estimates or modifications of maximum likelihood methods or the method of moments.

An ideal way to compare different protocols would be an empirical comparison with real programs and programmers. But such a process would be very long and expensive, as well as having limited applicability to differing situations. Thus it makes sense to test protocols using simulation. The bug simulator was developed as a tool to make comparing testing protocols much easier, quicker, and cheaper. The bug simulator does not create actual code that might compile; instead it creates lists of mathematical objects that are abstract representations of bugs and their characteristics, together with enough contextual information that the protocols can operate. Specifically, the simulation creates a list of records, each of which represents a single bug. The attributes of the record indicate various characteristics of that bug, such as how hard the bug is to catch and the cost of not finding it. A debugging process is then simulated using these characteristics and the protocols to be compared. Capture times take the form of the number of discrete search passes performed on the bug list. Preliminary results of experiments using the bug simulator show that while Starr's rule may do quite poorly for situations widely divergent from its governing assumptions, it does perform relatively well in a variety of situations.

$\begin{figure} \epsfig{file=/proj/sedshare/panelbk/99/data/projects/it/abugsim.ps,width=6.0in}\end{figure}$

Figure 6: The plot shows the progress of a simulated bug search until it is terminated by Starr's stopping rule.

Date created: 7/20/2001
Last updated: 7/20/2001
Please email comments on this WWW page to sedwww@nist.gov.