3.1.2 Statistical Methods for Software Validation

David L. Banks, Charles Hagwood, Raghu Kacker, James Yen

Statistical Engineering Division, ITL

Will Dashiell, Leonard Gallagher, Lynne Rosenthal

Software Diagnostics and Conformance Testing Division, ITL

Software reliability is the central problem of the Information Revolution. Faulty software can cost fortunes and lives, and the worldwide effort to validate commercial code ties up an enormous investment of high-level human resources. At this scale, even modest statistical progress that reduces the testing burden while maintaining current performance levels will enable great technological advance.

Recently, new statistical methods have been proposed for conformance testing; these reduce costs, quantify uncertainty, or both. Our research program compares these methods, invents better ones, and determines which cocktail of techniques is most useful for specific classes of problems. We also evaluate the benefit that best-practice methods can confer, to support management assessment of software costs and risks.

We are currently focusing on the four most promising directions in recent software validation research: (1) Coverage designs (Dalal and Mallows, 1997), which allocate testing effort across modules or functions so as to ensure the joint exercise of all subsets of fixed size; (2) Usage models (Trammell, 1996), which allocate test effort according to usage data or mission criticality; (3) Optimal stoppage (Dalal and Mallows, 1989), which uses information on the catch times and severity of bugs to set up a dynamic programming problem whose solution (under some assumptions) determines the best time to release the code; (4) Extensions of binomial models for failures in a fixed test suite (cf. Sahinoglu and Spafford, 1990), using Bayesian and finite-population techniques. A key goal is the comparative evaluation of these different approaches.
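To make direction (1) concrete, the following is a minimal sketch of a coverage design for subsets of size two (pairwise coverage). It is not the construction of Dalal and Mallows; it is a simple greedy heuristic swapped in for illustration, and the function names and the example factors are hypothetical.

```python
from itertools import combinations, product


def required_pairs(factors):
    """Every pair of settings, taken from two different factors, that must be exercised."""
    pairs = set()
    for (i, levels_i), (j, levels_j) in combinations(enumerate(factors), 2):
        for a, b in product(levels_i, levels_j):
            pairs.add(((i, a), (j, b)))
    return pairs


def pairs_of(test):
    """The setting pairs exercised jointly by one test (one level per factor)."""
    return set(combinations(enumerate(test), 2))


def greedy_pairwise(factors):
    """Greedy heuristic (an assumption, not an optimal design): repeatedly add
    the candidate test that covers the most still-uncovered pairs."""
    remaining = required_pairs(factors)
    candidates = list(product(*factors))
    suite = []
    while remaining:
        best = max(candidates, key=lambda t: len(pairs_of(t) & remaining))
        suite.append(best)
        remaining -= pairs_of(best)
    return suite


# Hypothetical configuration factors for a system under test.
factors = [
    ["linux", "windows", "mac"],
    ["ipv4", "ipv6"],
    ["small", "large"],
    ["cache_on", "cache_off"],
]
suite = greedy_pairwise(factors)
exhaustive = list(product(*factors))
print(f"{len(suite)} tests cover all pairs; exhaustive testing needs {len(exhaustive)}")
```

The saving over exhaustive testing grows quickly with the number of factors, which is the reason coverage designs can reduce the testing burden while still exercising every pairwise interaction.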

We have begun a simulation experiment that aims to give first-order information on the comparative performance of inspection protocols for virtual software. The experiment takes account of differential catchability among bugs, different costs of bugs (in terms of the damage caused if an undetected bug were released), different locations of the bugs (in terms of either module or function), and the fact that some kinds of bugs, such as logical errors, show cluster structure. This work is related to optimal search theory. We believe that simulated software tests have the potential to remove the greatest single barrier to progress in the evaluation of conformance testing methods.
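A simulation of this kind might be sketched as follows. This is a toy illustration, not the experiment itself: the distributions, the clustering rule, the two inspection protocols, and all names and parameter values are assumptions chosen only to show how catchability, cost, location, and clustering can enter a comparison.

```python
import random


def make_virtual_software(n_modules, n_bugs, rng):
    """Generate hypothetical bugs, each with a module, a per-test catch
    probability, and a cost incurred if the bug is released undetected."""
    # Assumed clustering rule: a few "hot" modules attract most of the bugs.
    hot = rng.sample(range(n_modules), k=max(1, n_modules // 5))
    bugs = []
    for _ in range(n_bugs):
        module = rng.choice(hot) if rng.random() < 0.7 else rng.randrange(n_modules)
        bugs.append({
            "module": module,
            "catch_prob": rng.betavariate(1, 4),  # differential catchability
            "cost": rng.expovariate(1.0),         # differential damage
        })
    return bugs


def apply_tests(bugs, caught, tests_per_module, rng):
    """Run the allotted tests; each test in a bug's module catches that bug
    independently with probability catch_prob (a simplifying assumption)."""
    for idx, bug in enumerate(bugs):
        if idx in caught:
            continue
        for _ in range(tests_per_module[bug["module"]]):
            if rng.random() < bug["catch_prob"]:
                caught.add(idx)
                break


def uniform_protocol(bugs, n_modules, budget, rng):
    """Spread the test budget evenly over modules."""
    caught = set()
    apply_tests(bugs, caught, [budget // n_modules] * n_modules, rng)
    return caught


def adaptive_protocol(bugs, n_modules, budget, rng):
    """Spend half the budget evenly, then weight the remainder toward
    modules where the pilot stage found bugs."""
    caught = set()
    pilot = budget // 2
    apply_tests(bugs, caught, [pilot // n_modules] * n_modules, rng)
    found = [1] * n_modules  # add-one smoothing so every module keeps some weight
    for idx in caught:
        found[bugs[idx]["module"]] += 1
    rest = budget - pilot
    alloc = [rest * f // sum(found) for f in found]
    apply_tests(bugs, caught, alloc, rng)
    return caught


def mean_residual_cost(protocol, n_modules=10, n_bugs=30, budget=100,
                       n_reps=200, seed=0):
    """Average cost of the bugs that survive testing, over simulated programs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        bugs = make_virtual_software(n_modules, n_bugs, rng)
        caught = protocol(bugs, n_modules, budget, rng)
        total += sum(b["cost"] for i, b in enumerate(bugs) if i not in caught)
    return total / n_reps


for protocol in (uniform_protocol, adaptive_protocol):
    print(protocol.__name__, round(mean_residual_cost(protocol), 3))
```

Which protocol does better depends entirely on the assumed parameters, so the output should be read as a property of this toy model rather than as evidence about real software.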

We are also exploring the use of clinical-trial methods for deciding whether to release software. The decision to market a new drug is analogous to the decision to release new code; drug approval uses a four-step protocol established by the FDA. Software manufacturers employ four similar steps, but are less systematic in combining the information at each stage. By transferring statistical techniques developed for drug testing, we hope to improve software testing.

Figure 2: There are many strategies for software validation. This partial taxonomy organizes the major lines of research on the problem; because the field is large and methods may be applied in combination, it is not a comprehensive representation.

Date created: 7/20/2001
Last updated: 7/20/2001