One possible solution to the problem would be to develop an alternative to falsification testing itself. Conformance testing based on this alternative would involve analyzing, in some automated or semi-automated manner, the actual code of the candidate implementation to evaluate the implementation's conformance to the relevant specification. Two research areas have been suggested as having the potential to lead to the development of non falsification techniques.
The first area is that of program verification, in which mathematical claims about the behavior of a program are stated and proved. In theory, program verification techniques could prove that a particular program must behave correctly on all possible inputs. However, despite considerable research effort during the last two decades, there is nothing to suggest that this approach has led or will lead to a practical tool. Program verification has had limited success because even relatively simple programs are hard to prove correct. Furthermore, even a proof that a particular program is correct would be a statement only about that program's source code, not about its compiled code and how it executes in particular software/hardware environments. [BLLR93].
Thus, attention has shifted to a second, newer, research area, which involves "statistical" or "probabilistic" measures of correctness of programs. Under this approach, a program isn't proven correct, but instead is shown to be very likely (say 99.99% likely) to be correct. Techniques developed using this approach would presumably be applicable to the testing of more complex programs than currently is the case with formal proof techniques, and the testing itself would become significantly automated. The probabilistic aspect itself shouldn't be controversial--it isn't much different from what we do today when we construct test data representing only a subset of the possible combinations allowed by the standard specification. Rather, the question is simply whether any of the theoretical research has reached the point where it can be picked up by ITL and used as the basis of a nontrivial, practical alternative to falsification testing.
Result checking is based on the observation that "for certain mathematical functions f, the task of determining, given inputs x and y, whether or not f(x) = y is easier than the task of, on input x, computing f(x)" [BLWA96]. In other words, checking that y is correct is often easier than computing the "true" value f(x) in some other way. For example, suppose that xsqrt is a program that claims to implement the square root function. We can then verify xsqrt(x) by verifying if, in fact, xsqrt(x) * xsqrt(x) is equal to x (ignoring roundoff error here). The point is that the single multiplication needed to calculate xsqrt(x) * xsqrt(x) is likely to be faster than the sequence of operations that the xsqrt program went through to calculate xsqrt(x) in the first place, and is also likely to be faster than comparing xsqrt(x) with a value calculated by using an existing reference square root function.
The idea of result checking leads to the idea of "self-checking," whereby the implementing program includes, within itself, code that can quickly check the results of its own calculations. Furthermore, many self-checking techniques can be extended naturally to be "self-correcting" as well, if the algorithm underlying the implemented program is such that the program's value, or output, for a given input can be calculated using the value or output of the program at other points. This is where randomly generated data can come into play, leading to the "statistical" nature of the overall approach.
My conclusions, after reviewing the literature and discussing the subject with NIST experts, are very pessimistic regarding the first vision, and only slightly more optimistic regarding the second.
ITL's basic task regarding conformance testing is to answer the question "how can we test an existing program P to find out if, according to specification S, P does the right thing." This is, in a sense, a version of the question being asked by the transparent proof people. However, they are answering it in the domain of formal mathematical proofs, and "it is not clear how this work could be applied to conformance testing" [PIAT96].
On the other hand, the checking/correcting people work in the domain of software, but are directing their attention to the quite different problem "how can we construct program P to ensure that, according to specification S, P does the right thing." The latter problem is the domain of the implementor, although a given self checking technique used in the construction of P might also be used as a conformance test. In any case, the "statistical" aspect of this approach comes in connection with possible randomization algorithms to be imbedded within P to check and correct P's handling of a given specific input, not in the analysis of P's source code or in the selection of potential inputs to test P in general.
Some ideas and techniques implied by the research (although perhaps not the statistical analysis part itself) are immediately applicable to the design of conformance tests. Certainly the idea of indirectly checking the results of a calculation (e.g., square the results of an implementor's square root function and see if you get back the original test point, or, to take another example, see if you have any problems deleting a file created by an implementor's create function) is a valuable technique that is already being used by conformance test developers.
Likewise, the use, where appropriate, of randomly generated test data points is a technique worth considering, especially when working in a test development environment such as Sun's Assertion Definition Language (ADL) that helps the developer specify test input.
Nevertheless, the larger problem of conformance test development remains unaddressed. The checking/correcting approach is most often applicable precisely when it's least needed by test developers, namely when testing hardware or straightforward, easily calculated functions, such as mathematical software.
In short, I haven't read or heard about anything currently percolating in the research community that is directly applicable to either of the above-mentioned visions. This isn't to say that the visions are impossible or aren't worth pursuing, just that there don't seem to be any breakthroughs out there that would lead to a major change in the way we do business.
[BLLR93] Blum, M., Luby, M., and Rubinfeld, R., "Self-Testing/Correcting with Applications to Numerical Problems," Journal of Computer and System Sciences, vol. 47, pp. 549-595, 1993.
[BLWA96]B lum, M., and Wasserman, H., "Reflections on the Pentium Division Bug," IEEE Transactions on Computers, vol. 45, no. 4, pp. 385-393, April 1996.
[PIAT96] Piatko, C., personal e-mail, August 1996.