David L. Banks, Stefan Leigh, Mark Levenson, Andrew Rukhin,
Statistical Engineering Division, ITL
James Dray, James Nechvatal, Miles Smid, Juan Soto
Computer Security Division, ITL
Secure network communication depends on encryption technology, and all modern encryption technology requires random number generators that can produce essentially random binary sequences. This project develops statistical tests for nonrandomness that reveal insecure generators.
Many generators have been proposed by the mathematical community. Some are known to be flawed, but have compensating advantages of speed or low memory requirements. Others are thought to be effectively random (in a technical sense that relates the unpatternedness of their behavior to the intractability of hard mathematical problems). But before any of these generators is used, its output should be subjected to a battery of statistical tests to verify an acceptable level of pseudorandomness.
Many statistical tests exist that detect specific kinds of nonrandomness. These include: (1) Frequency tests, to verify that the proportions of zeroes and ones are approximately equal; (2) Runs tests, to verify that the binary sequence doesn't alternate too frequently, or contain too many or too few long substrings of zeroes or ones; (3) Spectral tests, which look for periodicity in the local frequencies of zeroes and ones (an extension of this test based on wavelet analysis looks for aperiodic variation in these frequencies); (4) Geometric tests, which can look at how often a random walk based on the sequence returns to the origin, at the local dimensionality of n-tuples of numbers built from the sequence, and at other theoretical properties of multivariate spacings; (5) Compression tests, which use complexity theory to assess whether a sequence compresses too much, indicating that some kind of patterning is present. We have developed a large suite of these tests, together with the associated mathematical theory, critical values, and guidelines for application, so that any user will be able to assess the strength of their own encryption algorithms.
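The simplest member of such a suite, the frequency (monobit) test, can be sketched as follows. This is an illustrative implementation, not the project's code; it maps the count of ones and zeroes to a p-value via the normal approximation, so small p-values flag an imbalanced sequence.

```python
import math

def monobit_test(bits):
    """Frequency (monobit) test: under the null hypothesis of randomness,
    the proportions of ones and zeroes should be nearly equal.  Returns a
    p-value; small values suggest nonrandomness."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)   # +1 for each one, -1 for each zero
    s_obs = abs(s) / math.sqrt(n)           # normalized partial sum
    return math.erfc(s_obs / math.sqrt(2))  # two-sided normal tail probability

# A strictly alternating sequence passes the frequency test (perfect
# balance of zeroes and ones) even though a runs test would reject it --
# which is exactly why a battery of different tests is needed.
alternating = [i % 2 for i in range(1000)]
print(monobit_test(alternating))  # 1.0: perfectly balanced
print(monobit_test([1] * 1000))   # essentially 0: all ones
```

The contrast between the two calls illustrates the independence issue discussed next: no single test covers every kind of patterning.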
Two key issues are the independence and the coverage of the tests. Independence problems arise because some tests give similar results on nearly all sequences, and are therefore redundant. We address this using latent variable theory and principal components analysis.
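The redundancy idea can be sketched with a toy principal components analysis. The data below are synthetic and the eigenvalue threshold is an illustrative choice, not the project's actual methodology: one test statistic is constructed as a near-copy of another, and the number of large eigenvalues of the correlation matrix reveals how many effectively independent tests the battery contains.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical matrix of test statistics: rows are 500 test sequences,
# columns are 4 tests.  Column 3 is built to nearly duplicate column 0,
# so the battery really carries only 3 independent dimensions.
X = rng.standard_normal((500, 3))
X = np.column_stack([X, X[:, 0] + 0.01 * rng.standard_normal(500)])

# PCA on the correlation matrix: the count of non-negligible eigenvalues
# estimates the number of non-redundant tests.
R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
print(int(np.sum(eigvals > 0.1)))  # 3 dominant components from 4 tests
```

A near-zero eigenvalue flags a linear combination of test statistics that carries almost no information, i.e., a redundant test.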
The coverage problem arises because we want to ensure that our tests are able to capture the principal kinds of non-randomness that arise from current and future generators. (Martin-Löf proved that no finite set of tests can ever completely validate a generator, and thus the best possible result is to detect shortcomings that arise in practice.) To address this problem, we use the theory of simultaneous inference and a modified form of the Inclusion-Exclusion Principle.
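To see why simultaneous inference matters, consider the baseline calculation below. It is only an illustration of the standard Inclusion-Exclusion and Bonferroni bounds for independent tests, not the modified form developed in this project: when a battery of tests is run at a common significance level, the chance that a truly random sequence fails at least one test grows with the number of tests, so the individual levels must be adjusted.

```python
# With k independent tests each run at level alpha, Inclusion-Exclusion
# gives the exact probability that a truly random sequence fails at least
# one test; the Bonferroni bound k * alpha slightly over-states it.
k, alpha = 16, 0.01
exact = 1 - (1 - alpha) ** k   # P(at least one false rejection)
bonferroni = k * alpha          # simple upper bound
print(round(exact, 4), round(bonferroni, 4))  # 0.1485 0.16
```

Real batteries contain dependent tests, which is precisely where a modified Inclusion-Exclusion argument is needed in place of the independence assumption above.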
Figure 1: The Lempel-Ziv complexity of the 4096 binary sequences of length twelve. The data have been smoothed by loess, and illustrate the complexity of the sequences by the length of their compression.
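A phrase-counting Lempel-Ziv complexity like the one plotted above can be sketched as follows. The exact complexity measure used in the figure is not specified here, so this incremental-parsing variant is an assumption; the qualitative behavior (patterned sequences yield fewer phrases, hence shorter compressions) is what matters.

```python
def lz_phrase_complexity(bits):
    """Count the distinct phrases produced by an incremental Lempel-Ziv
    parsing of a binary string.  Highly patterned sequences produce fewer
    phrases than typical random sequences of the same length."""
    count, seen, phrase = 0, set(), ""
    for b in bits:
        phrase += b
        if phrase not in seen:   # shortest string not seen before ends a phrase
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)  # count any unfinished final phrase

print(lz_phrase_complexity("000000000000"))  # 5: highly compressible
print(lz_phrase_complexity("010011000111"))  # 6: less patterned
```

Running this over all 4096 length-twelve sequences and smoothing, as in Figure 1, shows how complexity separates patterned from typical sequences.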
Date created: 7/20/2001