Autocorrelation plots (Box and Jenkins, pp. 28-32)
are a commonly used tool for checking randomness in a data set.
This randomness is ascertained by computing autocorrelations for
data values at varying time lags. If random, such autocorrelations
should be near zero for any and all time-lag separations. If
non-random, then one or more of the autocorrelations will be
significantly non-zero.
In addition, autocorrelation plots are used in the model
identification stage for
Box-Jenkins autoregressive, moving average time series models.
Autocorrelations should be near-zero for randomness. Such is
not the case in this example and thus the randomness assumption fails.
This sample autocorrelation plot shows that the time series is not
random, but rather has a high degree of autocorrelation between
adjacent and near-adjacent observations.
r(h) versus h
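Autocorrelation plots are formed by
- Vertical axis: Autocorrelation coefficient
$$R_h = \frac{C_h}{C_0}$$
where $C_h$ is the autocovariance function
$$C_h = \frac{1}{N} \sum_{t=1}^{N-h} (Y_t - \bar{Y})(Y_{t+h} - \bar{Y})$$
and $C_0$ is the variance function
$$C_0 = \frac{1}{N} \sum_{t=1}^{N} (Y_t - \bar{Y})^2$$
Note: $R_h$ is between -1 and +1.
Note: Some sources may use the following formula for
the autocovariance function
$$C_h = \frac{1}{N-h} \sum_{t=1}^{N-h} (Y_t - \bar{Y})(Y_{t+h} - \bar{Y})$$
Although this definition has less bias, the (1/N)
formulation has some desirable statistical properties and
is the form most commonly used in the statistics literature.
See pages 20 and 49-50 in Chatfield for details.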
- Horizontal axis: Time lag h (h = 1, 2, 3, ...)
- The above plot also contains several horizontal reference
lines. The middle line is at zero. The other four lines
are 95 % and 99 % confidence bands. Note that there are
two distinct formulas for generating the confidence bands.
- If the autocorrelation plot is being used to test for
randomness (i.e., there is no time dependence in the
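data), the following formula is recommended:
$$\pm \frac{z_{1-\alpha/2}}{\sqrt{N}}$$
where N is the sample size, z is the percent point function
(inverse cumulative distribution function) of the standard normal
distribution, and $\alpha$ is the
significance level. In this case, the confidence bands have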
fixed width that depends on the sample size. This is the
formula that was used to generate the confidence bands in
the above plot.
- Autocorrelation plots are also used in the model
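identification stage for fitting Box-Jenkins autoregressive, moving average models.
In this case, a moving average model is assumed for the data
and the following confidence bands should be generated:
$$\pm z_{1-\alpha/2} \sqrt{\frac{1}{N}\left(1 + 2\sum_{i=1}^{k} R_i^2\right)}$$
where k is the lag, N is the sample size, $R_i$ is the
autocorrelation coefficient at lag i,
z is the percent point function (inverse cumulative distribution
function) of the standard normal distribution, and $\alpha$ is
the significance level. In this case, the confidence bands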
increase as the lag increases.
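The definitions above can be sketched in a few lines of code. The following Python example is a minimal illustration using only the standard library, not the procedure of any particular statistical package. The function names (`autocorrelation`, `randomness_band`, `moving_average_bands`) and the sample data are hypothetical. It assumes the (1/N) form of the autocovariance, a fixed band of half-width z(1-alpha/2)/sqrt(N) for the randomness test, and a lag-dependent band of half-width z(1-alpha/2) * sqrt((1 + 2 * sum of R_i^2 for i = 1..k) / N) for the moving average case. Some references stop that sum at lag k - 1.

```python
import math
from statistics import NormalDist, fmean


def autocorrelation(y, max_lag):
    """Return [R_1, ..., R_max_lag] using the (1/N) autocovariance."""
    n = len(y)
    mean = fmean(y)
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev) / n  # variance function C_0
    r = []
    for h in range(1, max_lag + 1):
        # autocovariance C_h: products of deviations h steps apart
        ch = sum(dev[t] * dev[t + h] for t in range(n - h)) / n
        r.append(ch / c0)
    return r


def randomness_band(n, alpha=0.05):
    """Half-width of the fixed band used when testing for randomness."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    return z / math.sqrt(n)


def moving_average_bands(r, n, alpha=0.05):
    """Half-widths for lags 1..len(r); the band widens as the lag grows."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bands, running = [], 0.0
    for rk in r:
        running += rk * rk  # running sum of R_i^2 for i = 1..k (assumed form)
        bands.append(z * math.sqrt((1 + 2 * running) / n))
    return bands


# Made-up data, for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
r = autocorrelation(y, max_lag=4)
fixed = randomness_band(len(y))
widening = moving_average_bands(r, len(y))
for lag, (rh, band) in enumerate(zip(r, widening), start=1):
    print(f"lag {lag}: R_h = {rh:+.3f}, fixed band = {fixed:.3f}, MA band = {band:.3f}")
```

A coefficient whose absolute value exceeds the relevant band is the kind of lag that would be flagged as significantly non-zero on the plot.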
The autocorrelation plot can provide answers to the following questions:
- Are the data random?
- Is an observation related to an adjacent observation?
- Is an observation related to an observation twice-removed?
- Is the observed time series white noise?
- Is the observed time series sinusoidal?
- Is the observed time series autoregressive?
- What is an appropriate model for the observed time series?
- Is the model $Y = \text{constant} + \text{error}$
valid and sufficient?
- Is the formula $s_{\bar{Y}} = s/\sqrt{N}$ valid?
Ensure validity of engineering conclusions
Randomness (along with fixed model, fixed variation, and fixed
distribution) is one of the four assumptions that typically
underlie all measurement processes. The randomness assumption is
critically important for the following three reasons:
- Most standard statistical tests depend on randomness. The
validity of the test conclusions is directly linked to the
validity of the randomness assumption.
- Many commonly used statistical formulae depend on the
randomness assumption, the most common formula being the
formula for determining the standard deviation of the sample
mean:
$$s_{\bar{Y}} = \frac{s}{\sqrt{N}}$$
where $s$ is the standard
deviation of the data. Although heavily used, the results
from using this formula are of no value unless the
randomness assumption holds.
- For univariate data, the default model is
$$Y = \text{constant} + \text{error}$$
If the data are not random, this model is incorrect
and invalid, and the estimates for the parameters (such as
the constant) become nonsensical and invalid.
In short, if the analyst does not check for randomness, then
the validity of many of the statistical conclusions becomes
suspect. The autocorrelation plot is an excellent way of checking
for such randomness.
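The dependence of the s/sqrt(N) formula on randomness can be checked with a small simulation. The sketch below is illustrative only and rests on assumptions that are not part of the text above: the data follow a first-order autoregressive process y_t = phi * y_(t-1) + e_t with standard normal errors, and the names `ar1_series` and `compare`, the values of `phi`, the series length, the number of replications, and the seed are all hypothetical choices. For each setting it compares the average value of s/sqrt(N) with the standard deviation of the sample mean actually observed across replications.

```python
import math
import random
from statistics import fmean, stdev


def ar1_series(n, phi, rng):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal e_t."""
    y, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y


def compare(phi, n=100, reps=1000, seed=0):
    """Return (average s/sqrt(N), observed std. dev. of the sample means)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, formula = [], []
    for _ in range(reps):
        y = ar1_series(n, phi, rng)
        means.append(fmean(y))
        formula.append(stdev(y) / math.sqrt(n))
    return fmean(formula), stdev(means)


for phi in (0.0, 0.8):  # phi = 0 is random data; phi = 0.8 is autocorrelated
    est, actual = compare(phi)
    print(f"phi = {phi}: formula gives {est:.3f}, observed {actual:.3f}")
```

With phi = 0 the two quantities should agree closely. With positive autocorrelation the formula understates the variability of the sample mean.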
Examples of the autocorrelation plot for several common
situations are given in the following pages.
- Random (= White Noise)
- Weak autocorrelation
- Strong autocorrelation and autoregressive model
- Sinusoidal model
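The four situations listed above can be imitated with simulated data. The following Python sketch is a hedged illustration, not a reproduction of the handbook's example data sets: the autoregressive coefficients (0.3 and 0.95), the sine period (25), the noise level, the series length, and the seed are assumptions chosen only to make the patterns visible, and `lag_autocorrelation` and `ar1` are hypothetical helper names.

```python
import math
import random
from statistics import fmean


def lag_autocorrelation(y, h):
    """Autocorrelation coefficient R_h = C_h / C_0 with the (1/N) autocovariance."""
    n = len(y)
    mean = fmean(y)
    c0 = sum((v - mean) ** 2 for v in y) / n
    ch = sum((y[t] - mean) * (y[t + h] - mean) for t in range(n - h)) / n
    return ch / c0


def ar1(n, phi, rng):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal e_t."""
    y, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 500
series = {
    "random": [rng.gauss(0.0, 1.0) for _ in range(n)],
    "weak autocorrelation": ar1(n, 0.3, rng),
    "strong autocorrelation": ar1(n, 0.95, rng),
    "sinusoidal": [math.sin(2 * math.pi * t / 25) + 0.1 * rng.gauss(0.0, 1.0)
                   for t in range(n)],
}

for name, y in series.items():
    coeffs = [round(lag_autocorrelation(y, h), 2) for h in (1, 2, 12, 25)]
    print(f"{name:24s} R_1, R_2, R_12, R_25 = {coeffs}")
```

Under these assumptions the random series should give coefficients near zero, the autoregressive series coefficients that decay with lag at a rate set by the coefficient, and the sinusoidal series coefficients that oscillate with the period of the sine wave.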
The autocorrelation plot is demonstrated in the
beam deflection data case study.
Autocorrelation plots are available in most general purpose
statistical software programs.