6.5.2. What to do when data are nonnormal 

Often it is possible to transform nonnormal data into approximately normal data 
Nonnormality is a way of life, since no characteristic
(height, weight, etc.) will have exactly a normal
distribution. One strategy for making nonnormal data resemble normal
data is to apply a transformation. There is no dearth of transformations
in statistics; the issue is which one to select for the situation at hand.
Unfortunately, the choice of the "best" transformation is generally not
obvious.
This was recognized in 1964 by G.E.P. Box and D.R. Cox, who wrote a paper suggesting a useful family of power transformations. These transformations are defined only for positive data values. This restriction should not pose a problem, because a constant can always be added to every observation if the data set contains one or more zero or negative values. 

The Box-Cox Transformation 
$$ \begin{eqnarray}
x(\lambda) & = & \frac{x^\lambda - 1}{\lambda} \,\,\,\,\, & \lambda \ne 0 \\
x(\lambda) & = & \mbox{ln}(x) & \lambda = 0 \, .
\end{eqnarray} $$
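As a minimal sketch of the transformation just defined, the following Python uses only the standard library. The function names `box_cox` and `shift_positive` and the three sample values are illustrative assumptions, not part of the handbook; `shift_positive` is one simple way to add a constant when the data contain nonpositive values.

```python
import math

def box_cox(x, lam):
    """Return the Box-Cox transform x(lambda) of a sequence of positive values."""
    if any(v <= 0 for v in x):
        raise ValueError("Box-Cox is defined only for positive data values")
    if lam == 0:
        return [math.log(v) for v in x]          # limiting case, lambda = 0
    return [(v ** lam - 1.0) / lam for v in x]   # general case, lambda != 0

def shift_positive(x, offset=1.0):
    """Hypothetical helper: add a constant so the smallest value is at least `offset`."""
    c = max(0.0, offset - min(x))
    return [v + c for v in x]

sample = [-2.0, 0.5, 3.0]          # made-up values, one of them negative
shifted = shift_positive(sample)   # -> [1.0, 3.5, 6.0]
print(box_cox(shifted, 0.5))
```

The size of the added constant is a modeling choice; different shifts generally lead to different selected values of lambda.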
Given the vector of data observations \( x = x_1, \, x_2, \, \ldots, \, x_n\), one way to select the power \(\lambda\) is to use the \(\lambda\) that maximizes the logarithm of the likelihood function 

The logarithm of the likelihood function 
$$ f(x,\lambda) = -\frac{n}{2} \mbox{ln}\left[ \sum_{i=1}^{n}{\frac{(x_i(\lambda) - \bar{x}(\lambda))^2}{n}} \right] +
(\lambda - 1)\sum_{i=1}^{n}{\ln(x_i)} \, , $$
where \( \bar{x}(\lambda) = \frac{1}{n} \sum_{i=1}^{n}{x_i(\lambda)} \) is the arithmetic mean of the transformed data. 
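The log-likelihood can be evaluated directly from its definition and maximized over a grid of candidate powers. The sketch below is one way to do this with the Python standard library; the data values are made up for illustration (they are not the microwave radiation measurements), and the grid from -2.0 to 2.0 in steps of 0.1 is an assumption.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of positive values (ln(x) when lam == 0)."""
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]

def log_likelihood(x, lam):
    """f(x, lam) = -(n/2) ln[sum((x_i(lam) - mean)^2) / n] + (lam - 1) sum(ln x_i)."""
    n = len(x)
    t = box_cox(x, lam)
    mean_t = sum(t) / n
    var_t = sum((v - mean_t) ** 2 for v in t) / n   # divisor n, as in the formula
    return -0.5 * n * math.log(var_t) + (lam - 1.0) * sum(math.log(v) for v in x)

# Synthetic, right-skewed positive data (illustrative only).
data = [0.4, 0.7, 0.9, 1.1, 1.3, 1.6, 2.0, 2.7, 3.9, 6.5]

grid = [k / 10 for k in range(-20, 21)]             # assumed grid of lambda values
lam_hat = max(grid, key=lambda lam: log_likelihood(data, lam))
print(lam_hat, log_likelihood(data, lam_hat))
```

A grid search only locates the maximum to the resolution of the grid; a finer grid or a one-dimensional optimizer would be needed for additional digits.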

Confidence bound for \(\lambda\) 
In addition, a confidence bound (based on the likelihood ratio statistic) can be constructed for \(\lambda\) as follows: a set of \(\lambda\) values that represent an approximate \(100(1-\alpha)\) % confidence bound for \(\lambda\) is formed from those \(\lambda\) that satisfy $$ f(x,\lambda) \ge f(x,\hat{\lambda}) - 0.5 \chi^2_{1-\alpha, \, 1} \, ,$$ where \(\hat{\lambda}\) denotes the maximum likelihood estimator for \(\lambda\) and \(\chi_{1-\alpha, \, 1}^2\) is the \(100(1-\alpha)\)th percentile of the chi-square distribution with 1 degree of freedom. 
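The likelihood-ratio bound can be sketched by keeping every grid value of \(\lambda\) whose log-likelihood lies within \(0.5 \chi^2_{1-\alpha, \, 1}\) of the maximum. In the Python below, the function names, the synthetic data, and the grid are assumptions made for illustration. The chi-square percentile with 1 degree of freedom is obtained from the standard normal quantile, using the fact that the square of a standard normal variable is chi-square with 1 degree of freedom.

```python
import math
from statistics import NormalDist

def box_cox(x, lam):
    """Box-Cox transform of positive values (ln(x) when lam == 0)."""
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]

def log_likelihood(x, lam):
    """Box-Cox log-likelihood f(x, lam) as defined in the text."""
    n = len(x)
    t = box_cox(x, lam)
    mean_t = sum(t) / n
    var_t = sum((v - mean_t) ** 2 for v in t) / n
    return -0.5 * n * math.log(var_t) + (lam - 1.0) * sum(math.log(v) for v in x)

def chi2_quantile_1df(p):
    """p-th quantile of chi-square(1): if P(Z^2 <= q) = p then sqrt(q) = z_{(1+p)/2}."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2

def lambda_confidence_set(x, grid, alpha=0.05):
    """Return (lam_hat, grid values satisfying the likelihood-ratio inequality)."""
    llf = {lam: log_likelihood(x, lam) for lam in grid}
    lam_hat = max(llf, key=llf.get)                  # maximizer restricted to the grid
    cutoff = llf[lam_hat] - 0.5 * chi2_quantile_1df(1.0 - alpha)
    return lam_hat, [lam for lam in grid if llf[lam] >= cutoff]

# Synthetic positive data and an assumed grid (illustrative only).
data = [0.4, 0.7, 0.9, 1.1, 1.3, 1.6, 2.0, 2.7, 3.9, 6.5]
grid = [k / 10 for k in range(-20, 21)]
lam_hat, kept = lambda_confidence_set(data, grid)
print(lam_hat, min(kept), max(kept))
```

Because both the maximizer and the retained set are restricted to the grid, the reported endpoints are accurate only to the grid spacing.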
Example of the Box-Cox scheme 
To illustrate the procedure, we used the data from Johnson and Wichern's textbook (Prentice Hall 1988), Example 4.14. The observations are microwave radiation measurements. 
Sample data 


Table of log-likelihood values for various values of \(\lambda\) 
The values of the log-likelihood function obtained by varying \(\lambda\)
from -2.0 to 1.9 are given below.
This table shows that \(\lambda = 0.3\) maximizes the log-likelihood function (LLF). This becomes 0.28 if a second digit of accuracy is calculated.

The Box-Cox transform is also discussed in Chapter 1 under the Box-Cox Linearity Plot and the Box-Cox Normality Plot. The Box-Cox normality plot discussion provides a graphical method for choosing \(\lambda\) to transform a data set to normality. The criterion used there is the value of \(\lambda\) that maximizes the correlation coefficient of a normal probability plot of the transformed data. The Box-Cox linearity plot, by contrast, chooses the \(\lambda\) that maximizes the correlation between the transformed \(x\)-values and the \(y\)-values. 
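The normal probability plot criterion can also be sketched in a few lines. The Python below computes, for each candidate \(\lambda\), the correlation between the sorted transformed data and approximate normal order statistic medians, then picks the \(\lambda\) with the largest correlation. The function names, the grid, and the use of Filliben's approximation for the uniform order statistic medians are assumptions of this sketch; the test data are constructed to be exactly lognormal in shape, so the logarithm (\(\lambda = 0\)) should be selected.

```python
import math
from statistics import NormalDist

def box_cox(x, lam):
    """Box-Cox transform of positive values (ln(x) when lam == 0)."""
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]

def normal_order_medians(n):
    """Approximate normal order statistic medians (Filliben's approximation, n >= 2)."""
    u = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    u[-1] = 0.5 ** (1.0 / n)
    u[0] = 1.0 - u[-1]
    return [NormalDist().inv_cdf(p) for p in u]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def ppcc(x, lam):
    """Correlation of the normal probability plot of the transformed data."""
    return pearson(normal_order_medians(len(x)), sorted(box_cox(x, lam)))

# Constructed data: exponentials of the normal medians, so ln(x) is "perfectly normal".
x = [math.exp(z) for z in normal_order_medians(15)]
grid = [k / 10 for k in range(-20, 21)]             # assumed grid of lambda values
best = max(grid, key=lambda lam: ppcc(x, lam))
print(best, ppcc(x, best))
```

This criterion and the maximum likelihood criterion usually give similar but not identical values of \(\lambda\), since they optimize different measures of normality.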