

Purpose: Find transformation to normalize data 
Many statistical tests and intervals are based on the assumption of
normality. The assumption of normality often leads to tests that are
simple, mathematically tractable, and powerful compared to tests that
do not make the normality assumption. Unfortunately, many real data
sets are in fact not normal. However, an appropriate transformation
of a data set can often yield a data set that does follow a normal
distribution. This increases the applicability and usefulness of
statistical techniques based on the normality assumption.
The Box-Cox transformation is a particularly useful family of transformations. It is defined as:

    T(Y) = (Y^λ - 1) / λ

where Y is the response variable and λ is the transformation parameter. For λ = 0, the natural logarithm of the data is taken instead of using the formula above.

Given a particular transformation, it is helpful to define a measure of the normality of the transformed data. One such measure is the correlation coefficient of a normal probability plot. The correlation is computed between the vertical and horizontal axis variables of the probability plot and is a convenient measure of the linearity of the probability plot (the more linear the probability plot, the better a normal distribution fits the data).

The Box-Cox normality plot is a plot of these correlation coefficients for various values of the λ parameter. The value of λ corresponding to the maximum correlation on the plot is then the optimal choice for λ.
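
The procedure described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the handbook's own implementation. The function names (box_cox, normal_order_medians, ppcc, box_cox_normality_curve), the lognormal test sample, the grid of λ values from -2 to 2, and the use of Filliben's approximation for the plotting positions of the normal probability plot are all assumptions made for the example. The data are assumed to be strictly positive.

```python
import math
import random
from statistics import NormalDist


def box_cox(y, lam):
    """Box-Cox transform of strictly positive values for parameter lam."""
    if abs(lam) < 1e-12:  # lambda = 0 is defined as the natural log
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def normal_order_medians(n):
    """Approximate medians of normal order statistics (assumed: Filliben's formula)."""
    p = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    p[-1] = 0.5 ** (1.0 / n)
    p[0] = 1.0 - p[-1]
    nd = NormalDist()
    return [nd.inv_cdf(q) for q in p]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def ppcc(y, lam):
    """Correlation of the normal probability plot of the transformed data."""
    return pearson(sorted(box_cox(y, lam)), normal_order_medians(len(y)))


def box_cox_normality_curve(y, lambdas):
    """(lambda, correlation) pairs: the points of a Box-Cox normality plot."""
    return [(lam, ppcc(y, lam)) for lam in lambdas]


# Hypothetical right-skewed sample (synthetic, not the handbook's data).
random.seed(1)
sample = [random.lognormvariate(0.0, 1.0) for _ in range(200)]
grid = [i / 10 for i in range(-20, 21)]  # lambda from -2.0 to 2.0

curve = box_cox_normality_curve(sample, grid)
best_lambda, best_r = max(curve, key=lambda pair: pair[1])
print(f"best lambda = {best_lambda:.1f}, correlation = {best_r:.4f}")
```

Plotting the pairs in curve, with λ on the horizontal axis and the correlation on the vertical axis, gives the Box-Cox normality plot. Because the synthetic sample is lognormal, the maximum should fall near λ = 0, the log transformation.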

Sample Plot 
The histogram in the upper left-hand corner shows a data set that has significant right skewness (and so does not follow a normal distribution). The Box-Cox normality plot shows that the maximum value of the correlation coefficient is at λ = 0.3. The histogram of the data after applying the Box-Cox transformation with λ = 0.3 shows a data set for which the normality assumption is reasonable. This is verified with a normal probability plot of the transformed data.
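
As a rough numerical counterpart to comparing the two histograms, the sketch below applies the Box-Cox transformation with λ = 0.3 to a right-skewed sample and compares the sample skewness before and after. The gamma-distributed sample is a synthetic stand-in, not the handbook's data, and the helper names box_cox and sample_skewness are hypothetical. Skewness is used here only as a simple symmetry check. It does not replace the normal probability plot.

```python
import math
import random


def box_cox(y, lam):
    """Box-Cox transform of strictly positive values for parameter lam."""
    if abs(lam) < 1e-12:  # lambda = 0 is defined as the natural log
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def sample_skewness(x):
    """Moment-based sample skewness; near 0 for symmetric data."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample (synthetic, not the handbook's data).
random.seed(0)
raw = [random.gammavariate(2.0, 1.0) for _ in range(500)]

# lambda = 0.3 is the value quoted in the text for the sample plot.
transformed = box_cox(raw, 0.3)

skew_raw = sample_skewness(raw)
skew_transformed = sample_skewness(transformed)
print(f"skewness before: {skew_raw:.2f}, after: {skew_transformed:.2f}")
```

For this synthetic sample the skewness should drop from clearly positive to near zero, which is the numerical analogue of the transformed histogram looking roughly symmetric.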

Definition 
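Box-Cox normality plots are formed by:
Vertical axis: Correlation coefficient from the normal probability plot after applying the Box-Cox transformation
Horizontal axis: Value of λ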

Questions 
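The Box-Cox normality plot can provide answers to the following
questions:
1. Is there a transformation that will normalize my data?
2. What is the optimal value of the transformation parameter?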

Importance: Normalization improves validity of tests 
Normality assumptions are critical for many univariate intervals and tests. It is important to test this normality assumption. If the data are in fact not normal, the Box-Cox normality plot can often be used to find a transformation that will normalize the data.

Related Techniques
Normal Probability Plot
Box-Cox Linearity Plot

Software
Box-Cox normality plots are not a standard part of most general purpose statistical software programs. However, the underlying technique is based on a normal probability plot and computing a correlation coefficient. So if a statistical program supports these capabilities, writing a macro for a Box-Cox normality plot should be feasible. Dataplot supports a Box-Cox normality plot directly.