

Purpose: Find transformation to normalize data 
Many statistical tests and intervals are based on the assumption of
normality. The assumption of normality often leads to tests that are
simple, mathematically tractable, and powerful compared to tests that
do not make the normality assumption. Unfortunately, many real data
sets are in fact not approximately normal. However, an appropriate
transformation of a data set can often yield a data set that
approximately follows a normal distribution. This increases the
applicability and usefulness of statistical techniques based on the
normality assumption.
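The Box-Cox transformation is a particularly useful family of transformations. It is defined as:
\[ T(Y) = \frac{Y^{\lambda} - 1}{\lambda} \]
where \( Y \) is the response variable and \( \lambda \) is the transformation parameter. For \( \lambda = 0 \), the natural log of the data is taken instead of using the above formula.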
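A minimal Python sketch of this family may help make the definition concrete. It assumes the usual form of the transformation (a power transform rescaled by \( \lambda \), with the natural log as the \( \lambda = 0 \) case) and strictly positive data. The function name `box_cox` is illustrative and does not come from any particular package.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox transformation with parameter lam to positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        # Limiting case of (y**lam - 1) / lam as lam approaches 0.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]
```

With \( \lambda = 1 \) the data are only shifted by one, so the shape of the distribution is unchanged. Values of \( \lambda \) below 1 compress the right tail, which is why they tend to help with right-skewed data.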
Given a particular transformation such as the Box-Cox transformation defined above, it is helpful to define a measure of the normality of the transformed data. One such measure is the correlation coefficient of a normal probability plot. The correlation is computed between the vertical and horizontal axis variables of the probability plot and is a convenient measure of the linearity of the probability plot (the more linear the probability plot, the better a normal distribution fits the data). The Box-Cox normality plot is a plot of these correlation coefficients for various values of the \( \lambda \) parameter. The value of \( \lambda \) corresponding to the maximum correlation on the plot is then the optimal choice for \( \lambda \).
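The correlation measure described here can be sketched in a few lines of standard-library Python. The text does not say which plotting positions the probability plot uses, so this sketch assumes Filliben's approximation to the uniform order statistic medians, a common choice for normal probability plots. The names `normal_order_medians`, `pearson`, and `ppcc` are illustrative.

```python
import math
from statistics import NormalDist


def normal_order_medians(n):
    """Approximate medians of standard normal order statistics.

    Assumption: Filliben's approximation for the uniform order
    statistic medians, mapped through the normal inverse CDF.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    u = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    u[-1] = 0.5 ** (1.0 / n)
    u[0] = 1.0 - u[-1]
    std_normal = NormalDist()
    return [std_normal.inv_cdf(p) for p in u]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def ppcc(values):
    """Correlation between the two axes of a normal probability plot."""
    ordered = sorted(values)  # vertical axis: ordered data
    medians = normal_order_medians(len(ordered))  # horizontal axis
    return pearson(medians, ordered)
```

A value of `ppcc` close to 1 indicates a nearly linear probability plot. Evaluating it on the transformed data for each candidate \( \lambda \) gives the points of the Box-Cox normality plot.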

Sample Plot 
The histogram in the upper left-hand corner shows a data set that has significant right skewness (and so does not follow a normal distribution). The Box-Cox normality plot shows that the maximum value of the correlation coefficient is at \( \lambda = 0.3 \). The histogram of the data after applying the Box-Cox transformation with \( \lambda = 0.3 \) shows a data set for which the normality assumption is reasonable. This is verified with a normal probability plot of the transformed data.

Definition 
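Box-Cox normality plots are formed by:
Vertical axis: Correlation coefficient from the normal probability plot after applying the Box-Cox transformation
Horizontal axis: Value of \( \lambda \)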


Questions 
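The Box-Cox normality plot can provide answers to the following
questions:
1. Is there a transformation that will normalize my data?
2. What is the optimal value of the transformation parameter?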


Importance: Normalization Improves Validity of Tests 
Normality assumptions are critical for many univariate intervals and hypothesis tests. It is important to test the normality assumption. If the data are in fact clearly not normal, the Box-Cox normality plot can often be used to find a transformation that will approximately normalize the data.

Related Techniques 
Normal Probability Plot
Box-Cox Linearity Plot

Software 
Box-Cox normality plots are not a standard part of most general purpose statistical software programs. However, the underlying technique is based on a normal probability plot and computing a correlation coefficient. So if a statistical program supports these capabilities, writing a macro for a Box-Cox normality plot should be feasible.
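
As a rough illustration of such a macro, the following self-contained Python sketch computes the coordinates of a Box-Cox normality plot using only the standard library. It is a sketch under stated assumptions, not a reference implementation: the default grid of \( \lambda \) values from -2 to 2 in steps of 0.1, the Filliben plotting positions, and all function names are assumptions made for this example.

```python
import math
from statistics import NormalDist


def box_cox(values, lam):
    """Box-Cox transformation; natural log when lam is 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def ppcc(values):
    """Normal probability plot correlation coefficient.

    Assumption: Filliben's approximation for the plotting positions.
    """
    y = sorted(values)
    n = len(y)
    u = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    u[-1] = 0.5 ** (1.0 / n)
    u[0] = 1.0 - u[-1]
    std_normal = NormalDist()
    x = [std_normal.inv_cdf(p) for p in u]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def box_cox_normality_curve(values, lambdas=None):
    """Return (lambda, correlation) pairs, one per candidate lambda."""
    if lambdas is None:
        # Assumed default grid: -2.0, -1.9, ..., 2.0
        lambdas = [k / 10 for k in range(-20, 21)]
    return [(lam, ppcc(box_cox(values, lam))) for lam in lambdas]


def best_lambda(curve):
    """Lambda with the maximum correlation on the curve."""
    return max(curve, key=lambda pair: pair[1])[0]
```

Plotting the second element of each pair against the first, with any plotting tool, gives the Box-Cox normality plot, and `best_lambda` reads off the location of its maximum. A finer grid around that maximum can be passed through the `lambdas` argument if more precision is wanted.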