

Purpose: Check If Two Data Sets Can Be Fit With the Same Distribution 
The quantile-quantile (q-q) plot is a graphical technique
for determining if two data sets come from populations with
a common distribution.
A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the value below which a given fraction (or percent) of the points fall. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value. A 45-degree reference line is also plotted. If the two sets come from populations with the same distribution, the points should fall approximately along this reference line. The greater the departure from this reference line, the greater the evidence for the conclusion that the two data sets have come from populations with different distributions. The advantages of the q-q plot are:
The q-q plot is similar to a probability plot. For a probability plot, the quantiles for one of the data samples are replaced with the quantiles of a theoretical distribution.
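
The definition of a quantile given above can be made concrete with a small sketch. The function name `quantile`, the made-up data values, and the interpolation convention (order statistics spread evenly over levels 0 to 1, with linear interpolation in between) are assumptions for illustration only; statistical packages use several slightly different conventions.

```python
def quantile(data, p):
    """Return the value at quantile level p (0 <= p <= 1) of data.

    Assumption: the sorted values sit at evenly spaced levels from 0 to 1
    and values in between are linearly interpolated. Other conventions
    exist and give slightly different answers.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    xs = sorted(data)
    if not xs:
        raise ValueError("data must not be empty")
    pos = p * (len(xs) - 1)        # fractional index into the sorted data
    lo = int(pos)                  # order statistic at or just below pos
    hi = min(lo + 1, len(xs) - 1)  # next order statistic, clamped at the end
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])


data = [12, 3, 7, 1, 9, 15, 5, 11, 8, 2, 6]  # made-up values
q30 = quantile(data, 0.3)
below = sum(1 for x in data if x < q30)
above = sum(1 for x in data if x > q30)
print(q30, below, above)
```

With only eleven points the split cannot be exactly 30% and 70%, so the counts printed are only approximately in that ratio; the approximation improves as the sample grows.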

Sample Plot 
This q-q plot of the JAHANMI2.DAT data set shows that


Definition: Quantiles for Data Set 1 Versus Quantiles of Data Set 2 
The q-q plot is formed by:
Both axes are in units of their respective data sets. That is, the actual quantile level is not plotted. For a given point on the q-q plot, we know that the quantile level is the same for both coordinates, but not what that quantile level actually is. If the data sets have the same size, the q-q plot is essentially a plot of sorted data set 1 against sorted data set 2. If the data sets are not of equal size, the quantiles are usually picked to correspond to the sorted values from the smaller data set and then the quantiles for the larger data set are interpolated.
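
The construction just described can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `qq_pairs`, the plotting position (i - 0.5)/n used to assign a quantile level to each sorted value, and the use of linear interpolation are all assumptions made for the example.

```python
import math


def qq_pairs(sample1, sample2):
    """Return (quantile of sample1, quantile of sample2) pairs for a q-q plot.

    Equal sizes: pair the sorted samples directly.
    Unequal sizes: use the sorted values of the smaller sample and
    interpolate the larger sample at the same quantile levels.
    Assumption: the i-th of n sorted values (i = 1..n) has level (i - 0.5)/n.
    """
    s1, s2 = sorted(sample1), sorted(sample2)
    if not s1 or not s2:
        raise ValueError("both samples must be non-empty")
    if len(s1) == len(s2):
        return list(zip(s1, s2))

    swapped = len(s1) > len(s2)            # True when sample2 is the smaller
    small, large = (s2, s1) if swapped else (s1, s2)
    n, m = len(small), len(large)

    pairs = []
    for i, value in enumerate(small):
        p = (i + 0.5) / n                  # quantile level of this sorted value
        pos = p * m - 0.5                  # matching fractional index in large
        lo = min(max(math.floor(pos), 0), m - 1)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        q_large = large[lo] + frac * (large[hi] - large[lo])
        # Keep the output order fixed: (sample1 quantile, sample2 quantile).
        pairs.append((q_large, value) if swapped else (value, q_large))
    return pairs


# Made-up samples of unequal size.
for q1, q2 in qq_pairs([3, 1, 2], [6, 5, 4, 3, 2, 1]):
    print(q1, q2)
```

Plotting the second element of each pair against the first, together with the 45-degree line, gives the q-q plot. Note that the quantile levels themselves are discarded, which matches the remark above that only the data values appear on the axes.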

Questions 
The q-q plot is used to answer the following questions:


Importance: Check for Common Distribution 
When there are two data samples, it is often desirable to know if the assumption of a common distribution is justified. If so, then location and scale estimators can pool both data sets to obtain estimates of the common location and scale. If two samples do differ, it is also useful to gain some understanding of the differences. The q-q plot can provide more insight into the nature of the difference than analytical methods such as the chi-square and Kolmogorov-Smirnov 2-sample tests.
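
One way to read the nature of a difference off the q-q points is to fit a straight line through them. If the two populations differ only in location and scale, the points should fall near a line whose intercept reflects the location shift and whose slope reflects the ratio of scales; the 45-degree reference line is the special case of intercept 0 and slope 1. The sketch below assumes that simple situation. The function name `fit_line`, the made-up data, and the choice of data set 1 as the horizontal variable are illustrative assumptions.

```python
def fit_line(pairs):
    """Ordinary least-squares fit of y = intercept + slope * x to (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up example: data set 2 is data set 1 shifted by 10 and stretched by 2.
set1 = [7.0, 2.0, 9.0, 4.0, 5.0]
set2 = [10.0 + 2.0 * x for x in set1]

# Equal sizes, so the q-q points are just the sorted samples paired up.
points = list(zip(sorted(set1), sorted(set2)))
intercept, slope = fit_line(points)
print(intercept, slope)
```

Curvature or a bend at either end of the q-q points, rather than a straight line, suggests a difference in shape or tail behavior that a single intercept and slope cannot describe.
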
Related Techniques 
Bihistogram
T Test
F Test
2-Sample Chi-Square Test
2-Sample Kolmogorov-Smirnov Test

Case Study 
The quantile-quantile plot is demonstrated in the ceramic strength data case study.

Software 
Q-Q plots are available in some general purpose statistical software programs. If the number of data points in the two samples is equal, it should be relatively easy to write a macro in statistical programs that do not support the q-q plot. If the number of points is not equal, writing a macro for a q-q plot may be difficult.