

Purpose: Test if two population means are equal 
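The two-sample t-test (Snedecor and Cochran, 1989) is used to determine if two population means are equal. A common application is to test if a new process or treatment is superior to a current process or treatment.
There are several variations on this test. The data may be paired or unpaired, and for unpaired data the variances of the two populations may either be assumed equal or be allowed to differ.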
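For paired data, where each observation in one sample corresponds to exactly one observation in the other, the usual approach is to apply a one-sample t-test to the pairwise differences, with ν = n − 1 degrees of freedom for n pairs. The Python sketch below illustrates that reduction using only the standard library. The function name `paired_t` and the before/after values are hypothetical, not data or code from this handbook.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic: a one-sample t-test on the differences x_i - y_i.

    Returns (T, nu) with nu = n - 1, where n is the number of pairs.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation (N - 1 divisor)
    return d_bar / (s_d / math.sqrt(n)), n - 1


# Hypothetical before/after measurements on the same four units.
before = [10.0, 12.0, 9.0, 11.0]
after = [9.0, 11.0, 9.0, 10.0]
t, nu = paired_t(before, after)
print(f"T = {t:.3f}, nu = {nu}")
```

Because the differences absorb any unit-to-unit variation, the paired form is generally the appropriate choice whenever such a one-to-one pairing exists.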


Definition 
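The two-sample t-test for unpaired data is defined as:
H_{0}: μ_{1} = μ_{2}
H_{a}: μ_{1} ≠ μ_{2}
Test statistic: T = (Ȳ_{1} − Ȳ_{2}) / √(s_{1}^{2}/N_{1} + s_{2}^{2}/N_{2}), where N_{1} and N_{2} are the sample sizes, Ȳ_{1} and Ȳ_{2} are the sample means, and s_{1}^{2} and s_{2}^{2} are the sample variances. If equal variances are assumed, the statistic becomes T = (Ȳ_{1} − Ȳ_{2}) / (s_{p} √(1/N_{1} + 1/N_{2})), where the pooled variance is s_{p}^{2} = ((N_{1} − 1)s_{1}^{2} + (N_{2} − 1)s_{2}^{2}) / (N_{1} + N_{2} − 2).
Significance level: α.
Critical region: Reject the null hypothesis that the two means are equal if |T| > t_{1−α/2,ν}, where t_{1−α/2,ν} is the critical value of the t distribution with ν degrees of freedom. With equal variances, ν = N_{1} + N_{2} − 2; with unequal variances, ν is estimated by the Welch-Satterthwaite formula ν = (s_{1}^{2}/N_{1} + s_{2}^{2}/N_{2})^{2} / [(s_{1}^{2}/N_{1})^{2}/(N_{1} − 1) + (s_{2}^{2}/N_{2})^{2}/(N_{2} − 1)].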
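As a minimal sketch, assuming only summary statistics (N, mean, standard deviation) are available for each sample, the unpaired test statistic and its degrees of freedom can be computed in Python with the standard library. The function name `two_sample_t` and its `equal_var` flag are illustrative choices, not part of Dataplot, R, or the handbook.

```python
import math


def two_sample_t(n1, mean1, sd1, n2, mean2, sd2, equal_var=True):
    """Return (T, nu) for the unpaired two-sample t-test from summary statistics.

    equal_var=True: pooled standard deviation, nu = N1 + N2 - 2.
    equal_var=False: unpooled standard error, Welch-Satterthwaite nu.
    """
    diff = mean1 - mean2
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors of each mean
    if equal_var:
        # Pooled variance: sample variances weighted by their degrees of freedom.
        sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
        t = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))
        nu = n1 + n2 - 2
    else:
        t = diff / math.sqrt(v1 + v2)
        nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, nu


# Summary statistics from the AUTO83B.DAT example (U.S. cars vs. Japanese cars).
t, nu = two_sample_t(249, 20.14458, 6.41470, 79, 30.48101, 6.10771)
print(f"T = {t:.5f}, nu = {nu}")
```

With the AUTO83B.DAT summary statistics, the pooled version gives T ≈ −12.6206 with ν = 326, consistent with the example that follows; passing `equal_var=False` switches to the unequal-variance (Welch) form instead.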


Two-Sample t-Test Example
The following two-sample t-test was generated for the AUTO83B.DAT data set. The data set contains miles per gallon for U.S. cars (sample 1) and for Japanese cars (sample 2); the summary statistics for each sample are shown below.
SAMPLE 1:
    NUMBER OF OBSERVATIONS      =      249
    MEAN                        = 20.14458
    STANDARD DEVIATION          =  6.41470
    STANDARD ERROR OF THE MEAN  =  0.40652

SAMPLE 2:
    NUMBER OF OBSERVATIONS      =       79
    MEAN                        = 30.48101
    STANDARD DEVIATION          =  6.10771
    STANDARD ERROR OF THE MEAN  =  0.68717

We are testing the hypothesis that the population means are equal for the two samples. We assume that the variances for the two samples are equal.
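The standard errors in the summary above are the sample standard deviations divided by √N. The short Python sketch below, assuming the raw observations are available as a list of numbers, shows how such a summary could be computed with the standard `statistics` module, and then checks the quoted standard errors. The helper name `summarize` is hypothetical.

```python
import math
import statistics


def summarize(values):
    """Return (N, mean, sample standard deviation, standard error of the mean)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # uses the N - 1 divisor
    return n, mean, sd, sd / math.sqrt(n)


# Hypothetical raw values, just to exercise the helper.
print(summarize([1.0, 2.0, 3.0, 4.0]))

# Check the quoted standard errors against s / sqrt(N).
for n, sd in [(249, 6.41470), (79, 6.10771)]:
    print(f"N = {n}: s / sqrt(N) = {sd / math.sqrt(n):.5f}")
```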
H_{0}: μ_{1} = μ_{2}
H_{a}: μ_{1} ≠ μ_{2}
The absolute value of the test statistic for our example, 12.62059, is greater than the critical value of 1.9673, so we reject the null hypothesis and conclude that the two population means are different at the 0.05 significance level.
In general, there are three possible alternative hypotheses and rejection regions for the two-sample t-test:
H_{a}: μ_{1} ≠ μ_{2}, reject H_{0} if |T| > t_{1−α/2,ν}
H_{a}: μ_{1} > μ_{2}, reject H_{0} if T > t_{1−α,ν}
H_{a}: μ_{1} < μ_{2}, reject H_{0} if T < t_{α,ν}
For our two-tailed t-test, the critical value is t_{1−α/2,ν} = 1.9673, where α = 0.05 and ν = 326. If we were to perform an upper, one-tailed test, the critical value would be t_{1−α,ν} = 1.6495. The rejection regions for the three possible alternative hypotheses using our example data are:
H_{a}: μ_{1} ≠ μ_{2}, reject H_{0} if |T| > 1.9673
H_{a}: μ_{1} > μ_{2}, reject H_{0} if T > 1.6495
H_{a}: μ_{1} < μ_{2}, reject H_{0} if T < −1.6495
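The critical values 1.9673 and 1.6495 are quantiles of the t distribution with ν = 326 degrees of freedom. The sketch below computes them with the Python standard library only, evaluating the t cumulative distribution function through the regularized incomplete beta function (a standard continued-fraction evaluation) and inverting it by bisection, then applies the three rejection regions. All function names are hypothetical; in practice a statistics library's t quantile function would normally be used instead.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges fast.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, nu):
    """P(T <= t) for Student's t distribution with nu degrees of freedom."""
    tail = 0.5 * _betai(nu / 2.0, 0.5, nu / (nu + t * t))  # equals P(T > |t|)
    return 1.0 - tail if t >= 0 else tail


def t_ppf(p, nu, tol=1e-10):
    """Quantile t_{p,nu}, found by bisection on t_cdf."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    if p < 0.5:
        return -t_ppf(1.0 - p, nu, tol)  # the t distribution is symmetric about 0
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:  # expand the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reject_h0(t, nu, alpha=0.05, alternative="two-sided"):
    """Apply the rejection region for the chosen alternative hypothesis."""
    if alternative == "two-sided":  # Ha: mu1 != mu2
        return abs(t) > t_ppf(1.0 - alpha / 2.0, nu)
    if alternative == "greater":  # Ha: mu1 > mu2
        return t > t_ppf(1.0 - alpha, nu)
    if alternative == "less":  # Ha: mu1 < mu2
        return t < t_ppf(alpha, nu)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


nu = 326
print(f"t(0.975, {nu}) = {t_ppf(0.975, nu):.4f}")
print(f"t(0.95, {nu}) = {t_ppf(0.95, nu):.4f}")
for alt in ("two-sided", "greater", "less"):
    print(alt, reject_h0(-12.62059, nu, alternative=alt))
```

For ν = 326 this should agree with the critical values quoted above to four decimals. With T = −12.62059 it rejects H_{0} for the two-tailed and lower-tailed alternatives but not for the upper-tailed one, since the sample 1 mean is the smaller of the two.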

Questions 
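Two-sample t-tests can be used to answer the following questions:
1. Is process 1 equivalent to process 2?
2. Is the new process better than the current process?
3. Is the new process better than the current process by at least some predetermined threshold amount?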


Related Techniques 
Confidence Limits for the Mean
Analysis of Variance

Case Study
Ceramic strength data.
Software
Two-sample t-tests are available in nearly all general-purpose statistical software programs. Both Dataplot code and R code can be used to generate the analyses in this section.