
1.3.5.3. Two-Sample t-Test for Equal Means

Purpose:
Test if two population means are equal
The two-sample t-test (Snedecor and Cochran, 1989) is used to determine whether two population means are equal. A common application is testing whether a new process or treatment is superior to a current process or treatment.

There are several variations on this test.

  1. The data may either be paired or not paired. By paired, we mean that there is a one-to-one correspondence between the values in the two samples. That is, if X1, X2, ..., Xn and Y1, Y2, ... , Yn are the two samples, then Xi corresponds to Yi. For paired samples, the difference Xi - Yi is usually calculated. For unpaired samples, the sample sizes for the two samples may or may not be equal. The formulas for paired data are somewhat simpler than the formulas for unpaired data.
  2. The variances of the two samples may be assumed to be equal or unequal. Assuming equal variances yields somewhat simpler formulas, although with computers this is no longer a significant issue.
  3. In some applications, you may want to adopt a new process or treatment only if it exceeds the current treatment by some threshold. In this case, we can state the null hypothesis in the form that the difference between the two population means is equal to some constant (u1 - u2 = d0), where the constant d0 is the desired threshold.
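For paired data (item 1), the test reduces to a one-sample t-test on the differences Di = Xi - Yi, and the threshold form (item 3) simply subtracts d0 from the mean difference. The sketch below assumes the standard paired statistic T = (DBAR - d0)/(sD/SQRT(n)) with n - 1 degrees of freedom, where DBAR and sD are the mean and standard deviation of the differences; the function name paired_t_statistic and the example data are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(x, y, d0=0.0):
    """Paired t statistic for H0: mean(X - Y) = d0.

    x[i] is assumed to correspond to y[i]. Returns (T, degrees_of_freedom).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard deviation of the mean difference
    return (mean(diffs) - d0) / se, n - 1


# Hypothetical paired measurements, for illustration only.
x = [5, 7, 9, 6]
y = [3, 4, 6, 5]
print(paired_t_statistic(x, y))          # test of equal means (d0 = 0)
print(paired_t_statistic(x, y, d0=1.0))  # test against a threshold of 1
```

Because the differences absorb whatever the paired units have in common, the unpaired formulas that follow are not needed in this case.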
Definition The two-sample t-test for unpaired data is defined as:
H0: u1 = u2
Ha: u1 <> u2
Test Statistic: T = (YBAR1 - YBAR2)/SQRT(s1**2/N1 + s2**2/N2)

where N1 and N2 are the sample sizes, YBAR1 and YBAR2 are the sample means, and s1**2 and s2**2 are the sample variances.
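The unequal-variance statistic depends only on the sample sizes, means, and standard deviations, so it can be checked directly against the Dataplot sample output for AUTO83B.DAT later in this section. A minimal sketch, in which the function name welch_t_statistic is an assumption and not part of Dataplot:

```python
import math


def welch_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance two-sample t statistic from summary statistics."""
    # Estimated standard deviation of the difference in sample means.
    se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / se_diff


# Summary statistics taken from the AUTO83B.DAT sample output.
t = welch_t_statistic(20.14458, 6.414700, 249, 30.48101, 6.107710, 79)
print(t)  # close to the -12.94627 printed in the sample output
```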

If equal variances are assumed, then the formula reduces to:

    T = (YBAR1 - YBAR2)/[sp*SQRT((1/N1) + (1/N2))]
where
    sp**2 = [(N1-1)*s1**2 + (N2-1)*s2**2]/(N1 + N2 - 2)
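The pooled form can be sketched the same way. This assumes only the two formulas above; the function name pooled_t_statistic is hypothetical. With the AUTO83B.DAT summary statistics it should land close to the POOLED STANDARD DEVIATION (6.342600) and T TEST STATISTIC VALUE (-12.62059) in the sample output.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance (pooled) two-sample t statistic.

    Returns (T, sp), where sp is the pooled standard deviation.
    """
    # Pooled variance: weighted average of the two sample variances.
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, sp


t, sp = pooled_t_statistic(20.14458, 6.414700, 249, 30.48101, 6.107710, 79)
print(t, sp)
```

Here the two sample standard deviations are similar, so the pooled and unequal-variance statistics differ only slightly.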
Significance Level:
alpha.
Critical Region: Reject the null hypothesis that the two means are equal if
    T < -t(alpha/2,nu)
or
    T > t(alpha/2,nu)
where t(alpha/2,nu) is the upper alpha/2 critical value (that is, the 1 - alpha/2 quantile) of the t distribution with nu degrees of freedom, and
    nu = (s1**2/N1 + s2**2/N2)**2/[(s1**2/N1)**2/(N1-1) + (s2**2/N2)**2/(N2-1)]
If equal variances are assumed, then
    nu = N1 + N2 - 2
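Applying the critical region requires a t quantile. Statistical software supplies this directly; the standard-library-only sketch below instead integrates the t density with Simpson's rule and inverts the cdf by bisection. That numerical approach is a swapped-in method for illustration, not how Dataplot or any particular package computes it. All function names are hypothetical, and the bisection assumes the critical value lies below 1000.

```python
import math


def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite degrees of freedom for unequal variances."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def pooled_df(n1, n2):
    """Degrees of freedom when equal variances are assumed."""
    return n1 + n2 - 2


def t_cdf(x, nu, steps=2000):
    """CDF of Student's t distribution via Simpson's rule on [0, |x|] plus symmetry."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))

    def pdf(u):
        # t density c * (1 + u^2/nu)^(-(nu+1)/2), evaluated in log space.
        return math.exp(log_c - (nu + 1) / 2 * math.log1p(u * u / nu))

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density over [0, |x|]
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(alpha, nu):
    """Upper alpha/2 critical value t(alpha/2, nu), found by bisection on t_cdf."""
    lo, hi = 0.0, 1000.0  # assumes the critical value is below 1000
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reject_equal_means(t_stat, alpha, nu):
    """Reject H0: u1 = u2 if T < -t(alpha/2, nu) or T > t(alpha/2, nu)."""
    crit = t_critical(alpha, nu)
    return t_stat < -crit or t_stat > crit


# Unequal-variance case, using the summary statistics from the sample output.
nu = welch_df(6.414700, 249, 6.107710, 79)
print(nu, reject_equal_means(-12.94627, 0.05, nu))
```

The computed nu should agree with the EQUIVALENT DEG. OF FREEDOM (136.8750) in the sample output. In practice, a library routine such as scipy.stats.t.ppf(1 - alpha/2, nu) would replace t_critical.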
Sample Output
Dataplot generated the following output for the t test from the AUTO83B.DAT data set:
                       T TEST
                     (2-SAMPLE)
 NULL HYPOTHESIS UNDER TEST--POPULATION MEANS MU1 = MU2
  
 SAMPLE 1:
    NUMBER OF OBSERVATIONS      =      249
    MEAN                        =    20.14458
    STANDARD DEVIATION          =    6.414700
    STANDARD DEVIATION OF MEAN  =   0.4065151
  
 SAMPLE 2:
    NUMBER OF OBSERVATIONS      =       79
    MEAN                        =    30.48101
    STANDARD DEVIATION          =    6.107710
    STANDARD DEVIATION OF MEAN  =   0.6871710
  
 IF     ASSUME SIGMA1 = SIGMA2:
    POOLED STANDARD DEVIATION   =    6.342600
    DIFFERENCE (DEL) IN MEANS   =   -10.33643
    STANDARD DEVIATION OF DEL   =   0.8190135
    T TEST STATISTIC VALUE      =   -12.62059
    DEGREES OF FREEDOM          =    326.0000
    T TEST STATISTIC CDF VALUE  =    0.000000
  
 IF NOT ASSUME SIGMA1 = SIGMA2:
    STANDARD DEVIATION SAMPLE 1 =    6.414700
    STANDARD DEVIATION SAMPLE 2 =    6.107710
    BARTLETT CDF VALUE          =    0.402799
    DIFFERENCE (DEL) IN MEANS   =   -10.33643
    STANDARD DEVIATION OF DEL   =   0.7984100
    T TEST STATISTIC VALUE      =   -12.94627
    EQUIVALENT DEG. OF FREEDOM  =    136.8750
    T TEST STATISTIC CDF VALUE  =    0.000000
  
                   ALTERNATIVE-         ALTERNATIVE-
 ALTERNATIVE-      HYPOTHESIS           HYPOTHESIS
 HYPOTHESIS        ACCEPTANCE INTERVAL  CONCLUSION
 MU1 <> MU2         (0,0.025) (0.975,1)   ACCEPT
 MU1 < MU2          (0,0.05)              ACCEPT
 MU1 > MU2          (0.95,1)              REJECT
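The BARTLETT CDF VALUE line refers to Bartlett's test for equal variances. Assuming Dataplot reports the chi-square (1 degree of freedom) cdf of the usual Bartlett statistic, the following sketch closely reproduces the printed 0.402799 from the summary statistics alone. The function name is hypothetical, and the formula is specialized to k = 2 samples.

```python
import math


def bartlett_cdf_two_samples(sd1, n1, sd2, n2):
    """Bartlett's test for equal variances, specialized to two samples.

    Returns (statistic, cdf), where cdf is the chi-square(1) CDF of the statistic.
    """
    k = 2
    n_total = n1 + n2
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n_total - k)
    numerator = ((n_total - k) * math.log(sp2)
                 - (n1 - 1) * math.log(sd1**2)
                 - (n2 - 1) * math.log(sd2**2))
    correction = 1 + (1 / (3 * (k - 1))) * (1 / (n1 - 1) + 1 / (n2 - 1)
                                            - 1 / (n_total - k))
    stat = numerator / correction
    # The chi-square CDF with 1 degree of freedom equals erf(sqrt(x / 2)).
    cdf = math.erf(math.sqrt(stat / 2))
    return stat, cdf


print(bartlett_cdf_two_samples(6.414700, 249, 6.107710, 79))
```

A cdf value this far below 1 means the equal-variance assumption would not be rejected at the usual significance levels.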
Interpretation of Sample Output We are testing the hypothesis that the two population means are equal. The output is divided into five sections.
  1. The first section prints the sample statistics for sample one used in the computation of the t-test.

  2. The second section prints the sample statistics for sample two used in the computation of the t-test.

  3. The third section prints the pooled standard deviation, the difference in the means, the t-test statistic value, the degrees of freedom, and the cumulative distribution function (cdf) value of the t-test statistic under the assumption that the standard deviations are equal. The t-test statistic cdf value is an alternative way of expressing the critical value: because the cdf is monotone, comparing the cdf value with an interval is equivalent to comparing T with the corresponding critical value. This cdf value can be compared with the intervals in section five, which are stated for the alternative hypothesis. The corresponding null hypothesis acceptance intervals are (0,1 - alpha) for an upper one-tailed test, (alpha/2,1 - alpha/2) for a two-tailed test, and (alpha,1) for a lower one-tailed test.

  4. The fourth section prints the standard deviation of each sample, the cdf value of Bartlett's test for equal variances, the difference in the means, the standard deviation of the difference, the t-test statistic value, the equivalent degrees of freedom, and the cdf value of the t-test statistic under the assumption that the standard deviations are not equal. As in section three, the t-test statistic cdf value is an alternative way of expressing the critical value, and it is compared to the acceptance intervals printed in section five. For an upper one-tailed test, the alternative hypothesis acceptance interval is (1 - alpha,1); for a lower one-tailed test it is (0,alpha); and for a two-tailed test it is (0,alpha/2) or (1 - alpha/2,1). Note that accepting the alternative hypothesis is equivalent to rejecting the null hypothesis.

  5. The fifth section prints the conclusions for a 95% test (alpha = 0.05), the most common case, under the assumption that the standard deviations are not equal. Results are given in terms of the alternative hypothesis for the two-tailed test and for the one-tailed test in each direction. The alternative hypothesis acceptance interval column is stated in terms of the cdf value printed in section four, and the last column specifies whether the alternative hypothesis is accepted or rejected. For a different significance level, the appropriate conclusion can be drawn from the t-test statistic cdf value printed in section four. For example, for a significance level of 0.10, the alternative hypothesis acceptance intervals are (0,0.05) or (0.95,1) for MU1 <> MU2, (0,0.10) for MU1 < MU2, and (0.90,1) for MU1 > MU2.
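The decision rule used in section five can be written as a small function of the cdf value and alpha. This is a sketch under the interval conventions stated in items 3 through 5; the function name conclusions is hypothetical. Applied to the printed cdf value of 0.000000 with alpha = 0.05, it returns the same ACCEPT, ACCEPT, REJECT pattern as the sample output.

```python
def conclusions(cdf_value, alpha=0.05):
    """Map a t-statistic cdf value to ACCEPT/REJECT for each alternative hypothesis."""

    def verdict(accepted):
        return "ACCEPT" if accepted else "REJECT"

    return {
        # Two-tailed: accept Ha if the cdf falls in either tail of total area alpha.
        "MU1 <> MU2": verdict(cdf_value < alpha / 2 or cdf_value > 1 - alpha / 2),
        # Lower one-tailed: accept Ha if the cdf falls in the lower alpha tail.
        "MU1 < MU2": verdict(cdf_value < alpha),
        # Upper one-tailed: accept Ha if the cdf falls in the upper alpha tail.
        "MU1 > MU2": verdict(cdf_value > 1 - alpha),
    }


print(conclusions(0.0))        # cdf value from the sample output, alpha = 0.05
print(conclusions(0.93, 0.10)) # a hypothetical cdf value at alpha = 0.10
```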
Output from other statistical software may look somewhat different from the above output.
Questions Two-sample t-tests can be used to answer the following questions:
  1. Is process 1 equivalent to process 2?
  2. Is the new process better than the current process?
  3. Is the new process better than the current process by at least some pre-determined threshold amount?
Related Techniques Confidence Limits for the Mean
Analysis of Variance
Case Study Ceramic strength data.
Software Two-sample t-tests are available in nearly all general-purpose statistical software programs, including Dataplot.