|
Purpose: Interval Estimate for Mean |
Confidence limits for the mean
(Snedecor and
Cochran, 1989)
are an interval estimate for the
mean. Interval estimates are often desirable because the
estimate of the mean varies from sample to sample. Instead of
a single estimate for the mean, a confidence interval generates
a lower and upper limit for the mean. The interval
estimate gives an indication of how much uncertainty there is
in our estimate of the true mean. The narrower the interval,
the more precise is our estimate.
Confidence limits are expressed in terms of a confidence coefficient. Although the choice of confidence coefficient is somewhat arbitrary, in practice 90%, 95%, and 99% intervals are often used, with 95% being the most commonly used. As a technical note, a 95% confidence interval does not mean that there is a 95% probability that the interval contains the true mean. The interval computed from a given sample either contains the true mean or it does not. Instead, the level of confidence is associated with the method of calculating the interval. The confidence coefficient is simply the proportion of samples of a given size whose intervals may be expected to contain the true mean. That is, for a 95% confidence interval, if many samples are collected and a confidence interval is computed from each, in the long run about 95% of these intervals would contain the true mean. |
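The long-run reading of the confidence coefficient can be checked with a small simulation. The sketch below is illustrative only: the normal population, its mean and standard deviation, the number of repeated samples, and the seed are all assumptions made for the example. The sample size of 195 and the 95% t-value of 1.972 are borrowed from the Dataplot output shown later on this page.

```python
import math
import random
import statistics

# Assumptions for this sketch (not from the handbook): a normal population
# with these hypothetical parameters, and 2000 repeated samples.
TRUE_MEAN = 10.0
TRUE_SD = 2.0
N = 195          # same sample size as the ZARR13.DAT example
T_CRIT = 1.972   # 95% t-value for 194 degrees of freedom (Dataplot table)
TRIALS = 2000


def interval_covers(rng):
    """Draw one sample and report whether its 95% interval covers TRUE_MEAN."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.fmean(sample)
    half_width = T_CRIT * statistics.stdev(sample) / math.sqrt(N)
    return mean - half_width <= TRUE_MEAN <= mean + half_width


rng = random.Random(0)
coverage = sum(interval_covers(rng) for _ in range(TRIALS)) / TRIALS
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

Each individual interval either covers the true mean or misses it. Only the fraction over many repetitions should land near 0.95.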
||||||||||
| Definition: Confidence Interval |
Confidence limits are defined as:
$$\bar{Y} \pm t_{\alpha/2,\,N-1} \, \frac{s}{\sqrt{N}}$$
where $\bar{Y}$ is the sample mean,
s is the sample standard deviation, N is the
sample size, $\alpha$ is the desired significance level, and
$t_{\alpha/2,\,N-1}$ is the upper critical value
of the t distribution
with N - 1 degrees of freedom. Note that the confidence
coefficient is $1 - \alpha$.
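From the formula, it is clear that, for a given confidence coefficient, the width of the interval is controlled by two factors: the sample size N (the interval narrows as N increases, through the $\sqrt{N}$ term) and the sample standard deviation s (noisier data give a wider interval).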
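As a minimal sketch of the definition, the limits can be computed directly from summary statistics as the sample mean plus or minus the t-value times s divided by the square root of N. The function name `confidence_limits` is hypothetical. The inputs are the values printed in the Dataplot output for ZARR13.DAT, and the t-value is read from that table rather than computed. In practice a routine such as `scipy.stats.t.ppf` would usually supply it.

```python
import math


def confidence_limits(mean, s, n, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Summary statistics and the 95% t-value as printed in the Dataplot output.
N, MEAN, S, T_95 = 195, 9.261460, 0.02278881, 1.972

lower, upper = confidence_limits(MEAN, S, N, T_95)
print(f"95% limits: ({lower:.5f}, {upper:.5f})")

# Hypothetical variation: doubling s at the same N doubles the width.
lo2, up2 = confidence_limits(MEAN, 2 * S, N, T_95)
width_ratio = (up2 - lo2) / (upper - lower)
print(f"width ratio after doubling s: {width_ratio:.3f}")
```

The effect of N is less clean to demonstrate because the t-value itself depends on N - 1, so only the effect of s is varied here.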
|
||||||||||
| Definition: Hypothesis Test |
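To test whether the population mean has a specific value,
$\mu_0$, against
the two-sided alternative that it does not have the value
$\mu_0$,
the confidence interval is converted to hypothesis-test form.
The test is a one-sample t-test, and it is defined as:
H0: $\mu = \mu_0$
Ha: $\mu \neq \mu_0$
Test statistic: $T = (\bar{Y} - \mu_0) / (s / \sqrt{N})$, where $\bar{Y}$ is the sample mean, s is the sample standard deviation, and N is the sample size.
Significance level: $\alpha$
Critical region: reject the null hypothesis if $|T| > t_{\alpha/2,\,N-1}$, where $t_{\alpha/2,\,N-1}$ is the upper critical value of the t distribution with N - 1 degrees of freedom.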
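A short sketch of the test, assuming the usual one-sample t statistic: the sample mean minus the hypothesized mean, divided by the standard error s / sqrt(N). The function name `one_sample_t` is hypothetical. The inputs are the summary values printed in the Dataplot t-test output for ZARR13.DAT, and the critical value 1.972 is read from the Dataplot confidence-limit table rather than computed.

```python
import math


def one_sample_t(mean, s, n, mu0):
    """Return T = (mean - mu0) / (s / sqrt(n))."""
    return (mean - mu0) / (s / math.sqrt(n))


# Values as printed in the Dataplot output for ZARR13.DAT.
N, MEAN, S, MU0 = 195, 9.261460, 0.02278881, 5.0
T_CRIT = 1.972  # two-sided 5% critical value, 194 degrees of freedom

t_stat = one_sample_t(MEAN, S, N, MU0)
reject_null = abs(t_stat) > T_CRIT
print(f"T = {t_stat:.2f}, reject H0: {reject_null}")
```

Because the inputs are rounded as printed, the statistic should agree with the Dataplot value only to within rounding.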
|
||||||||||
| Sample Output for Confidence Interval |
Dataplot generated the following output for a
confidence interval from the
ZARR13.DAT data set:
              CONFIDENCE LIMITS FOR MEAN
                      (2-SIDED)

NUMBER OF OBSERVATIONS     =      195
MEAN                       =   9.261460
STANDARD DEVIATION         =   0.2278881E-01
STANDARD DEVIATION OF MEAN =   0.1631940E-02

CONFIDENCE    T        T X SD(MEAN)    LOWER      UPPER
VALUE (%)     VALUE                    LIMIT      LIMIT
---------------------------------------------------------
  50.000      0.676    0.110279E-02    9.26036    9.26256
  75.000      1.154    0.188294E-02    9.25958    9.26334
  90.000      1.653    0.269718E-02    9.25876    9.26416
  95.000      1.972    0.321862E-02    9.25824    9.26468
  99.000      2.601    0.424534E-02    9.25721    9.26571
  99.900      3.341    0.545297E-02    9.25601    9.26691
  99.990      3.973    0.648365E-02    9.25498    9.26794
  99.999      4.536    0.740309E-02    9.25406    9.26886
|
||||||||||
| Interpretation of the Sample Output |
The first few lines print the sample statistics used in calculating
the confidence interval. The table shows the confidence interval
for several different confidence levels. The first column lists
the confidence level (which is $1 - \alpha$ expressed as a percent),
the second column lists the t-value (i.e., $t_{\alpha/2,\,N-1}$), the third column lists
the t-value times the standard error (the standard error is
$s/\sqrt{N}$), the
fourth column lists the lower confidence limit, and the fifth column
lists the upper confidence limit. For example, for a 95% confidence
interval, we go to the row identified by 95.000 in the first column
and extract an interval of (9.25824, 9.26468) from the last two
columns.
Output from other statistical software may look somewhat different from the above output. |
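The t-values in the second column can be reproduced approximately without a statistics package. The sketch below is one possible approach and is not how Dataplot computes them: it integrates the t density with Simpson's rule and inverts the resulting CDF by bisection. The function names, step count, and bisection bracket are assumptions chosen for illustration. A library routine such as `scipy.stats.t.ppf` is the usual choice in practice.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t, df, steps=1000):
    """CDF for t >= 0: 0.5 plus a Simpson's-rule integral of the pdf on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Upper critical value t with P(|T| <= t) = confidence, by bisection."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 20.0  # assumed bracket, wide enough for these levels
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# 194 degrees of freedom, as in the ZARR13.DAT example (N = 195).
t_values = {c: t_critical(c / 100, 194) for c in (50, 90, 95, 99)}
for c, t in t_values.items():
    print(f"{c:>3}%  t = {t:.3f}")
```

The printed values should be close to the T VALUE column for the corresponding rows of the table.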
||||||||||
| Sample Output for t Test |
Dataplot generated the following output for a
one-sample t-test from the
ZARR13.DAT data set:
                    T TEST
                  (1-SAMPLE)
MU0 =   5.000000
NULL HYPOTHESIS UNDER TEST--MEAN MU =   5.000000

SAMPLE:
   NUMBER OF OBSERVATIONS      =      195
   MEAN                        =   9.261460
   STANDARD DEVIATION          =   0.2278881E-01
   STANDARD DEVIATION OF MEAN  =   0.1631940E-02

TEST:
   MEAN-MU0                    =   4.261460
   T TEST STATISTIC VALUE      =   2611.284
   DEGREES OF FREEDOM          =   194.0000
   T TEST STATISTIC CDF VALUE  =   1.000000

                     ALTERNATIVE-            ALTERNATIVE-
ALTERNATIVE-         HYPOTHESIS              HYPOTHESIS
HYPOTHESIS           ACCEPTANCE INTERVAL     CONCLUSION
MU <> 5.000000       (0,0.025) (0.975,1)     ACCEPT
MU <  5.000000       (0,0.05)                REJECT
MU >  5.000000       (0.95,1)                ACCEPT
|
||||||||||
| Interpretation of Sample Output |
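We are testing the hypothesis that the population mean is 5.
The output is divided into three sections: the sample statistics (under SAMPLE), the test statistic with its degrees of freedom and cumulative distribution function (CDF) value (under TEST), and the conclusion for each alternative hypothesis. Note that ACCEPT and REJECT in the last column refer to the alternative hypothesis, so ACCEPT for MU <> 5 means that the null hypothesis of a mean of 5 is rejected.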
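The conclusion table can be read as a lookup on the CDF value of the test statistic. The sketch below encodes the acceptance intervals exactly as printed in the Dataplot output and applies them to the printed CDF value of 1.000000. The function name and the use of closed interval bounds are assumptions made for this illustration.

```python
# Acceptance intervals for each alternative hypothesis, as printed in the
# Dataplot output: the alternative is accepted when the CDF value of the
# test statistic falls inside one of its intervals.
ACCEPTANCE_INTERVALS = {
    "MU <> MU0": [(0.0, 0.025), (0.975, 1.0)],
    "MU < MU0": [(0.0, 0.05)],
    "MU > MU0": [(0.95, 1.0)],
}


def conclusions(cdf_value):
    """Map the test statistic's CDF value to ACCEPT/REJECT per alternative."""
    result = {}
    for alternative, intervals in ACCEPTANCE_INTERVALS.items():
        # Closed bounds are an assumption so that a CDF value printed as
        # exactly 1.000000 (or 0) still falls inside the interval.
        inside = any(lo <= cdf_value <= hi for lo, hi in intervals)
        result[alternative] = "ACCEPT" if inside else "REJECT"
    return result


decisions = conclusions(1.000000)
for alternative, verdict in decisions.items():
    print(f"{alternative:<10} {verdict}")
```

With a CDF value of 1, the two-sided and upper-tail alternatives are accepted and the lower-tail alternative is rejected, matching the last column of the output.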
|
||||||||||
| Questions |
Confidence limits for the mean can be used to answer the following
questions:
|
||||||||||
| Related Techniques |
Two-Sample T-Test
Confidence intervals for other location estimators such as the median or mid-mean tend to be mathematically difficult or intractable. For these cases, confidence intervals can be obtained using the bootstrap. |
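As an illustration of the bootstrap route, the sketch below computes a percentile-bootstrap interval for the median. The percentile method is only one of several bootstrap interval constructions. The function name, the number of resamples, the seeds, and the synthetic exponential data are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_median_interval(data, confidence=0.95, resamples=2000, seed=0):
    """Percentile-bootstrap interval for the median of data."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(resamples)
    )
    alpha = 1 - confidence
    lower_index = int((alpha / 2) * resamples)
    upper_index = int((1 - alpha / 2) * resamples) - 1
    return medians[lower_index], medians[upper_index]


# Hypothetical skewed data, generated only for illustration.
rng = random.Random(1)
data = [rng.expovariate(1.0) for _ in range(200)]

lower, upper = bootstrap_median_interval(data)
print(f"sample median: {statistics.median(data):.3f}")
print(f"95% bootstrap interval: ({lower:.3f}, {upper:.3f})")
```

The same function could be adapted to other location estimators by swapping `statistics.median` for the estimator of interest.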
||||||||||
| Case Study | Heat flow meter data. | ||||||||||
| Software | Confidence limits for the mean and one-sample t-tests are available in just about all general purpose statistical software programs, including Dataplot. | ||||||||||