Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
Uniform Random Numbers
The goal of this analysis is threefold:
4-Plot of Data
The assumptions are addressed by the graphics shown above:
Individual Plots
Although it is usually not necessary, the plots can be generated individually to give more detail.
Run Sequence Plot
Histogram (with overlaid Normal PDF)
This plot shows that a normal distribution is a poor fit. The flatness of the histogram suggests that a uniform distribution might be a better fit.
Histogram (with overlaid Uniform PDF)
Since the histogram from the 4-plot suggested that the uniform distribution might be a good fit, we overlay a uniform distribution on top of the histogram. This indicates a much better fit than a normal distribution.
Normal Probability Plot
As with the histogram, the normal probability plot shows that the normal distribution does not fit these data well.
Uniform Probability Plot
Since the above plots suggested that a uniform distribution might be appropriate, we generate a uniform probability plot. This plot shows that the uniform distribution provides an excellent fit to the data.
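The visual comparison of the two probability plots can be backed by a number. The following is a minimal sketch, not the handbook's own procedure: it computes the correlation between the sorted data and the theoretical order statistic medians (Filliben's approximation) for a uniform and for a normal distribution. The sample here is synthetic (500 pseudo-random values from a fixed seed) and merely stands in for the case-study data; the function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def uniform_order_medians(n):
    """Filliben's approximation to the medians of uniform(0, 1) order statistics."""
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[-1] = 0.5 ** (1.0 / n)
    medians[0] = 1.0 - medians[-1]
    return medians


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def probability_plot_correlation(data, dist="uniform"):
    """Correlation of the sorted data with the theoretical order statistic medians."""
    ordered = sorted(data)
    medians = uniform_order_medians(len(ordered))
    if dist == "normal":
        # Map the uniform medians through the normal percent point function.
        medians = [NormalDist().inv_cdf(p) for p in medians]
    return pearson(medians, ordered)


# Synthetic stand-in for the case-study data (assumption, not the real data set).
rng = random.Random(0)
sample = [rng.random() for _ in range(500)]

r_uniform = probability_plot_correlation(sample, "uniform")
r_normal = probability_plot_correlation(sample, "normal")
print(f"uniform fit correlation: {r_uniform:.4f}")
print(f"normal fit correlation:  {r_normal:.4f}")
```

A correlation closer to 1 means the points on the probability plot lie closer to a straight line, so for uniform data the uniform value should exceed the normal one.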
Since the data satisfy the underlying assumptions, but follow a uniform distribution rather than a normal distribution, we would still like to characterize C by a typical value plus or minus a confidence interval. In this case, we would like to find a location estimator with the smallest variability.
The bootstrap plot is an ideal tool for this purpose. The following plots show the bootstrap plot, with the corresponding histogram, for the mean, median, mid-range, and median absolute deviation.
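The bootstrap comparison described here can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the data are 500 synthetic uniform values from a fixed seed (not the case-study data), 500 resamples are drawn per statistic, and the interval is the simple percentile interval. The helper names `midrange`, `bootstrap`, and `percentile_interval` are hypothetical, and the median absolute deviation is omitted because it estimates scale rather than location.

```python
import random
import statistics


def midrange(values):
    """Average of the smallest and largest observation."""
    return (min(values) + max(values)) / 2.0


def bootstrap(data, stat, n_boot=500, seed=0):
    """Recompute `stat` on n_boot resamples drawn with replacement from data."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(n_boot)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval from a list of bootstrap replicates."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return lower, upper


# Synthetic stand-in for the case-study data (assumption).
rng = random.Random(1)
data = [rng.random() for _ in range(500)]

estimators = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "mid-range": midrange,
}

widths = {}
for name, stat in estimators.items():
    replicates = bootstrap(data, stat)
    lower, upper = percentile_interval(replicates)
    widths[name] = upper - lower
    print(f"{name:9s} estimate={stat(data):.3f}  95% interval=({lower:.3f}, {upper:.3f})")
```

The estimator whose replicates are most tightly clustered, that is, whose interval is narrowest, is the one with the smallest variability for these data.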
Mid-Range is Best
From the above histograms, it is clear that for these data the mid-range is far superior to the mean or median as an estimate of location.
Using the mean, the location estimate is 0.507 and a 95% confidence interval for the mean is (0.482, 0.534). Using the mid-range, the location estimate is 0.499 and the 95% confidence interval for the mid-range is (0.497, 0.503).
Although the location estimates are similar, the uncertainty intervals differ greatly: the interval for the mean is about 0.052 wide, while the interval for the mid-range is about 0.006 wide, nearly nine times narrower.
Note that in the case of a uniform distribution it is known theoretically that the mid-range is the best linear unbiased estimator for location. However, in many applications, the most appropriate estimator will not be known or it will be mathematically intractable to determine a valid confidence interval. The bootstrap provides a method for determining (and comparing) confidence intervals in these cases.
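The theoretical advantage of the mid-range can be made concrete with the standard variance formulas for N independent observations from a uniform distribution on (a, b). This is a supplementary derivation rather than part of the original analysis, and the value N = 500 used in the last line is an assumption chosen only for illustration.

```latex
% Variance of the sample mean of N independent U(a, b) observations
\operatorname{Var}(\bar{Y}) = \frac{(b-a)^2}{12\,N}

% Variance of the mid-range M = (Y_{(1)} + Y_{(N)})/2
\operatorname{Var}(M) = \frac{(b-a)^2}{2\,(N+1)(N+2)}

% Ratio of the two variances, independent of a and b
\frac{\operatorname{Var}(\bar{Y})}{\operatorname{Var}(M)}
  = \frac{(N+1)(N+2)}{6\,N} \approx \frac{N}{6}

% Illustration with an assumed N = 500
\frac{501 \cdot 502}{6 \cdot 500} \approx 83.8,
\qquad \sqrt{83.8} \approx 9.2
```

Under that assumed sample size, the standard deviation of the mid-range is roughly nine times smaller than that of the mean, which is of the same order as the difference between the two bootstrap intervals reported above.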