4.2.1.4. The random errors follow a normal distribution.

Primary Need for Distribution Information is Inference

After fitting a model to the data and validating it, scientific or engineering questions about the process are usually answered by computing statistical intervals for relevant process quantities using the model. These intervals give the range of plausible values for the process parameters based on the data and the underlying assumptions about the process. Because of the statistical nature of the process, however, the intervals cannot be guaranteed to always include the true process parameters and still be narrow enough to be useful. Instead, the intervals have a probabilistic interpretation that guarantees coverage of the true process parameters a specified proportion of the time. For these intervals to have their specified probabilistic interpretations, the form of the distribution of the random errors must be known. Only the form must be known in advance; the parameters of the distribution can be estimated from the data.
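As an illustration of the probabilistic interpretation described above, the following is a sketch of the most common case, not a formula taken from this section. It assumes a linear least squares fit with n observations, p parameters, and independent normal errors with constant variance. The symbols (the estimate \hat{\beta}_j, its estimated standard deviation \hat{\sigma}_{\hat{\beta}_j}, and the Student's t quantile) are introduced here for illustration only.

```latex
% 100(1 - \alpha)% confidence interval for the j-th parameter
\hat{\beta}_j \;\pm\; t_{1-\alpha/2,\,n-p}\;\hat{\sigma}_{\hat{\beta}_j}

% Coverage statement that holds under the normal-error assumption
P\left( \left| \hat{\beta}_j - \beta_j \right|
        \le t_{1-\alpha/2,\,n-p}\;\hat{\sigma}_{\hat{\beta}_j} \right) = 1 - \alpha
```

The t quantile is what ties the stated coverage to the normal distribution: if the errors follow some other distribution, the same formula can still be computed, but the probability on the right-hand side is no longer guaranteed to equal 1 - \alpha.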

In general, the random errors from different types of processes could be described by any one of a wide range of probability distributions, including the uniform, triangular, double exponential, binomial and Poisson distributions. With most process modeling methods, however, inferences about the process are based on the idea that the random errors are drawn from a normal distribution. One reason is that the normal distribution often describes the actual distribution of the random errors in real-world processes reasonably well. Another is that the mathematical theory behind the normal distribution is well developed and supports a broad array of inferences on functions of the data relevant to different types of questions about the process.

Non-Normal Random Errors May Result in Incorrect Inferences

If the random errors in the process are not normally distributed, then any inferences made about the process may be incorrect. If the true distribution of the random errors is such that the scatter in the data is less than it would be under a normal distribution, the intervals used to capture the values of the process parameters may simply be a little wider than necessary. The intervals will then contain the true process parameters more often than expected. It is more likely, however, that the intervals will be too narrow or will be shifted away from the true mean value of the process parameter being estimated. Such intervals contain the true process parameters less often than expected. When this is the case, the intervals produced under the normal distribution assumption are likely to lead to incorrect conclusions about the process.
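The loss of coverage described above can be checked with a small simulation. The sketch below is illustrative only and rests on several assumptions that are not part of this section: a constant-mean process, samples of 10 observations, a nominal 95 % Student's t interval for the mean, and a mean-zero shifted lognormal distribution standing in for skewed errors. The function names, sample size, and seed are all hypothetical choices.

```python
import math
import random
import statistics

N = 10                 # observations per simulated data set (assumed)
T_CRIT_DF9 = 2.262     # two-sided 95 % Student's t critical value, 9 degrees of freedom


def normal_error(rng):
    """Standard normal random error."""
    return rng.gauss(0.0, 1.0)


def skewed_error(rng):
    """Right-skewed error: lognormal(0, 1) draw shifted so its mean is zero."""
    return rng.lognormvariate(0.0, 1.0) - math.exp(0.5)


def coverage(draw_error, n_trials=4000, true_mu=5.0, seed=0):
    """Fraction of nominal 95 % t intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        y = [true_mu + draw_error(rng) for _ in range(N)]
        center = statistics.fmean(y)
        half_width = T_CRIT_DF9 * statistics.stdev(y) / math.sqrt(N)
        if center - half_width <= true_mu <= center + half_width:
            hits += 1
    return hits / n_trials


cov_normal = coverage(normal_error)
cov_skewed = coverage(skewed_error)
print(f"coverage with normal errors: {cov_normal:.3f}")
print(f"coverage with skewed errors: {cov_skewed:.3f}")
```

With normal errors the observed coverage should sit close to the nominal 0.95. With the skewed errors it should fall noticeably below that, because the intervals tend to be too narrow or shifted relative to the true mean.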

Parameter Estimation Methods Can Require Gaussian Errors

The methods used for parameter estimation can also imply the assumption of normally distributed random errors. Some methods, like maximum likelihood, use the distribution of the random errors directly to obtain parameter estimates. Even methods that do not use the error distribution directly, like least squares, often work best for data that are free from extreme random fluctuations. The normal distribution is one of the probability distributions in which extreme random errors are rare. If some other distribution describes the random errors better than the normal distribution does, then a different parameter estimation method might be needed to obtain good estimates of the unknown parameters in the model.
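A small simulation can illustrate why the choice of estimation method depends on the error distribution. The sketch below assumes the simplest possible model, a constant process mean, for which the least squares estimate is the sample mean and the least absolute deviations estimate is the sample median. Least absolute deviations is used here only as one example of an alternative method; it is not prescribed by this section. The function names, sample size, and seed are hypothetical.

```python
import random
import statistics


def normal_error(rng):
    """Standard normal random error."""
    return rng.gauss(0.0, 1.0)


def double_exponential_error(rng):
    """Double exponential (Laplace) error: difference of two Exp(1) draws."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def mean_squared_errors(draw_error, n=25, n_trials=4000, true_mu=10.0, seed=1):
    """Return (MSE of least squares fit, MSE of least absolute deviations fit)."""
    rng = random.Random(seed)
    sq_err_ls = 0.0
    sq_err_lad = 0.0
    for _ in range(n_trials):
        y = [true_mu + draw_error(rng) for _ in range(n)]
        # For a constant model, least squares gives the sample mean ...
        sq_err_ls += (statistics.fmean(y) - true_mu) ** 2
        # ... and least absolute deviations gives the sample median.
        sq_err_lad += (statistics.median(y) - true_mu) ** 2
    return sq_err_ls / n_trials, sq_err_lad / n_trials


mse_ls_normal, mse_lad_normal = mean_squared_errors(normal_error)
mse_ls_dexp, mse_lad_dexp = mean_squared_errors(double_exponential_error)
print(f"normal errors:             LS {mse_ls_normal:.4f}  LAD {mse_lad_normal:.4f}")
print(f"double exponential errors: LS {mse_ls_dexp:.4f}  LAD {mse_lad_dexp:.4f}")
```

Under normal errors the least squares estimate should show the smaller mean squared error. Under the heavier-tailed double exponential errors the ordering should reverse, which is the kind of situation in which a different estimation method may be worth considering.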