|
4. Process Modeling
4.3. Data Collection for Process Modeling
|
|||
| Experimental Design Principles Applied to Process Modeling |
There are six principles of experimental design as applied to
process modeling:
|
||
| Capacity for Primary Model | For your best-guess model, make sure that the design has the capacity for estimating the coefficients of that model. For a simple example of this, if you are fitting a quadratic model, then make sure you have at least three distinct points on the horizontal axis. | ||
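The capacity check above can be sketched for polynomial models in one predictor, where a fit of degree d has d + 1 coefficients and so needs at least that many distinct settings. The function name `has_capacity` and the two ten-run designs are illustrative assumptions, not part of the handbook.

```python
def has_capacity(x_values, degree):
    """Return True if a one-factor design can estimate a polynomial of this degree.

    A polynomial of degree d has d + 1 coefficients, so the design needs
    at least d + 1 distinct x settings (replicates add no new levels).
    """
    distinct_levels = len(set(x_values))
    return distinct_levels >= degree + 1


# Hypothetical ten-run designs on a coded 0..1 scale.
two_level = [0.0] * 5 + [1.0] * 5
three_level = [0.0] * 4 + [0.5] * 2 + [1.0] * 4

print(has_capacity(two_level, 2))    # two distinct levels: no capacity for a quadratic
print(has_capacity(three_level, 2))  # three distinct levels: a quadratic can be estimated
```

The check counts distinct levels only; it says nothing about how well the coefficients are estimated.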
| Capacity for Alternative Model | If your best-guess model happens to be inadequate, make sure that the design has the capacity to estimate the coefficients of your best-guess back-up alternative model (which means implicitly that you should have already identified such a model). For a simple example, if you suspect (but are not positive) that a linear model is appropriate, then it is best to employ a globally robust design (say, four points at each extreme and three points in the middle, for a ten-point design) as opposed to the locally optimal design (such as five points at each extreme). The locally optimal design will provide a best fit to the line, but have no capacity to fit a quadratic. The globally robust design will provide a good (though not optimal) fit to the line and additionally provide a good (though not optimal) fit to the quadratic. | ||
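The trade-off between the two kinds of design can be made concrete with a small sketch. For a straight-line least squares fit with constant error variance, the variance of the slope is the error variance divided by the sum of squared deviations of the x settings, so a smaller factor below means a more precise slope. The coded scale from -1 to +1 and the 4/2/4 allocation used for the robust design are assumptions chosen for illustration.

```python
def sxx(x_values):
    """Sum of squared deviations of the x settings about their mean."""
    mean_x = sum(x_values) / len(x_values)
    return sum((x - mean_x) ** 2 for x in x_values)


def slope_variance_factor(x_values):
    """Var(slope) / sigma^2 for a straight-line least squares fit."""
    return 1.0 / sxx(x_values)


# Coded scale from -1 to +1; both allocations are illustrative assumptions.
locally_optimal = [-1.0] * 5 + [1.0] * 5
robust = [-1.0] * 4 + [0.0] * 2 + [1.0] * 4

for name, design in [("locally optimal", locally_optimal), ("robust", robust)]:
    levels = len(set(design))  # a quadratic needs at least three distinct levels
    print(name, "slope variance factor:", slope_variance_factor(design),
          "distinct levels:", levels)
```

In this sketch the design with middle points pays a modest price in slope precision and in return gains a third level, which is what gives it the capacity to fit a quadratic.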
| Minimum Variance of Coefficient Estimators |
For a given model, make sure the design has the property of
minimizing the variation of the least squares estimated
coefficients. This is a general principle that is always in
effect but which in practice is hard to implement for many
models beyond the simpler 1-factor
models. For more complicated 1-factor models, and for most
multi-factor models, the expressions for
the variance of the least squares estimators, although available,
are complicated and assume more than the analyst typically knows.
The net result is that this principle, though important, is harder
to apply beyond the simple cases.
|
||
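For the simplest case, a straight-line fit with constant error variance, the relevant expressions are standard least squares results and can be written out directly. The symbols below (the coefficients, the error variance, and the design matrix X) are notation assumed for this sketch rather than taken from the handbook text.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \operatorname{Var}(\varepsilon_i) = \sigma^2

\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

\operatorname{Cov}(\hat{\boldsymbol{\beta}}) = \sigma^2 \left( X^{\mathsf{T}} X \right)^{-1}
```

The slope variance shrinks as the settings spread away from their mean, which is why placing half the runs at each extreme is optimal for a line. For models that are nonlinear in their coefficients, the corresponding approximate expression typically involves derivatives evaluated at the unknown coefficient values, which is one reason the principle is harder to apply there.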
| Sample Where the Variation Is (Non Constant Variance Case) |
Regardless of the simplicity or complexity of the model, there are
situations in which certain regions of the curve are noisier than
others. A simple case is when there is a linear relationship
between $x$ and $y$ but the recording device
is proportional rather than absolute and so larger values of
$y$ are intrinsically noisier than smaller values of
$y$. In such cases, sampling where the variation is
means to have more replicated points in those regions that are
noisier. The practical answer to how many such replicated
points there should be is
![]()
with |
||
| Sample Where the Variation Is (Steep Curve Case) |
A common occurrence for non-linear models is for some regions of the
curve to be steeper than others. For example, in fitting an
exponential model (small $x$ corresponding to large
$y$, and large $x$ corresponding to small
$y$) it is often the case that the data
in the steep region are intrinsically noisier than the
data in the relatively flat regions. The reason for
this is that commonly the $x$ values themselves have
a bit of noise and this $x$-noise gets translated into
larger $y$-noise in the steep sections than in the
shallow sections. In such cases, when we know the shape of the
response curve well enough to identify steep-versus-shallow
regions, it is often a good idea to sample more heavily in the steep
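regions than in the shallow regions. A practical rule-of-thumb for where
to position the $x$ values in such situations is to sketch the best-guess
curve, mark equally spaced points along the vertical ($y$) axis, and project
each one across to the curve and down to the horizontal axis; the projected
points are the $x$ values to use in the
design.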
The above rough procedure for an exponentially decreasing curve would thus yield a logarithmic preponderance of points in the steep region of the curve and relatively few points in the flatter part of the curve. |
||
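One way to obtain such a logarithmic concentration of points for an exponentially decreasing curve is to space the response values evenly and invert the curve to find the matching settings. The sketch below assumes a best-guess curve of the form y = amplitude * exp(-rate * x) on the interval from 0 to `x_max`; the function name, the parameter names, and the numeric values are all hypothetical.

```python
import math


def steep_curve_design(n_points, amplitude, rate, x_max):
    """Choose x settings for y = amplitude * exp(-rate * x) on [0, x_max].

    The response range is split into equally spaced y targets and each
    target is mapped back to x by inverting the curve, so the settings
    cluster where the curve is steep.
    """
    y_high = amplitude                           # response at x = 0
    y_low = amplitude * math.exp(-rate * x_max)  # response at x = x_max
    step = (y_high - y_low) / (n_points - 1)
    y_targets = [y_high - i * step for i in range(n_points)]
    # Invert y = amplitude * exp(-rate * x) to get x = log(amplitude / y) / rate.
    return [math.log(amplitude / y) / rate for y in y_targets]


# Assumed best-guess curve: amplitude 1, rate 1, sampled on 0 <= x <= 3.
x_design = steep_curve_design(n_points=10, amplitude=1.0, rate=1.0, x_max=3.0)
gaps = [b - a for a, b in zip(x_design, x_design[1:])]
print([round(x, 3) for x in x_design])
```

The gaps between successive settings widen as the curve flattens, so most of the runs fall in the steep region near the start of the interval.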
| Replication | If affordable, replication should be part of every design. Replication allows us to compute a model-independent estimate of the process standard deviation. Such an estimate may then be used as a criterion in an objective lack-of-fit test to assess whether a given model is adequate. Such an objective lack-of-fit F-test can be employed only if the design has built-in replication. Some replication is essential; replication at every point is ideal. | ||
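The role of replication in a lack-of-fit test can be sketched for a straight-line model. The replicates at each setting give a pure-error sum of squares that does not depend on the model, and the F statistic compares it with the scatter of the group means about the fitted line. The function name `lack_of_fit_f` and the data are made up for illustration; the statistic would then be compared with an F distribution on the returned degrees of freedom.

```python
from collections import defaultdict


def lack_of_fit_f(x_values, y_values):
    """Lack-of-fit F statistic for a straight-line fit to replicated data.

    Returns (f_statistic, df_lack_of_fit, df_pure_error).
    """
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Group the responses by x setting (exact equality) to find the replicates.
    groups = defaultdict(list)
    for x, y in zip(x_values, y_values):
        groups[x].append(y)

    ss_pure = 0.0  # replicates about their own group mean (model-independent)
    ss_lof = 0.0   # group means about the fitted line
    for x, ys in groups.items():
        group_mean = sum(ys) / len(ys)
        ss_pure += sum((y - group_mean) ** 2 for y in ys)
        ss_lof += len(ys) * (group_mean - (intercept + slope * x)) ** 2

    df_lof = len(groups) - 2   # distinct levels minus the two line coefficients
    df_pure = n - len(groups)  # total runs minus distinct levels
    f_stat = (ss_lof / df_lof) / (ss_pure / df_pure)
    return f_stat, df_lof, df_pure


# Made-up data: two replicates at each of four settings, with visible curvature.
x_data = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
y_data = [1.1, 0.9, 4.2, 3.8, 9.1, 8.9, 16.2, 15.8]
f_stat, df_lof, df_pure = lack_of_fit_f(x_data, y_data)
print(f_stat, df_lof, df_pure)
```

Without replicates the pure-error degrees of freedom would be zero and the statistic could not be formed, which is why the test requires built-in replication.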
| Randomization |
Just because the $x$'s have some natural ordering does
not mean that the data should be collected in the same order as
the $x$'s. Some aspect of randomization should
enter into every experiment, and experiments for process modeling
are no exception. Thus if you are sampling ten points on a curve,
the ten $y$ values should not be collected by
sequentially stepping through the $x$ values from the
smallest to the largest. If you do so, and if some extraneous
drifting or wear occurs in the machine, the operator, the
environment, the measuring device, etc., then that drift will
unwittingly contaminate the $y$ values and in turn
contaminate the final fit. To minimize the effect of such potential
drift, it is best to randomize (use random number tables) the
sequence of the $x$ values. This will not make the
drift go away, but it will spread the drift effect
evenly over the entire curve, realistically inflating the variation
of the fitted values, and providing some mechanism after the fact
(at the residual analysis model validation stage) for uncovering or
discovering such a drift. If you do not randomize the run sequence,
you give up your ability to detect such a drift if it occurs.
|
||
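A randomized run sequence can be generated in a few lines. This sketch uses Python's standard `random` module in place of random number tables; the function name `randomized_run_order`, the ten settings, and the seed value are assumptions for illustration.

```python
import random


def randomized_run_order(x_settings, seed=None):
    """Return the x settings in a random run order (the input list is not modified)."""
    rng = random.Random(seed)  # a seed makes the run sheet reproducible
    order = list(x_settings)
    rng.shuffle(order)
    return order


# Ten hypothetical settings, listed from smallest to largest.
planned = [float(i) for i in range(1, 11)]
run_order = randomized_run_order(planned, seed=12345)
print(run_order)
```

Recording the seed alongside the data keeps the run order on file, which helps when residuals are later plotted against run sequence to look for drift.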