
3.4.3 HELP for Missing Data
Hungkung Liu, William F. Guthrie
J.T. Gene Hwang
Gerard N. Stenbakken, Michael T. Souders
Many engineering problems involve high-dimensional observations whose mean vectors lie in a lower-dimensional space. Exhaustive measurement of all the elements of an observation is often time consuming and expensive. With a traditional multivariate linear model, one can combine a small subset of the elements of the observation with a known design matrix to predict the remaining elements. For a complicated engineering system, however, the design matrix is often hard to determine fully. We investigate an empirical linear model in which the data are used both to determine the size of the design matrix and to estimate its unknown part. The estimated model is then used to construct point and interval estimates for a future observation. This technique is called HELP (High-dimensional Empirical Linear Prediction).
As part of the work on developing efficient new testing strategies for software-embedded systems, we have collaborated on tests of an extension of the HELP algorithm to devices that follow a model outside the usual non-software-embedded framework. The new device model, while simpler than the models ultimately applicable to software-embedded systems, provides a readily available starting point for testing an extension of the HELP methodology based on Expectation Maximization (EM), which has potential importance for software-embedded systems. The EM approach is attractive because it would provide an efficient method for extending HELP, or other testing tools, to the more complex device model.
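The basic empirical linear model described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the unknown design matrix is estimated from the leading right singular vectors of fully measured training data, that the number of components k is given rather than chosen from the data, and it produces point estimates only. The function names, dimensions, and synthetic data are hypothetical.

```python
import numpy as np

def fit_empirical_model(Y, k):
    """Estimate a rank-k empirical design matrix from training data.

    Y has shape (n_devices, n_points); each row is one fully measured device.
    """
    mean = Y.mean(axis=0)
    # The leading right singular vectors of the centered data span the
    # estimated low-dimensional space of the mean vectors.
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, Vt[:k].T              # design matrix of shape (n_points, k)

def predict_full(y_sub, idx, mean, P):
    """Predict every element of a new observation from the subset measured at idx."""
    coef, *_ = np.linalg.lstsq(P[idx], y_sub - mean[idx], rcond=None)
    return mean + P @ coef

# Synthetic example (all sizes are assumptions made for illustration).
rng = np.random.default_rng(0)
n, m, k = 200, 50, 3
true_P = rng.normal(size=(m, k))
Y = rng.normal(size=(n, k)) @ true_P.T + 0.01 * rng.normal(size=(n, m))
mean, P = fit_empirical_model(Y, k)

y_new = rng.normal(size=k) @ true_P.T  # a future device, noise-free
idx = np.arange(0, m, 5)               # measure only 10 of the 50 points
y_hat = predict_full(y_new[idx], idx, mean, P)
err = np.linalg.norm(y_hat - y_new)    # norm of the prediction error
```

Only the rows of the estimated design matrix at the measured points enter the least-squares fit; the remaining rows are used to extrapolate to the unmeasured elements.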
We studied patterns of the observed data to see whether they can identify
the model being considered.
Our results may help to resolve whether EM works in this situation,
and may also help engineers design their experiments in such a way
that the principal components can still be identified when some of
the data are missing.
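One way to picture an EM-type treatment of missing data in a low-rank model is the alternating scheme sketched below. This is an assumption made for illustration, not the extension tested in this work: it uses plain iterative SVD imputation, where missing entries are filled with the current rank-k fit and the model is then refit. The function name `em_low_rank`, the missing-data pattern, and the synthetic data are all hypothetical.

```python
import numpy as np

def em_low_rank(Y, observed, k, n_iter=200):
    """EM-style fit of a mean-plus-rank-k model to data with missing entries.

    Y        : (n, m) array; values where `observed` is False are ignored.
    observed : (n, m) boolean mask, True where an entry was measured.
    """
    # Initial fill: column means of the observed entries.
    counts = np.maximum(observed.sum(axis=0), 1)
    col_mean = np.where(observed, Y, 0.0).sum(axis=0) / counts
    X = np.where(observed, Y, col_mean)
    for _ in range(n_iter):
        # Refit the mean and principal components on the completed data.
        mean = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
        fitted = mean + (U[:, :k] * s[:k]) @ Vt[:k]
        # Re-impute: keep measured values, replace missing ones by the fit.
        X = np.where(observed, Y, fitted)
    return X, mean, Vt[:k].T

# Synthetic example: 40 devices fully measured, 20 measured at 8 of 20 points.
rng = np.random.default_rng(0)
n, m, k = 60, 20, 2
Y_true = rng.normal(size=(n, k)) @ rng.normal(size=(k, m))
observed = np.ones((n, m), dtype=bool)
observed[40:, 8:] = False
Y = Y_true.copy()
Y[~observed] = np.nan                  # missing entries are never read

X_hat, mean, P = em_low_rank(Y, observed, k)
rel_err = (np.linalg.norm(X_hat[~observed] - Y_true[~observed])
           / np.linalg.norm(Y_true[~observed]))
```

In this toy setting the components are identifiable because enough rows are fully measured and each incomplete row is observed at more than k points; with too few observed points the imputed values would not be determined by the data.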
Figure 18: The norm of the prediction error versus the number of test points selected (t_{p}) is plotted for years 1, 2, 3, and 4, and for "E", the EM estimates for year 4. Our analytical lower bound on t_{p} for identifiability is 149. The graph reflects this: the norm of the prediction error for the E's behaves like that of the 4's for small t_{p}, and comes down to a reasonably small size around t_{p}=149.
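A curve of the kind shown in Figure 18 can be mimicked on synthetic data with the sketch below. It does not reproduce the figure or the bound of 149: the dimensions, noise level, and random selection of test points are assumptions, and the design matrix is taken as known rather than estimated.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, sigma = 60, 5, 0.05              # hypothetical sizes and noise level
# Orthonormal basis standing in for the design matrix (assumed known here).
P, _ = np.linalg.qr(rng.normal(size=(m, k)))

def prediction_error(t_p, n_trials=50):
    """Average norm of the prediction error when t_p random points are measured."""
    errs = []
    for _ in range(n_trials):
        y = P @ rng.normal(size=k)                      # noise-free truth
        idx = rng.choice(m, size=t_p, replace=False)    # selected test points
        y_meas = y[idx] + sigma * rng.normal(size=t_p)  # noisy measurements
        coef, *_ = np.linalg.lstsq(P[idx], y_meas, rcond=None)
        errs.append(np.linalg.norm(P @ coef - y))
    return float(np.mean(errs))

t_values = [2, 5, 10, 20, 40, 60]
curve = {t_p: prediction_error(t_p) for t_p in t_values}
```

When t_p is smaller than k the coefficients are not identifiable and the error stays large; once enough points are measured the error falls toward the noise level.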
Date created: 7/20/2001 