3.4.3 HELP for Missing Data

Hung-kung Liu, William F. Guthrie
Statistical Engineering Division, ITL

J.T. Gene Hwang
Cornell University

Gerard N. Stenbakken, Michael T. Souders
Electricity Division, EEEL

Many engineering problems involve high-dimensional observations whose mean vectors lie in a lower-dimensional space. Exhaustive measurement of all the elements of an observation is often time-consuming and expensive. With a traditional multivariate linear model, one can combine a small subset of the elements of the observation with a known design matrix to predict the remaining elements. For a complicated engineering system, however, the design matrix is often hard to determine fully. We investigate an empirical linear model, in which the data are used both to determine the size of the design matrix and to estimate its unknown part. The estimated model is then used to construct point and interval estimates for a future observation. This technique is called HELP (High-dimensional Empirical Linear Prediction).
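The prediction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the basis (the empirical design matrix) is estimated from the leading right singular vectors of fully measured training devices, that the model size k is already chosen, and that the measured subset is picked at random. The names fit_basis and predict_full, the synthetic data, and all sizes are hypothetical. Only the point prediction is shown, not the interval estimates.

```python
import numpy as np

def fit_basis(train, k):
    """Estimate a p-by-k basis (an empirical 'design matrix') from training data.

    train: (n, p) array, one fully measured device per row.
    k: assumed model size (number of basis vectors to keep).
    """
    mean = train.mean(axis=0)
    # Right singular vectors of the centered data span the estimated mean space.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k].T  # shapes (p,) and (p, k)

def predict_full(y_sub, idx, mean, basis):
    """Predict all p elements from the subset y_sub measured at positions idx."""
    # Least-squares fit of the k coefficients using only the measured rows.
    coef, *_ = np.linalg.lstsq(basis[idx], y_sub - mean[idx], rcond=None)
    return mean + basis @ coef

# Synthetic example (assumption): rank-3 signal in 200 dimensions plus small noise.
rng = np.random.default_rng(0)
p, k, n = 200, 3, 50
true_basis = rng.normal(size=(p, k))
train = rng.normal(size=(n, k)) @ true_basis.T + 0.01 * rng.normal(size=(n, p))

mean, basis = fit_basis(train, k)
new_device = rng.normal(size=k) @ true_basis.T + 0.01 * rng.normal(size=p)
idx = rng.choice(p, size=20, replace=False)  # the measured test points
pred = predict_full(new_device[idx], idx, mean, basis)
rel_err = np.linalg.norm(pred - new_device) / np.linalg.norm(new_device)
print(f"relative prediction error: {rel_err:.4f}")
```

In this sketch only 20 of the 200 elements of the new device are measured; the rest are predicted from the estimated basis.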

As part of the work on developing efficient new testing strategies for software-embedded systems, we have collaborated on tests of an extension of the HELP algorithm to devices that follow a model outside the usual non-software-embedded framework. The new device model, while simpler than the models ultimately applicable to software-embedded systems, provides a readily available starting point for testing an extension of the HELP methodology based on Expectation Maximization (EM), which has potential importance for software-embedded systems. The EM approach is attractive because it would provide an efficient method for extending HELP, or other testing tools, to the more complex device model.
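To make the EM idea concrete, the following sketch shows one common EM-style scheme for fitting a low-rank linear model when some elements are missing: alternate between refitting the model to the completed data and refilling the missing entries with the model's values. This is a simplified iterative-imputation analogue chosen for illustration, not necessarily the algorithm used in this project. The function name em_low_rank, the rank k, the missing-data pattern, and the synthetic data are all assumptions.

```python
import numpy as np

def em_low_rank(data, observed, k, n_iter=500, tol=1e-8):
    """Fill missing entries of data (n, p) using a mean-plus-rank-k model.

    observed: boolean mask, True where an element was actually measured.
    """
    # Start the missing entries at the column means of the observed values.
    zero_filled = np.where(observed, data, 0.0)
    counts = np.maximum(observed.sum(axis=0), 1)
    col_means = zero_filled.sum(axis=0) / counts
    filled = np.where(observed, data, col_means)
    for _ in range(n_iter):
        # Model-fitting step: refit the rank-k model to the completed data.
        mean = filled.mean(axis=0)
        u, s, vt = np.linalg.svd(filled - mean, full_matrices=False)
        low_rank = mean + (u[:, :k] * s[:k]) @ vt[:k]
        # Imputation step: replace only the missing entries by model values.
        new_filled = np.where(observed, data, low_rank)
        change = np.linalg.norm(new_filled - filled)
        filled = new_filled
        if change < tol:
            break
    return filled

# Synthetic example (assumption): noise-free rank-2 data, about 20% missing at random.
rng = np.random.default_rng(1)
n, p, k = 60, 40, 2
full = rng.normal(size=(n, k)) @ rng.normal(size=(k, p))
observed = rng.random(size=(n, p)) > 0.2
data = np.where(observed, full, np.nan)  # NaN marks an unmeasured element

completed = em_low_rank(data, observed, k)
missing = ~observed
em_err = np.linalg.norm((completed - full)[missing]) / np.linalg.norm(full[missing])
print(f"relative error on the missing entries: {em_err:.4f}")
```

Whether such an iteration recovers the model depends on which elements are observed, which is the identifiability question examined in this work.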

We studied the patterns of the observed data to see whether they can identify the model being considered. Our results may not only help to resolve whether EM works in this situation but also help engineers design their experiments so that the principal components can still be identified when some of the data are missing.

Figure 18: The norm of the prediction error versus the number of test points selected (t_p) is plotted for years 1, 2, 3, and 4, and for "E", the EM estimates for year 4. Our analytical result for the lower bound on t_p that implies identifiability is 149. The graph reflects this: the norm of the prediction error for the E's behaves like that for the 4's at small t_p, and comes down to a reasonably small size around t_p = 149.

Date created: 7/20/2001
Last updated: 7/20/2001