| Dimension reduction tool | A Multivariate Analysis problem could start out with a substantial number of correlated variables. Principal Component Analysis is a dimension-reduction tool that can be used advantageously in such situations. Principal component analysis aims at reducing a large set of variables to a small set that still contains most of the information in the large set. | ||
| Principal factors | The technique of principal component analysis enables us to create and use a reduced set of variables, which are called principal factors. A reduced set is much easier to analyze and interpret. Studying a data set that requires estimating roughly 500 parameters may be difficult, but if we could reduce these to 5 it would certainly make our day. What follows shows how to achieve substantial dimension reduction. | ||
| Inverse transformation not possible | These principal factors represent or replace one or more of the original variables. Note, however, that when only a reduced set of factors is kept, the mapping from the original variables is not one-to-one, so the original variables cannot be recovered exactly by an inverse transformation. | ||
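The following is a minimal sketch of dimension reduction and of what it gives up. It assumes NumPy and made-up data: the data generator and the helper `reconstruct` are hypothetical and serve only as illustration. The sketch keeps the first k principal factors of a standardized 4-variable data set and maps them back to the standardized scale. Keeping all p factors undoes the rotation exactly, because the eigenvector matrix is orthogonal. Keeping fewer than p leaves a nonzero reconstruction error, and that error is the information the reduced set discards.

```python
import numpy as np

# Hypothetical synthetic data: n = 200 subjects, p = 4 variables sharing one common factor.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = base + 0.3 * rng.normal(size=(200, 4))

# Standardize each column to mean 0 and variance 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenvectors of the correlation matrix, sorted by decreasing eigenvalue.
R = np.corrcoef(Z, rowvar=False)
eigvals, V = np.linalg.eigh(R)            # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

def reconstruct(k):
    """Hypothetical helper: keep the first k components, then map back to the z scale."""
    Vk = V[:, :k]      # p x k coefficients
    Yk = Z @ Vk        # n x k reduced scores (the principal factors)
    return Yk @ Vk.T   # n x p approximation of Z

err_full = np.abs(reconstruct(4) - Z).max()     # all p components kept
err_reduced = np.abs(reconstruct(1) - Z).max()  # only the first component kept
print(f"k=4 max error: {err_full:.2e}")
print(f"k=1 max error: {err_reduced:.2f}")
print("variance share of first component:", eigvals[0] / eigvals.sum())
```

With strongly correlated columns such as these, the first component carries most of the total variance (the eigenvalues of a correlation matrix sum to p). The sketch still cannot recover the individual columns from that one component alone.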
| Original data matrix | To shed light on the structure of principal component analysis, consider a multivariate data matrix X with n rows and p columns. The p elements of each row are scores or measurements on a subject, such as height, weight and age. | ||
| Linear function that maximizes variance | Next, standardize the X matrix so that each column mean is 0 and each column variance is 1. Call this matrix Z. Each column is a vector variable, $z_i$, $i = 1, \ldots, p$. The main idea behind principal component analysis is to derive linear functions $y$ of the vector variables $z_i$. Such a linear function possesses an extremely important property: its variance is maximized, subject to its coefficient vector having unit length (without that constraint the variance could be made arbitrarily large). | ||
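The standardization step can be sketched as follows. The sketch assumes NumPy, and the five subjects with their height, weight and age values are invented for illustration. It divides by the standard deviation computed with divisor n (NumPy's default `ddof=0`), so each column variance is exactly 1 under that convention. Using divisor n - 1 instead would shrink every standardized column by the same factor and would leave the correlation matrix unchanged.

```python
import numpy as np

# Hypothetical data matrix X: n = 5 subjects, p = 3 variables (height cm, weight kg, age yr).
X = np.array([
    [170.0, 65.0, 30.0],
    [182.0, 80.0, 45.0],
    [165.0, 59.0, 22.0],
    [175.0, 72.0, 38.0],
    [160.0, 55.0, 27.0],
])

# Standardize: subtract each column mean, divide by each column standard deviation (ddof=0).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

print(Z.mean(axis=0).round(12))  # column means, all zero up to rounding
print(Z.var(axis=0))             # column variances, all 1
```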
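| Linear function is component of z | This linear function is referred to as a component of $z$. To illustrate the computation of a single element for the $j$th $y$ vector, consider the product $y = z v'$, where $v'$ is a column vector of $V$, and $V$ is a $p \times p$ coefficient matrix that carries the $p$-element variable $z$ into the derived $n$-element variable $y$. $V$ is known as the eigenvector matrix. The dimension of $z$ (one row of $Z$) is $1 \times p$, and the dimension of $v'$ is $p \times 1$. The scalar algebra for the component score for the $i$th individual of $y_j$, $j = 1, \ldots, p$, is $$y_{ij} = v_{1j} z_{i1} + v_{2j} z_{i2} + \cdots + v_{pj} z_{ip},$$ where $v_{kj}$ is the $(k, j)$ element of $V$ and $z_{ik}$ is the $k$th standardized measurement on individual $i$. For all individuals and all components at once, this becomes in matrix notation $$Y = ZV.$$ | ||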
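Here is a sketch of the component-score computation. It assumes NumPy and synthetic correlated data; the mixing matrix `A` and the chosen indices `i`, `j` are arbitrary. V is taken as the matrix of eigenvectors of the correlation matrix R, ordered by decreasing eigenvalue. `np.linalg.eigh` returns them in ascending order, which is why the sketch reorders them. It then checks that the scalar formula for one score matches the corresponding element of Y = ZV.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated data: n = 50 subjects, p = 3 variables.
A = np.array([[1.0, 0.8, 0.3],
              [0.0, 0.6, 0.5],
              [0.0, 0.0, 0.8]])
X = rng.normal(size=(50, 3)) @ A
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# V: p x p matrix whose columns are eigenvectors of R, ordered by decreasing eigenvalue.
R = np.corrcoef(Z, rowvar=False)
eigvals, V = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Matrix form: all n x p component scores at once.
Y = Z @ V

# Scalar form for a single element: individual i, component j.
i, j = 7, 0
y_ij = sum(V[k, j] * Z[i, k] for k in range(Z.shape[1]))
print(y_ij, Y[i, j])  # the two values agree up to floating-point rounding
```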
| Mean and dispersion matrix of y | The mean of $y$ is $m_y = V' m_z = 0$, because $m_z = 0$. The dispersion matrix of $y$ is $$D_y = V' D_z V = V' R V.$$ | ||
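The mean and dispersion results can be checked numerically. This sketch assumes NumPy and random synthetic data. It computes dispersion matrices with divisor n (`ddof=0`) so that they match the standardization. The first two checks confirm m_y = 0 and D_y = V'RV. The last check shows that D_y is diagonal, with the eigenvalues of R on the diagonal, which means the component scores are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # hypothetical correlated data
Z = (X - X.mean(axis=0)) / X.std(axis=0)

R = np.cov(Z, rowvar=False, ddof=0)    # dispersion matrix of z (a correlation matrix)
eigvals, V = np.linalg.eigh(R)         # columns of V are eigenvectors of R
Y = Z @ V

m_y = Y.mean(axis=0)                   # V' m_z, which is 0 because m_z = 0
D_y = np.cov(Y, rowvar=False, ddof=0)  # dispersion matrix of y

print(np.allclose(m_y, 0))                 # True
print(np.allclose(D_y, V.T @ R @ V))       # True
print(np.allclose(D_y, np.diag(eigvals)))  # True: the components are uncorrelated
```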
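| R is correlation matrix | Now, it can be shown that the dispersion matrix $D_z$ of a standardized variable is a correlation matrix: because each column of $Z$ has variance 1, each covariance equals the corresponding correlation. Thus $R$ is the correlation matrix for $z$. | ||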
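Here is a quick numerical check of this claim, assuming NumPy and synthetic data on deliberately different scales (the scale factors and offsets are arbitrary). The covariance matrix of the standardized data equals the correlation matrix of the raw data, and its diagonal consists of ones.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical raw data: three variables on very different scales.
X = rng.normal(size=(60, 3)) * np.array([10.0, 0.5, 200.0]) + np.array([170.0, 2.0, 1000.0])
X[:, 1] += 0.02 * X[:, 0]  # induce some correlation between columns 0 and 1

Z = (X - X.mean(axis=0)) / X.std(axis=0)

D_z = np.cov(Z, rowvar=False, ddof=0)  # dispersion (covariance) matrix of z
R = np.corrcoef(X, rowvar=False)       # correlation matrix of the raw data

print(np.allclose(D_z, R))  # True: standardizing turns covariances into correlations
print(np.diag(D_z))         # unit variances on the diagonal
```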
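| Number of parameters to estimate increases rapidly as p increases | At this juncture you may be tempted to say: "So what?" To answer this, let us look at the intercorrelations among the elements of a vector variable. The number of parameters to be estimated for a $p$-element variable is $p$ means, $p$ variances and $(p^2 - p)/2$ covariances, for a total of $$2p + \frac{p^2 - p}{2}$$ parameters. For example, $p = 2$ gives 5 parameters, $p = 10$ gives 65, and $p = 30$ gives 495. | ||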
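The parameter count is simple arithmetic. The hypothetical helper `n_parameters` below just adds the three terms: means, variances and distinct covariances.

```python
def n_parameters(p: int) -> int:
    """Hypothetical helper: means + variances + distinct covariances for a p-element variable."""
    means = p
    variances = p
    covariances = p * (p - 1) // 2  # (p^2 - p) / 2 distinct off-diagonal pairs
    return means + variances + covariances

for p in (2, 10, 30):
    print(p, n_parameters(p))
# prints:
# 2 5
# 10 65
# 30 495
```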
| Uncorrelated variables require no covariance estimation | All these parameters must be estimated and interpreted. That is a Herculean task, to say the least. Now, if we could transform the data so that we obtain a vector of uncorrelated variables, life would become much more bearable, since there would be no covariances to estimate. | ||