|
6.
Process or Product Monitoring and Control
6.5. Tutorials 6.5.5. Principal Components
|
||||||||||||||||||||||||||||||||||||
| Orthogonalizing Transformations | ||||||||||||||||||||||||||||||||||||
| Transformation from z to y | The equation y = V'z represents a transformation, where y is the transformed variable, z is the original standardized variable and V is the premultiplier to go from z to y. | |||||||||||||||||||||||||||||||||||
| Orthogonal transformations simplify things | Producing a transformed vector y whose elements are uncorrelated is the same as requiring a V such that Dy, the variance-covariance matrix of y, is a diagonal matrix. That is, all the off-diagonal elements of Dy must be zero. This is called an orthogonalizing transformation. | |||||||||||||||||||||||||||||||||||
| Infinite number of values for V | There are an infinite number of values for V that will produce a diagonal Dy for any correlation matrix R. Thus the mathematical problem "find a unique V such that Dy is diagonal" cannot be solved as it stands. A number of famous statisticians such as Karl Pearson and Harold Hotelling pondered this problem and suggested a "variance maximizing" solution. | |||||||||||||||||||||||||||||||||||
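As a minimal sketch of why V is not unique, the NumPy snippet below builds two different matrices V for the same correlation matrix and checks that both give a diagonal V'RV. The matrix `R` is made up for illustration, and the Cholesky-based choice is only one convenient alternative, not a construction taken from the text.

```python
import numpy as np

# Hypothetical 3x3 correlation matrix, chosen only for illustration.
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

def off_diagonal_max(M):
    """Largest absolute off-diagonal element of a square matrix."""
    return np.max(np.abs(M - np.diag(np.diag(M))))

# Choice 1: eigenvectors of R as the columns of V.
_, V_eig = np.linalg.eigh(R)
D_eig = V_eig.T @ R @ V_eig

# Choice 2: V = (C')^{-1}, where R = C C' is the Cholesky factorization.
C = np.linalg.cholesky(R)
V_chol = np.linalg.inv(C).T
D_chol = V_chol.T @ R @ V_chol

# Both off-diagonal maxima should be zero up to rounding error.
print(off_diagonal_max(D_eig), off_diagonal_max(D_chol))
```

Both choices orthogonalize z, yet the two V matrices differ, which is why an extra criterion such as variance maximization is needed to single one out.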
| Principal components maximize variance of the transformed elements, one by one | Hotelling (1933) derived the "principal components" solution. It proceeds as follows: for the first principal component, which will be the first element of y and be defined by the coefficients in the first column of V, (denoted by v1), we want a solution such that the variance of y1 will be maximized. | |||||||||||||||||||||||||||||||||||
| Constrain v to generate a unique solution |
The constraint on the numbers in v1 is that the
sum of the squares of the coefficients equals 1. Expressed
mathematically, we wish to maximize
$$ \frac{1}{N} \sum_{i=1}^{N} y_{1i}^2 $$
where
$$ y_{1i} = v_1' z_i $$
and v1'v1 = 1 (this is called "normalizing" v1). |
|||||||||||||||||||||||||||||||||||
| Computation of first principal component from R and v1 |
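Substituting the middle equation in the first yields
$$ \frac{1}{N} \sum_{i=1}^{N} y_{1i}^2 = v_1' R v_1 $$
where R is the correlation matrix of Z, the standardized version of the original data matrix X. We therefore want to maximize v1'Rv1 subject to v1'v1 = 1.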
|
|||||||||||||||||||||||||||||||||||
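The identity between the variance of y1 and the quadratic form v1'Rv1 can be checked numerically. The sketch below uses simulated data and an arbitrary direction `v`; both are assumptions made for illustration, and the columns are standardized with divisor N so that Z'Z/N is exactly the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated data: 200 observations on 3 correlated variables (illustrative only).
mix = np.array([[1.0, 0.5, 0.2],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 1.0]])
X = rng.normal(size=(200, 3)) @ mix

# Standardize each column to zero mean and unit variance (divisor N).
Z = (X - X.mean(axis=0)) / X.std(axis=0)
N = Z.shape[0]
R = Z.T @ Z / N                 # correlation matrix of the data

v = np.array([1.0, 2.0, -1.0])  # arbitrary direction
v1 = v / np.linalg.norm(v)      # normalize so that v1'v1 = 1

y1 = Z @ v1                     # y_1i = v1' z_i for every observation i
var_direct = np.mean(y1**2)     # (1/N) * sum of y_1i^2
var_quadratic = v1 @ R @ v1     # v1' R v1

print(var_direct, var_quadratic)
```

The two printed values agree up to rounding, whatever unit vector is chosen for `v1`.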
| The eigenstructure | ||||||||||||||||||||||||||||||||||||
| Lagrange multiplier approach |
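Let
$$ \phi_1 = v_1' R v_1 - \lambda_1 (v_1' v_1 - 1) $$
which introduces the constraint on v1 through the Lagrange multiplier λ1. Setting the vector of partial derivatives
$$ \frac{\partial \phi_1}{\partial v_1} = 2 R v_1 - 2 \lambda_1 v_1 $$
equal to zero, dividing out the 2 and factoring gives
$$ (R - \lambda_1 I) v_1 = 0 $$
This is known as "the problem of the eigenstructure of R".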
|
|||||||||||||||||||||||||||||||||||
| Set of p homogeneous equations |
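The partial differentiation resulted in a set of p homogeneous
equations, which may be written in matrix form as follows
$$ \begin{bmatrix} 1-\lambda_1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1-\lambda_1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1-\lambda_1 \end{bmatrix} \begin{bmatrix} v_{11} \\ v_{21} \\ \vdots \\ v_{p1} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} $$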
|
|||||||||||||||||||||||||||||||||||
| The characteristic equation | ||||||||||||||||||||||||||||||||||||
| Characteristic equation of R is a polynomial of degree p |
The characteristic equation of R is a polynomial of degree
p, which is obtained by expanding the determinant
$$ |R - \lambda I| = 0 $$
and solving for the roots λj, j = 1, 2, ..., p.
|
|||||||||||||||||||||||||||||||||||
| Largest eigenvalue |
Specifically, the largest eigenvalue, λ1,
and its associated vector, v1, are required.
Solving for this eigenvalue and vector is another mammoth numerical
task that can realistically only be performed by a computer. In
practice it is done with numerical linear algebra software, and the
underlying algorithms are complex.
|
|||||||||||||||||||||||||||||||||||
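To give a feel for what such software does, here is a minimal sketch of power iteration, one simple way to approximate the largest eigenvalue and its vector. It is not the algorithm used by any particular package; the function name `power_iteration`, the tolerance, and the example matrix are all assumptions made for illustration.

```python
import numpy as np

def power_iteration(R, tol=1e-12, max_iter=10_000):
    """Approximate the largest eigenvalue of a symmetric matrix R and its unit eigenvector."""
    p = R.shape[0]
    v = np.ones(p) / np.sqrt(p)          # starting vector with v'v = 1
    lam = v @ R @ v
    for _ in range(max_iter):
        w = R @ v
        v_new = w / np.linalg.norm(w)    # renormalize so that v'v = 1
        lam_new = v_new @ R @ v_new      # Rayleigh quotient v'Rv
        if abs(lam_new - lam) < tol:
            return lam_new, v_new
        v, lam = v_new, lam_new
    return lam, v

# Hypothetical correlation matrix, chosen only for illustration.
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
lam1, v1 = power_iteration(R)
print(lam1, v1)
```

Each pass multiplies the current vector by R and renormalizes it, so the component along the dominant eigenvector grows relative to the others. Production routines use more elaborate and more robust methods.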
| Remaining p eigenvalues | After obtaining the first eigenvalue, the process is repeated until all p eigenvalues are computed. | |||||||||||||||||||||||||||||||||||
| Full eigenstructure of R |
To succinctly define the full eigenstructure of R, we introduce
another matrix L, which is a diagonal matrix with λj
in the jth position on the diagonal.
Then the full eigenstructure of R is given as
$$ RV = VL $$
where
$$ V'V = VV' = I $$
and
$$ V'RV = L = D_y $$
|
|||||||||||||||||||||||||||||||||||
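These relations are easy to verify numerically. The sketch below assumes NumPy's `numpy.linalg.eigh` routine for symmetric matrices and a made-up correlation matrix; `eigh` returns eigenvalues in ascending order, so they are reversed to put the largest first.

```python
import numpy as np

# Hypothetical correlation matrix, chosen only for illustration.
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)   # ascending eigenvalues, orthonormal eigenvectors
lam = eigvals[::-1]                    # reorder so that lambda_1 is the largest
V = eigvecs[:, ::-1]                   # columns reordered to match
L = np.diag(lam)                       # diagonal matrix of eigenvalues

print(np.allclose(R @ V, V @ L))         # RV = VL
print(np.allclose(V.T @ V, np.eye(3)))   # V'V = I
print(np.allclose(V.T @ R @ V, L))       # V'RV = L
```

Because R is a correlation matrix, the eigenvalues sum to p, the number of variables.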
| Principal Factors | ||||||||||||||||||||||||||||||||||||
| Scale to zero means and unit variances | It was mentioned before that it is helpful to scale any transformation y of a vector variable z so that its elements have zero means and unit variances. Such a standardized transformation is called a factoring of z, or of R, and each linear component of the transformation is called a factor. | |||||||||||||||||||||||||||||||||||
| Deriving unit variances for principal components |
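Now, the principal components already have zero means, but their
variances are not 1; in fact, they are the eigenvalues,
comprising the diagonal elements of L. It is possible to derive
the principal factor with unit variance from the principal component as
follows
$$ f_i = \frac{y_i}{\sqrt{\lambda_i}} $$
or, for all factors,
$$ f = L^{-1/2} y $$
Substituting V'z for y, we have
$$ f = L^{-1/2} V' z = B' z $$
where
$$ B = V L^{-1/2} $$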
|
|||||||||||||||||||||||||||||||||||
| B matrix | The matrix B is then the matrix of factor score coefficients for principal factors. | |||||||||||||||||||||||||||||||||||
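The rescaling from principal components to principal factors can be sketched as follows. The data are simulated and the mixing matrix is an assumption made for illustration; observations are stored as rows, so the factor scores for all observations are computed as Z times B.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated data: 500 observations on 4 correlated variables (illustrative only).
mix = np.array([[1.0, 0.5, 0.2, 0.0],
                [0.0, 1.0, 0.4, 0.1],
                [0.0, 0.0, 1.0, 0.3],
                [0.0, 0.0, 0.0, 1.0]])
X = rng.normal(size=(500, 4)) @ mix
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized data, divisor N
N = Z.shape[0]
R = Z.T @ Z / N

lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]             # largest eigenvalue first

Y = Z @ V                  # principal components, one row per observation
B = V / np.sqrt(lam)       # B = V L^{-1/2}: column j divided by sqrt(lambda_j)
F = Z @ B                  # principal factor scores

var_Y = Y.var(axis=0)      # should equal the eigenvalues
cov_F = F.T @ F / N        # should be the identity matrix
print(np.round(var_Y, 4), np.round(np.diag(cov_F), 4))
```

The component variances reproduce the eigenvalues, while the factor scores have unit variances and remain uncorrelated.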
| How many Eigenvalues? | ||||||||||||||||||||||||||||||||||||
| Dimensionality of the set of factor scores | The number of eigenvalues, N, used in the final set determines the dimensionality of the set of factor scores. For example, if the original test consisted of 8 measurements on 100 subjects, and we extract 2 eigenvalues, the set of factor scores is a matrix of 100 rows by 2 columns. | |||||||||||||||||||||||||||||||||||
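| Eigenvalues greater than unity | Each column or principal factor should represent a number of original variables. Because each standardized variable contributes a variance of 1, a component whose eigenvalue is below 1 accounts for less variance than a single original variable. Kaiser (1966) therefore suggested a rule of thumb that takes N to be the number of eigenvalues larger than unity. | |||||||||||||||||||||||||||||||||||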
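A short sketch of the eigenvalue-greater-than-unity rule, using the dimensions from the example above (8 measurements on 100 subjects). The data are simulated from two hypothetical latent factors plus noise, which is an assumption made purely so that the example has some structure to find.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects = 100

# Two hypothetical latent factors, each driving four of the eight measurements.
f = rng.normal(size=(n_subjects, 2))
W = np.zeros((2, 8))
W[0, :4] = 1.0
W[1, 4:] = 1.0
X = f @ W + 0.3 * rng.normal(size=(n_subjects, 8))

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized data
R = Z.T @ Z / n_subjects                   # 8 x 8 correlation matrix
lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]             # largest eigenvalue first

n_factors = int(np.sum(lam > 1.0))         # count eigenvalues larger than unity
B = V[:, :n_factors] / np.sqrt(lam[:n_factors])
scores = Z @ B                             # factor scores, one row per subject

print(n_factors, scores.shape)
```

The score matrix has one row per subject and one column per retained eigenvalue.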
| Factor Structure | ||||||||||||||||||||||||||||||||||||
| Factor structure matrix S |
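The primary interpretative device in principal components is the factor
structure, computed as
$$ S = V L^{1/2} $$
S is a matrix whose elements are the correlations between the principal
components and the variables. If we retain, for example, two eigenvalues,
meaning that there are two principal components, then the S matrix
consists of two columns and p rows.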
|
|||||||||||||||||||||||||||||||||||
| Table showing relation between variables and principal components |
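$$ \begin{array}{c|cc} \text{Variable} & \text{Principal component 1} & \text{Principal component 2} \\ \hline 1 & r_{11} & r_{12} \\ 2 & r_{21} & r_{22} \\ 3 & r_{31} & r_{32} \\ 4 & r_{41} & r_{42} \end{array} $$
The rij are the correlation coefficients between variable i and principal component j, where i ranges from 1 to 4 and j from 1 to 2. |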
|||||||||||||||||||||||||||||||||||
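The claim that the entries of S are correlations between variables and principal components can be checked directly. The sketch below uses simulated data for four variables and retains two components; the data and the mixing matrix are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated data: 300 observations on 4 correlated variables (illustrative only).
mix = np.array([[1.0, 0.8, 0.1, 0.0],
                [0.0, 0.6, 0.2, 0.1],
                [0.0, 0.0, 1.0, 0.9],
                [0.0, 0.0, 0.0, 0.5]])
X = rng.normal(size=(300, 4)) @ mix
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = Z.T @ Z / Z.shape[0]

lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]       # largest eigenvalue first

k = 2                                # number of retained components
S = V[:, :k] * np.sqrt(lam[:k])      # factor structure: 4 rows by 2 columns

Y = Z @ V[:, :k]                     # scores on the two retained components
# Correlations between each variable (rows) and each component (columns).
corr = np.corrcoef(Z.T, Y.T)[:4, 4:]

print(np.round(S, 3))
print(np.round(corr, 3))
```

The two printed matrices coincide, so each element of S can be read as the correlation rij between variable i and component j.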
| The communality | SS' is the source of the "explained" correlations among the variables. Its diagonal is called "the communality". | |||||||||||||||||||||||||||||||||||
| Rotation | ||||||||||||||||||||||||||||||||||||
| Factor analysis | If this correlation matrix, i.e., the factor structure matrix, does not help much in the interpretation, it is possible to rotate the axes of the principal components. This may result in the polarization of the correlation coefficients. Some practitioners refer to rotation after generating the factor structure as factor analysis. | |||||||||||||||||||||||||||||||||||
| Varimax rotation |
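A popular scheme for rotation was suggested by Henry Kaiser in 1958.
He produced a method for orthogonal rotation of factors, called the
varimax rotation, which cleans up the factors as follows:
for each factor, high loadings (correlations) will result for a few
variables; the rest will be near zero.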
|
|||||||||||||||||||||||||||||||||||
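The following is a minimal sketch of a varimax rotation built from pairwise planar rotations, in which each pair of factors is rotated by the angle that maximizes the raw varimax criterion for that pair. It omits the row normalization that many packages apply first, and the function names, the stopping tolerance, and the example correlation matrix are assumptions made for illustration, not the routine used to produce any output in this section.

```python
import numpy as np

def varimax_criterion(A):
    """Sum over factors of the variance of the squared loadings."""
    return np.sum(np.var(A**2, axis=0))

def varimax(S, max_sweeps=50, tol=1e-10):
    """Rotate loadings S (p x k) orthogonally; returns rotated loadings and rotation T."""
    A = np.array(S, dtype=float)
    p, k = A.shape
    T = np.eye(k)
    for _ in range(max_sweeps):
        max_angle = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = A[:, i], A[:, j]
                u = x**2 - y**2
                v = 2.0 * x * y
                a, b = u.sum(), v.sum()
                c = np.sum(u**2 - v**2)
                d = 2.0 * np.sum(u * v)
                # Angle maximizing the criterion for this pair of factors.
                phi = 0.25 * np.arctan2(d - 2.0 * a * b / p,
                                        c - (a**2 - b**2) / p)
                rot = np.array([[np.cos(phi), -np.sin(phi)],
                                [np.sin(phi),  np.cos(phi)]])
                A[:, [i, j]] = A[:, [i, j]] @ rot
                T[:, [i, j]] = T[:, [i, j]] @ rot
                max_angle = max(max_angle, abs(phi))
        if max_angle < tol:   # no pair needed further rotation
            break
    return A, T

# Hypothetical correlation matrix with two groups of related variables.
R = np.array([[1.0, 0.8, 0.1, 0.1],
              [0.8, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.7],
              [0.1, 0.1, 0.7, 1.0]])
lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]
S = V[:, :2] * np.sqrt(lam[:2])      # unrotated factor structure, two factors

S_rot, T = varimax(S)
print(np.round(S, 3))
print(np.round(S_rot, 3))
```

Before rotation every variable correlates appreciably with both factors; after rotation each variable loads mainly on one factor. The rotation is orthogonal, so the row sums of squared loadings are unchanged.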
| Example |
The following computer output from a principal component analysis on
a 4-variable data set, followed by varimax rotation of the factor
structure, will illustrate his point.
|
|||||||||||||||||||||||||||||||||||
| Communality | ||||||||||||||||||||||||||||||||||||
| Formula for communality statistic |
A measure of how well the selected factors
(principal components) "explain" the variance of each of the variables
is given by a statistic called communality. This is defined by
$$ c_k^2 = \sum_{i=1}^{n} r_{ki}^2 $$
|
|||||||||||||||||||||||||||||||||||
| Explanation of communality statistic | That is: the square of the correlation of variable k with factor i gives the part of the variance accounted for by that factor. The sum of these squares for n factors is the communality, or explained variance for that variable (row). | |||||||||||||||||||||||||||||||||||
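As a small worked sketch, the communalities are the row sums of squares of the factor structure matrix. The correlation matrix below is made up for illustration, and two factors are retained.

```python
import numpy as np

# Hypothetical correlation matrix with two groups of related variables.
R = np.array([[1.0, 0.8, 0.1, 0.1],
              [0.8, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.7],
              [0.1, 0.1, 0.7, 1.0]])
lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]       # largest eigenvalue first

n = 2                                # number of retained factors
S = V[:, :n] * np.sqrt(lam[:n])      # correlations r_ki, variables in rows

communality = np.sum(S**2, axis=1)   # sum over factors of r_ki squared
print(np.round(communality, 3))
```

The same values appear on the diagonal of SS'. With all p factors retained, every communality would equal 1.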
| Roadmap to solve the V matrix | ||||||||||||||||||||||||||||||||||||
| Main steps to obtaining eigenstructure for a correlation matrix |
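In summary, here are the main steps to obtain the eigenstructure for
a correlation matrix.
1. Compute R, the correlation matrix of the original data. R is also the correlation matrix of the standardized data.
2. Obtain the characteristic equation of R, a polynomial of degree p (the number of variables), by expanding the determinant |R - λI| = 0, and solve for the roots λ1, λ2, ..., λp. These roots are the eigenvalues.
3. Solve for the columns of the V matrix, (v1, v2, ..., vp). The columns of V are the eigenvectors.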
|
|||||||||||||||||||||||||||||||||||
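The roadmap can be followed literally for a small problem, as in the sketch below. It assumes simulated data, and it uses `numpy.poly` and `numpy.roots` for the characteristic polynomial and a singular value decomposition to solve each homogeneous system. This mirrors the steps for illustration only; in practice a dedicated symmetric eigenvalue routine is preferable.

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulated data: 200 observations on 3 correlated variables (illustrative only).
mix = np.array([[1.0, 0.5, 0.2],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 1.0]])
X = rng.normal(size=(200, 3)) @ mix

# Step 1: correlation matrix of the original data.
R = np.corrcoef(X, rowvar=False)
p = R.shape[0]

# Step 2: coefficients of the characteristic polynomial and its roots.
coeffs = np.poly(R)
lam = np.sort(np.real(np.roots(coeffs)))[::-1]   # eigenvalues, largest first

# Step 3: for each root, solve (R - lambda_j I) v_j = 0 for a unit vector v_j.
V = np.empty((p, p))
for j, lam_j in enumerate(lam):
    _, _, vt = np.linalg.svd(R - lam_j * np.eye(p))
    V[:, j] = vt[-1]     # right singular vector for the smallest singular value

print(np.round(lam, 4))
print(np.allclose(R @ V, V * lam))   # RV = VL
```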