6.5.5. Principal Components


Orthogonalizing Transformations  
Transformation from \({\bf z}\) to \({\bf y}\)  The equation \({\bf y} = {\bf V}'{\bf z}\) represents a transformation, where \({\bf y}\) is the transformed variable, \({\bf z}\) is the original standardized variable, and \({\bf V}\) is the premultiplier to go from \({\bf z}\) to \({\bf y}\).  
Orthogonal transformations simplify things  To produce a transformed vector \({\bf y}\) whose elements are uncorrelated is the same as saying that we want \({\bf V}\) such that \({\bf D}_{\bf y}\), the variance-covariance matrix of \({\bf y}\), is a diagonal matrix. That is, all the off-diagonal elements of \({\bf D}_{\bf y}\) must be zero. This is called an orthogonalizing transformation.  
Infinite number of values for \({\bf V}\)  There are infinitely many values of \({\bf V}\) that will produce a diagonal \({\bf D}_{\bf y}\) for any correlation matrix \({\bf R}\). Thus the mathematical problem "find a unique \({\bf V}\) such that \({\bf D}_{\bf y}\) is diagonal" cannot be solved as it stands. A number of famous statisticians, such as Karl Pearson and Harold Hotelling, pondered this problem and suggested a "variance maximizing" solution.  
Principal components maximize variance of the transformed elements, one by one  Hotelling (1933) derived the "principal components" solution. It proceeds as follows: for the first principal component, which will be the first element of \({\bf y}\) and is defined by the coefficients in the first column of \({\bf V}\) (denoted by \({\bf v}_1\)), we want a solution such that the variance of \(y_1\) is maximized.  
Constrain \({\bf v}\) to generate a unique solution  The constraint on the numbers in \({\bf v}_1\) is that the sum of the squares of the coefficients equals 1. Expressed mathematically, we wish to maximize $$ \frac{1}{N} \sum_{i=1}^N y_{1i}^2 \, , $$ where $$ y_{1i} = {\bf v}_1' {\bf z}_i \, , $$ and \({\bf v}_1'{\bf v}_1 = 1\) (this is called "normalizing" \({\bf v}_1\)).  
Computation of first principal component from \({\bf R}\) and \({\bf v}_1\)  Substituting the middle equation in the first yields $$ \frac{1}{N} \sum_{i=1}^N y_{1i}^2 = {\bf v}_1' {\bf R} {\bf v}_1 \, , $$ where \({\bf R}\) is the correlation matrix of \({\bf Z}\), which, in turn, is the standardized matrix of \({\bf X}\), the original data matrix. Therefore, we want to maximize \({\bf v}_1' {\bf R} {\bf v}_1\) subject to \({\bf v}_1'{\bf v}_1 = 1\).  
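To see this identity numerically, here is a minimal NumPy sketch on hypothetical data: for any normalized vector \({\bf v}\), the mean of the squared transformed scores equals \({\bf v}'{\bf R}{\bf v}\).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))             # hypothetical data: 100 observations, 4 variables
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column (zero mean, unit variance)
R = (Z.T @ Z) / len(Z)                    # correlation matrix of Z

v = np.full(4, 0.5)                       # a normalized vector: v'v = 1
y = Z @ v                                 # y_i = v'z_i for every observation z_i
print(np.mean(y**2))                      # (1/N) * sum of y_i^2 ...
print(v @ R @ v)                          # ... equals v'Rv
```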
The eigenstructure  
Lagrange multiplier approach  Let $$ \phi_1 = {\bf v}_1' {\bf R} {\bf v}_1 - \lambda_1({\bf v}_1'{\bf v}_1 - 1) $$ introduce the restriction on \({\bf v}_1\) via the Lagrange multiplier approach. It can be shown (T.W. Anderson, 1958, page 347, theorem 8) that the vector of partial derivatives is $$ \frac{\partial \phi_1}{\partial {\bf v}_1} = 2 {\bf R} {\bf v}_1 - 2 \lambda_1 {\bf v}_1 \, , $$ and setting this equal to zero, dividing out 2, and factoring, gives $$ ({\bf R} - \lambda_1 {\bf I}) {\bf v}_1 = {\bf 0} \, . $$ This is known as "the problem of the eigenstructure of \({\bf R}\)".  
Set of \(p\) homogeneous equations  The partial differentiation resulted in a set of \(p\) homogeneous equations, which may be written in matrix form as follows. $$ \left[ \begin{array}{cccc} (1-\lambda_i) & r_{12} & \cdots & r_{1p} \\ r_{21} & (1-\lambda_i) & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & (1-\lambda_i) \end{array} \right] \left[ \begin{array}{c} v_{1i} \\ v_{2i} \\ \vdots \\ v_{pi} \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \\ \vdots \\ 0 \end{array} \right] $$  
The characteristic equation  
Characteristic equation of \({\bf R}\) is a polynomial of degree \(p\)  The characteristic equation of \({\bf R}\) is a polynomial of degree \(p\), which is obtained by expanding the determinant of $$ |{\bf R} - \lambda {\bf I}| = \left| \begin{array}{cccc} r_{11}-\lambda & r_{12} & \cdots & r_{1p} \\ r_{21} & r_{22}-\lambda & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & r_{pp}-\lambda \end{array} \right| = 0 \, , $$ and solving for the roots \(\lambda_j, \, j = 1, \, 2, \, \ldots, \, p\).  
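As an illustration (a sketch, not the handbook's computation), the characteristic polynomial can be expanded and solved numerically; its roots agree with a direct eigenvalue routine. The 3×3 correlation matrix here is a hypothetical example.

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])               # hypothetical correlation matrix, p = 3

coeffs = np.poly(R)                           # coefficients of det(lambda*I - R), degree p
roots = np.roots(coeffs)                      # the p roots, i.e., the eigenvalues
print(np.sort(roots.real)[::-1])
print(np.sort(np.linalg.eigvalsh(R))[::-1])   # same values from a direct eigenvalue routine
```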
Largest eigenvalue  Specifically, the largest eigenvalue, \(\lambda_1\), and its associated vector, \({\bf v}_1\), are required. Solving for this eigenvalue and vector is another mammoth numerical task that can realistically only be performed by a computer. In general, software is involved and the algorithms are complex.  
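In practice one calls a library routine such as np.linalg.eigh, but the idea behind finding \(\lambda_1\) and \({\bf v}_1\) can be sketched with power iteration (a standard textbook approach, assumed here rather than taken from the handbook):

```python
import numpy as np

def largest_eigenpair(R, tol=1e-12, max_iter=1000):
    """Power iteration: the largest eigenvalue of R and its
    eigenvector v1, normalized so that v1'v1 = 1."""
    v = np.ones(R.shape[0]) / np.sqrt(R.shape[0])  # normalized starting vector
    lam = 0.0
    for _ in range(max_iter):
        w = R @ v
        v = w / np.linalg.norm(w)   # renormalize: v'v = 1
        lam_new = v @ R @ v         # Rayleigh quotient v'Rv
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam, v
```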
Remaining \(p\) eigenvalues  After obtaining the first eigenvalue, the process is repeated until all \(p\) eigenvalues are computed.  
Full eigenstructure of \({\bf R}\)  To succinctly define the full eigenstructure of \({\bf R}\), we introduce another matrix \({\bf L}\), which is a diagonal matrix with \(\lambda_j\) in the \(j\)th position on the diagonal. Then the full eigenstructure of \({\bf R}\) is given as $$ {\bf RV} = {\bf VL} \, , $$ where $$ {\bf V}'{\bf V} = {\bf VV}' = {\bf I} \, , $$ and $$ {\bf V}'{\bf RV} = {\bf L} = {\bf D_y} \, . $$

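This structure is easy to verify numerically; here is a minimal sketch with NumPy, reusing the hypothetical 3×3 matrix from above:

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])        # hypothetical correlation matrix

lam, V = np.linalg.eigh(R)             # eigenvalues (ascending) and eigenvectors (columns)
lam, V = lam[::-1], V[:, ::-1]         # reorder so lambda_1 is the largest
L = np.diag(lam)                       # L: diagonal matrix of eigenvalues

print(np.allclose(R @ V, V @ L))       # RV = VL
print(np.allclose(V.T @ V, np.eye(3))) # V'V = I (V is orthogonal)
print(np.allclose(V.T @ R @ V, L))     # V'RV = L = D_y
```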

Principal Factors  
Scale to zero means and unit variances  It was mentioned before that it is helpful to scale any transformation \({\bf y}\) of a vector variable \({\bf z}\) so that its elements have zero means and unit variances. Such a standardized transformation is called a factoring of \({\bf z}\), or of \({\bf R}\), and each linear component of the transformation is called a factor.  
Deriving unit variances for principal components  Now, the principal components already have zero means, but their variances are not 1; in fact, they are the eigenvalues, comprising the diagonal elements of \({\bf L}\). It is possible to derive the principal factor with unit variance from the principal component as follows: $$ f_i = \frac{y_i}{\sqrt{\lambda_i}} \, , $$ or for all factors, $$ {\bf f} = {\bf L}^{-1/2}{\bf y} \, . $$ Substituting \({\bf V}'{\bf z}\) for \({\bf y}\) we have $$ {\bf f} = {\bf L}^{-1/2} {\bf V}' {\bf z} = {\bf B}'{\bf z} \, , $$ where $$ {\bf B} = {\bf VL}^{-1/2} \, . $$  
\({\bf B}\) matrix  The matrix \({\bf B}\) is then the matrix of factor score coefficients for principal factors.  
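A minimal sketch of this factoring on hypothetical data; note that multiplying by \({\bf L}^{-1/2}\) just divides column \(j\) of \({\bf V}\) by \(\sqrt{\lambda_j}\):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))              # hypothetical data matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized data
R = (Z.T @ Z) / len(Z)                     # correlation matrix

lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]             # eigenvalues in descending order

B = V / np.sqrt(lam)                       # B = V L^{-1/2}: factor score coefficients
F = Z @ B                                  # factor scores f = B'z, one row per observation
print(F.var(axis=0))                       # each principal factor has unit variance
```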
How many Eigenvalues?  
Dimensionality of the set of factor scores  The number of eigenvalues, \(N\), used in the final set determines the dimensionality of the set of factor scores. For example, if the original test consisted of 8 measurements on 100 subjects, and we extract 2 eigenvalues, the set of factor scores is a matrix of 100 rows by 2 columns.  
Eigenvalues greater than unity  Each column or principal factor should represent a number of original variables. Kaiser (1966) suggested a rule-of-thumb that takes as the value of \(N\) the number of eigenvalues larger than unity.  
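The rule is a one-liner; a sketch with hypothetical eigenvalues:

```python
import numpy as np

lam = np.array([2.8, 1.3, 0.6, 0.3])   # hypothetical eigenvalues of a 4-variable R
N = int(np.sum(lam > 1.0))             # Kaiser's rule: count eigenvalues larger than unity
print(N)                               # 2 factors would be retained here
```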
Factor Structure  
Factor structure matrix \({\bf S}\)  The primary interpretative device in principal components is the factor structure, computed as $$ {\bf S} = {\bf VL}^{1/2} \, . $$ \({\bf S}\) is a matrix whose elements are the correlations between the principal components and the variables. If we retain, for example, two eigenvalues, meaning that there are two principal components, then the \({\bf S}\) matrix consists of two columns and \(p\) (number of variables) rows.  
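A sketch on hypothetical data, retaining two components, that computes \({\bf S}\) and confirms its elements are exactly the variable-component correlations:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # hypothetical correlated data
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = (Z.T @ Z) / len(Z)

lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]               # descending eigenvalues

n = 2                                        # number of retained components
S = V[:, :n] * np.sqrt(lam[:n])              # S = V L^{1/2}: p rows, n columns

Y = Z @ V[:, :n]                             # principal component scores
corr = (Z.T @ Y) / len(Z) / Y.std(axis=0)    # correlation of each variable with each component
print(np.allclose(S, corr))                  # True: S holds exactly these correlations
```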
Table showing relation between variables and principal components  The \(r_{ij}\) are the correlation coefficients between variable \(i\) and principal component \(j\), where \(i\) ranges from 1 to 4 and \(j\) from 1 to 2. $$ \begin{array}{c|cc} & \mbox{Principal Component 1} & \mbox{Principal Component 2} \\ \hline \mbox{Variable 1} & r_{11} & r_{12} \\ \mbox{Variable 2} & r_{21} & r_{22} \\ \mbox{Variable 3} & r_{31} & r_{32} \\ \mbox{Variable 4} & r_{41} & r_{42} \end{array} $$

The communality  \({\bf SS}'\) is the source of the "explained" correlations among the variables. Its diagonal is called "the communality".  
Rotation  
Factor analysis  If this correlation matrix, i.e., the factor structure matrix, does not help much in the interpretation, it is possible to rotate the axes of the principal components. This may result in the polarization of the correlation coefficients. Some practitioners refer to rotation after generating the factor structure as factor analysis.  
Varimax rotation  A popular scheme for rotation was suggested by Henry Kaiser in 1958. He produced a method for orthogonal rotation of factors, called the varimax rotation, which "cleans up" the factors as follows: for each factor, high loadings (correlations) will result for a few variables; the rest will be near zero.

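Kaiser's criterion drives the squared loadings in each column toward either large or near-zero values. A minimal sketch of one common SVD-based implementation (a standard algorithm, assumed here; not code from the handbook):

```python
import numpy as np

def varimax(S, max_iter=100, tol=1e-8):
    """Orthogonal (varimax) rotation of a p x n factor structure matrix S.
    Returns the rotated loadings and the orthogonal rotation matrix T."""
    p, n = S.shape
    T = np.eye(n)                       # accumulated rotation
    obj_old = 0.0
    for _ in range(max_iter):
        L = S @ T                       # current rotated loadings
        # gradient of the varimax criterion
        G = S.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        T = u @ vt                      # project back onto orthogonal matrices
        obj = s.sum()                   # value of the criterion
        if obj < obj_old * (1.0 + tol):
            break
        obj_old = obj
    return S @ T, T
```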

Example  Computer output from a principal component analysis on a four-variable data set, followed by varimax rotation of the factor structure, illustrates the point.

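A sketch that produces comparable output on hypothetical four-variable data (two noisy copies each of two underlying traits; the data are illustrative assumptions, not the handbook's example), using the varimax() sketch above:

```python
import numpy as np

rng = np.random.default_rng(2)
t1, t2 = rng.normal(size=(2, 100))     # two hypothetical underlying traits
X = np.column_stack([t1 + 0.3 * rng.normal(size=100),
                     t1 + 0.3 * rng.normal(size=100),
                     t2 + 0.3 * rng.normal(size=100),
                     t2 + 0.3 * rng.normal(size=100)])
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = (Z.T @ Z) / len(Z)

lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]
S = V[:, :2] * np.sqrt(lam[:2])        # factor structure, two retained components

S_rot, T = varimax(S)                  # varimax() from the sketch above
print(np.round(S, 2))                  # before rotation: loadings spread across factors
print(np.round(S_rot, 2))              # after rotation: loadings polarized by factor
```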

Communality  
Formula for communality statistic  A measure of how well the selected factors (principal components) "explain" the variance of each of the variables is given by a statistic called communality. This is defined by $$ h_k^2 = \sum_{i=1}^n S_{ki}^2 \, . $$  
Explanation of communality statistic  That is: the square of the correlation of variable \(k\) with factor \(i\) gives the part of the variance accounted for by that factor. The sum of these squares over the \(n\) factors is the communality, or explained variance, for that variable (row).  
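Given \({\bf S}\) (\(p\) variables by \(n\) retained factors), the communalities are the row sums of squares, i.e., the diagonal of \({\bf SS}'\). A short sketch on a hypothetical structure matrix:

```python
import numpy as np

S = np.array([[0.91, 0.21],
              [0.88, 0.25],
              [0.19, 0.93],
              [0.23, 0.90]])   # hypothetical 4 x 2 factor structure matrix

h2 = (S**2).sum(axis=1)        # h_k^2 = sum over factors of S_ki^2, one per variable
print(h2)                      # equivalently: np.diag(S @ S.T)
```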
Roadmap to solve the V matrix  
Main steps to obtaining eigenstructure for a correlation matrix  In summary, here are the main steps to obtain the eigenstructure for a correlation matrix; a compact computational sketch follows the list.
1. Compute \({\bf R}\), the correlation matrix of the original data. (\({\bf R}\) is also the correlation matrix of the standardized data.)
2. Obtain the characteristic equation of \({\bf R}\), a polynomial of degree \(p\) (the number of variables), by expanding the determinant of \(|{\bf R} - \lambda {\bf I}| = 0\), and solve for the roots \(\lambda_j\), that is, \(\lambda_1, \lambda_2, \ldots, \lambda_p\), the eigenvalues of \({\bf R}\).
3. For each eigenvalue \(\lambda_j\), solve the homogeneous system \(({\bf R} - \lambda_j {\bf I}){\bf v}_j = {\bf 0}\) for the normalized vector \({\bf v}_j\); these eigenvectors form the columns of the \({\bf V}\) matrix.

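As a compact sketch, the whole roadmap reduces to a few NumPy calls on hypothetical data:

```python
import numpy as np

def eigenstructure(X):
    """Steps 1-3: correlation matrix, then its eigenvalues and eigenvectors."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize the data
    R = (Z.T @ Z) / len(Z)                     # step 1: correlation matrix
    lam, V = np.linalg.eigh(R)                 # steps 2-3: eigenvalues and eigenvectors
    order = np.argsort(lam)[::-1]              # sort eigenvalues in descending order
    return R, lam[order], V[:, order]

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))                  # hypothetical data matrix
R, lam, V = eigenstructure(X)
print(lam)                                     # lambda_1 >= lambda_2 >= ... >= lambda_p
```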