a branch of multivariate analysis embracing methods for estimating the dimensionality of a set of observed variables by studying the structure of their covariance or correlation matrices.
The basic assumption underlying factor analysis is that the correlations between a large number of observable variables are determined by the existence of a smaller number of hypothetical unobservable variables, or factors. A general model for factor analysis is provided in terms of the random variables X1, . . ., Xn, which are the observation results, by the following linear model:

Xi = ai1f1 + ai2f2 + . . . + aikfk + biUi + ∊i,  i = 1, . . ., n   (*)
Here, the random variables fj are common factors, the random variables Ui are factors specific to the variables Xi and are not correlated with the fj, and the ∊i are random errors. It is assumed that k < n, that the random variables ∊i are independent of each other and of the fj and Ui, and that E∊i = 0 and D∊i = σi². The constant coefficients aij are called loadings (weights): aij is the loading of the ith variable on the jth factor. The quantities aij, bi, and σi² are taken as unknown parameters that have to be estimated.
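As an illustration only, the linear model above can be simulated numerically; all dimensions, sample sizes, and distributional choices below are our own assumptions for the sketch, not part of the model's definition:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, N = 5, 2, 100_000            # observed variables, common factors, samples

A = rng.normal(size=(n, k))        # loadings a_ij (arbitrary illustrative values)
b = rng.uniform(0.5, 1.0, size=n)  # coefficients b_i of the specific factors
sigma = rng.uniform(0.1, 0.3, size=n)  # error standard deviations

f = rng.normal(size=(N, k))        # common factors f_j, uncorrelated, Df_j = 1
U = rng.normal(size=(N, n))        # specific factors U_i, independent of the f_j
eps = rng.normal(scale=sigma, size=(N, n))  # errors, E eps_i = 0

# Equation (*): X_i = sum_j a_ij f_j + b_i U_i + eps_i
X = f @ A.T + b * U + eps
print(X.shape)
```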
In the form given above, the model for factor analysis is characterized by some indeterminacy, since n variables are expressed in terms of n + k other variables. Equations (*), however, imply a hypothesis, regarding the covariance matrix, that can be tested. For example, if the factors fj are uncorrelated, Dfj = 1, bi = 0, and cij are the elements of the matrix of covariances between the Xi, then there follows from equations (*) an expression for the cij in terms of the loadings and the variances of the errors:

cij = ai1aj1 + ai2aj2 + . . . + aikajk + δijσi²,

where δij = 1 for i = j and δij = 0 for i ≠ j.
The general model for factor analysis is thus equivalent to a hypothesis regarding the covariance matrix C = {cij}: the covariance matrix can be represented as the sum of the matrix AA′ and the diagonal matrix V with elements σi², where A = {aij} is the n × k matrix of loadings:

C = AA′ + V
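To make the hypothesis concrete, the following sketch (the dimensions, sample size, and seed are arbitrary choices of ours) generates data from the model with bi = 0 and checks that the sample covariance matrix is close to AA′ plus the diagonal matrix of error variances:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, N = 4, 2, 300_000

A = rng.normal(size=(n, k))           # loadings matrix A = {a_ij}
sigma2 = rng.uniform(0.1, 0.5, n)     # error variances sigma_i^2

f = rng.normal(size=(N, k))
eps = rng.normal(scale=np.sqrt(sigma2), size=(N, n))
X = f @ A.T + eps                     # model (*) with b_i = 0

C_model = A @ A.T + np.diag(sigma2)   # hypothesized structure AA' + V
C_sample = np.cov(X, rowvar=False)

# Discrepancy is due to sampling error only and shrinks as N grows.
print(np.max(np.abs(C_sample - C_model)))
```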
The estimation procedure in factor analysis consists of two steps. First, the factor structure (that is, the number of factors required to account for the correlations between the Xi) is determined, and the loadings are estimated. Second, the factors are estimated on the basis of the observation results. The fundamental obstacle to the interpretation of the set of factors is that for k > 1 neither the loadings nor the factors can be determined uniquely, since the factors fj in equations (*) can be subjected to an arbitrary orthogonal transformation without changing the implied covariances. This property of the model is made use of to transform (rotate) the factors; the transformation is chosen so that each observed variable has the maximum possible loading on one factor and the minimum possible loadings on the remaining factors.
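This indeterminacy can be verified directly: for any orthogonal matrix Q, the loadings A and AQ reproduce exactly the same matrix AA′, since (AQ)(AQ)′ = AQQ′A′ = AA′. A minimal NumPy check (the dimensions and rotation angle are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 2

A = rng.normal(size=(n, k))      # one admissible set of loadings

theta = 0.7                      # any angle gives an orthogonal 2x2 matrix Q
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

A_rot = A @ Q                    # rotated loadings

# Both loading matrices imply the same covariance structure AA':
print(np.allclose(A @ A.T, A_rot @ A_rot.T))
```

Rotation methods such as varimax exploit exactly this freedom, searching among the matrices Q for loadings that are easiest to interpret.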
Various practical methods are known for estimating loadings. The methods assume that X1, . . ., Xn obey a multivariate normal distribution with covariance matrix C = {cij}. The maximum likelihood method is noteworthy: it leads to a unique set of estimates of the cij, but for the estimates of the aij it yields equations that are satisfied by an infinite set of solutions with equally good statistical properties.
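The maximum likelihood equations are usually solved by numerical iteration. As a simpler classical alternative, the iterated principal-factor (principal axis) method estimates the loadings from the correlation matrix alone; the sketch below (the function name, initialization, and iteration count are our own choices) illustrates it in NumPy:

```python
import numpy as np

def principal_factor(R, k, n_iter=200):
    """Iterated principal-factor (principal axis) estimate of the loadings.

    R : correlation matrix of the observed variables
    k : assumed number of common factors
    An illustrative sketch, not a production routine.
    """
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)               # "reduced" correlation matrix
        w, V = np.linalg.eigh(Rr)
        idx = np.argsort(w)[::-1][:k]          # k largest eigenvalues
        A = V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))
        h2 = np.sum(A**2, axis=1)              # updated communalities
    return A

# One-factor example: R is built from known loadings a.
a = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(a, a)
np.fill_diagonal(R, 1.0)
A_hat = principal_factor(R, k=1)
print(np.round(np.abs(A_hat.ravel()), 3))
```

In this constructed example the reduced correlation matrix has exact rank one, so the estimated loadings match the generating values up to sign; with real data a residual would remain.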
Factor analysis is regarded as dating from 1904. Although it was originally developed for problems in psychology, the range of its applications is much broader, and it is now used to solve various practical problems in such fields as medicine, economics, and chemistry. A rigorous theoretical grounding, however, has not yet been provided for many results and methods of factor analysis that are widely used in practice. A mathematically rigorous description of modern factor analysis is an extremely difficult task and has not yet been completed.
A. V. PROKHOROV