1 Overview

The common factor model represents the view that covariation among a set of observed variables reflects the influence of one or more common factors (i.e., shared latent causes), as well as unexplained variable-specific variance. In the common factor model, items are considered indicators of the latent variables that underlie their covariation (a form of a reflective latent variable model).

Thus, factor analysis partitions variation in the indicators into common variance and unique variance. Common variance reflects the shared influence of underlying factors on an indicator. Unique variances in factor models have the same interpretation as the familiar concept of a disturbance in SEM. That is, unique variance represents a) reliable variation in the item that reflects unknown latent causes, and b) random error due to unreliability or measurement error.

Each indicator has a communality (sometimes denoted \(h^2\)), which is the total variance in an indicator explained by latent factors. The remaining unexplained variation is called an indicator’s uniqueness (\(u^2\)). Factor analyses are conventionally conducted on standardized data (M = 0, SD = 1), meaning that the communality and uniqueness should sum to 1.0 for each indicator (i.e., its total variance).
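For intuition, here is a minimal numeric sketch (the loading is assumed purely for illustration): a standardized indicator with a loading of 0.8 on a single factor has a communality of 0.64 and a uniqueness of 0.36.

lambda <- 0.8          # assumed standardized loading on a single factor
h2 <- lambda^2         # communality = 0.64
u2 <- 1 - h2           # uniqueness  = 0.36
h2 + u2                # sums to 1.0, the indicator's total standardized variance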

Altogether, if a factor model is a good representation of the data, then the correlation among items that load onto a factor should be largely attributable to the factor. More generally, a good factor model should have high communality estimates and low uniquenesses for all items. If the uniqueness of an indicator is high (e.g., 0.5), it indicates that variation in the indicator is not explained by the specified factor structure.

Returning to our discussions of latent variables, the common factor model assumes conditional independence among the indicators. That is, there should not be meaningful correlations among indicators after accounting for the factor structure. In SEM, we can relax this assumption in selected cases, but in exploratory factor analysis (EFA), it is built in.

2 Formal specification of the common factor model

The common factor model builds on the mechanics of linear regression, where we view realizations of a dependent variable \(Y\) as a linear combination of multiple predictors, \(\textbf{X}\), plus unexplained variance, \(\varepsilon\). Unlike regression, however, the common factor model specifies that observed data reflect a linear combination of latent influences. If we consider a single item, \(\textbf{y}_i\), the model specification is:

2.1 Single item form

\[ \textbf{y}_i = \lambda_{i1} \boldsymbol{\eta}_1 + \lambda_{i2} \boldsymbol{\eta_2} + ... + \lambda_{im} \boldsymbol{\eta}_m + \boldsymbol{\varepsilon}_i \] where \(\lambda_{im}\) reflects the strength of the association between factor \(m\) and indicator \(i\). For example, if we estimated 3 factors and had 8 indicators, the factor loadings would be:

\[ \underset{8 \times 3}{\boldsymbol{\Lambda}_y} = \begin{bmatrix} \lambda_{11} & \lambda_{12} & \lambda_{13} \\ \lambda_{21} & \lambda_{22} & \lambda_{23} \\ \lambda_{31} & \lambda_{32} & \lambda_{33} \\ \lambda_{41} & \lambda_{42} & \lambda_{43} \\ \lambda_{51} & \lambda_{52} & \lambda_{53} \\ \lambda_{61} & \lambda_{62} & \lambda_{63} \\ \lambda_{71} & \lambda_{72} & \lambda_{73} \\ \lambda_{81} & \lambda_{82} & \lambda_{83} \\ \end{bmatrix} \]

Note that in the ‘vanilla’ common factor model of EFA, each item is a weighted combination of all factors, which is often an anti-parsimonious account. In multiple-factor models, scientists often seek to achieve simple structure in which each item has a dominant loading on only one factor.

In EFA, this is often accomplished by rotating the factor solution, a procedure that tries to simplify the pattern of factor loadings by geometrically rotating the orientation of the latent directions. In confirmatory factor analysis (CFA), we often specify a sparse \(\boldsymbol{\Lambda}_y\) matrix in which many improbable factor loadings are fixed at zero. That is, we assert that an observed variable is only a function of a small number of factors (preferably one). This assertion is testable by fitting the hypothesized confirmatory factor model and examining global and local fit.

2.2 Expansion of equations for each indicator

Here is the notation for a simple case in which there are 5 indicators represented by two factors:

\[ \begin{align*} \textbf{y}_1 &= \lambda_{11} \boldsymbol{\eta}_1 + \lambda_{12} \boldsymbol{\eta}_2 + \boldsymbol{\varepsilon}_1 \\ \textbf{y}_2 &= \lambda_{21} \boldsymbol{\eta}_1 + \lambda_{22} \boldsymbol{\eta}_2 + \boldsymbol{\varepsilon}_2 \\ \textbf{y}_3 &= \lambda_{31} \boldsymbol{\eta}_1 + \lambda_{32} \boldsymbol{\eta}_2 + \boldsymbol{\varepsilon}_3 \\ \textbf{y}_4 &= \lambda_{41} \boldsymbol{\eta}_1 + \lambda_{42} \boldsymbol{\eta}_2 + \boldsymbol{\varepsilon}_4 \\ \textbf{y}_5 &= \lambda_{51} \boldsymbol{\eta}_1 + \lambda_{52} \boldsymbol{\eta}_2 + \boldsymbol{\varepsilon}_5 \\ \end{align*} \]

2.3 Graphical depiction

2.4 Matrix form

We can generalize the factor model to matrix form:

\[ \boldsymbol{y} = \boldsymbol{\Lambda}_y \boldsymbol{\eta} + \boldsymbol{\varepsilon} \]

Nice and simple! :-)
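To make the matrix form concrete, here is a minimal simulation sketch with assumed loadings, a modest factor correlation, and unit indicator variances (all values are hypothetical and chosen only for illustration):

library(MASS)   # for mvrnorm
set.seed(1)
n <- 1000
Lambda <- matrix(c(0.80, 0.70, 0.75, 0.00, 0.00, 0.00,
                   0.00, 0.00, 0.00, 0.80, 0.90, 0.50), nrow = 6, ncol = 2)
Psi <- matrix(c(1.0, 0.2,
                0.2, 1.0), nrow = 2)                  # factor variances of 1, correlation of 0.2
eta <- mvrnorm(n, mu = c(0, 0), Sigma = Psi)          # latent factor scores
u2  <- 1 - diag(Lambda %*% Psi %*% t(Lambda))         # uniquenesses chosen so each indicator has unit variance
eps <- mvrnorm(n, mu = rep(0, 6), Sigma = diag(u2))   # uncorrelated unique variances
y   <- eta %*% t(Lambda) + eps                        # y = Lambda * eta + epsilon, one row per observation
round(cor(y), 2)                                      # correlations implied (approximately) by the factor model

Because the cross-loadings here are zero, each indicator's communality is simply its squared loading, and the observed correlations should approximately match the model-implied values.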

3 Assumptions of the common factor model

The common factor model has several key assumptions:

  1. Unique variances (disturbances) have a mean of zero: \(E(\varepsilon_{i}) = 0\)
  2. Latent factors have mean zero, \(E(\eta_{i}) = 0\).
  3. Latent factors have a variance of one, \(\textrm{var}(\eta_i) = 1\). (Standardized solution)
  4. Unique variances are uncorrelated with each other: \(\textrm{cov}(\varepsilon_{i},\boldsymbol{\varepsilon}_{\backslash i}) = 0\). (Conditional independence)
  5. Latent factors are independent of each other: \(\textrm{cov}(\eta_{i},\boldsymbol{\eta}_{\backslash i}) = 0\)
  6. Latent factors are uncorrelated with unique variances: \(\textrm{cov}(\boldsymbol{\varepsilon} ,\boldsymbol{\eta}) = 0\)

These assumptions are necessary to obtain unique solutions to the model parameters (i.e., identification).

Together, assumptions 1 and 2 yield the appropriate expectation that the mean of the ith indicator is zero when we take the expectation of the equation:

\[ y_i = \lambda_{i1} \eta_1 + \lambda_{i2} \eta_2 + ... + \lambda_{ij} \eta_j + \boldsymbol{\varepsilon}_i \] In this scenario, every term has an expectation of zero, yielding an expected value of zero for the indicator \(y_i\). If the indicator had a non-zero mean, we would need to include a mean structure in the model by adding an intercept to each equation (as in multiple regression):

\[ y_i = \tau_i + \lambda_{i1} \eta_1 + \lambda_{i2} \eta_2 + ... + \lambda_{ij} \eta_j + \boldsymbol{\varepsilon}_i \]

4 Partitioning variance in the common factor model

As we discussed last week, the variance explained in a given indicator \(i\) by a factor in the model (e.g., \(\eta_1\)) is given by squaring its (standardized) factor loading. This comes from the equivalency of factor loadings and correlations in the common factor model, assuming uncorrelated factors (otherwise it gets more complicated).

More generally, we can decompose the variance of any indicator into the part explained by the factor model (i.e., its communality) and residual, unexplained variation (i.e., its uniqueness):

\[ \textrm{var}(y_i) = \lambda_{i1}^2 \psi_{11} + \textrm{var}(\varepsilon_i) \]

where \(\psi_{11} = 1.0\) (i.e., the requirement that factor variances are unity). Consequently, if we want to estimate the proportion of variance explained in a variable \(y_i\) by the mth factor, it is:

\[ h^2_i = r^2_i = \lambda_{im}^2 \] And proportion of unexplained variance is:

\[ u^2_i = \textrm{var}(\varepsilon_i) = 1 - \lambda_{im}^2 \]

4.1 Communality

These equations only hold if item i loads onto a single factor. In multiple-factor EFA, this is not the case. Thus, assuming uncorrelated factors, the total explained variance for item i is the sum of its squared loadings across all factors, each multiplied by the corresponding factor variance (which is 1.0 in the standardized solution):

\[ h^2_i = r^2_i = \sum_{j=1}^{m}\lambda_{ij}^2 \]

4.2 Uniqueness

\[ u^2_i = \textrm{var}(\varepsilon_i) = 1 - \sum_{j=1}^{m}\lambda_{ij}^2 \] Again, the factor variance (\(\psi\)) terms drop out because they are fixed at 1.0 for all factors.

4.3 Covariance

In the case of a single-factor model or a lambda matrix in which two indicators load only on one factor (e.g., a simple structure CFA model), their covariance can be obtained via:

\[ \textrm{cov}(y_1, y_2) = \sigma_{12} = \lambda_{11} \psi_{11} \lambda_{21} \]
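As a worked sketch (loadings assumed only for illustration), two indicators with standardized loadings of 0.8 and 0.7 on the same standardized factor have a model-implied correlation of \(0.8 \times 1.0 \times 0.7 = 0.56\):

lambda_11 <- 0.8; lambda_21 <- 0.7; psi_11 <- 1.0   # assumed values
lambda_11 * psi_11 * lambda_21                      # implied correlation = 0.56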

5 A quick example of the common factor model in EFA

Consider a (simulated) dataset in which 300 students rated their affinity for different topics of study: biology, geology, chemistry, algebra, calculus, and statistics. Items were rated on a 1-5 scale from ‘strongly dislike’ to ‘strongly like’. Dataset courtesy of John Quick: http://rtutorialseries.blogspot.com/2011/10/r-tutorial-series-exploratory-factor.html.

First, let’s take a look at the descriptive statistics, including correlations among items:

library(psych)  # provides describe(), fa(), and related functions used below
data <- read.csv("quick_efa_data.csv")
describe(data)
##      vars   n mean   sd median trimmed  mad min max range   skew kurtosis    se
## BIO     1 300 2.35 1.23      2    2.24 1.48   1   5     4  0.496   -0.785 0.071
## GEO     2 300 2.17 1.23      2    2.02 1.48   1   5     4  0.654   -0.690 0.071
## CHEM    3 300 2.24 1.27      2    2.09 1.48   1   5     4  0.630   -0.767 0.073
## ALG     4 300 3.05 1.17      3    3.06 1.48   1   5     4 -0.121   -0.793 0.068
## CALC    5 300 3.06 1.13      3    3.09 1.48   1   5     4 -0.250   -0.564 0.065
## STAT    6 300 2.94 1.26      3    2.92 1.48   1   5     4 -0.092   -1.017 0.073
round(cor(data), 2)
##       BIO  GEO CHEM  ALG CALC STAT
## BIO  1.00 0.68 0.75 0.12 0.21 0.20
## GEO  0.68 1.00 0.68 0.14 0.20 0.23
## CHEM 0.75 0.68 1.00 0.08 0.14 0.17
## ALG  0.12 0.14 0.08 1.00 0.77 0.41
## CALC 0.21 0.20 0.14 0.77 1.00 0.51
## STAT 0.20 0.23 0.17 0.41 0.51 1.00

At first glance, the math-related topics correlate strongly with one another, as do the science-related topics, whereas the cross-domain correlations are considerably weaker.

We could fit a one-factor model to these data in which all six indicators are thought to reflect a common latent factor (e.g., affinity for school). Here, we estimate this factor using maximum likelihood:

f1 <- fa(data, nfactors=1, fm="ml")
print(f1)
## Factor Analysis using method =  ml
## Call: fa(r = data, nfactors = 1, fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##       ML1    h2   u2 com
## BIO  0.87 0.753 0.25   1
## GEO  0.79 0.632 0.37   1
## CHEM 0.85 0.722 0.28   1
## ALG  0.17 0.029 0.97   1
## CALC 0.25 0.064 0.94   1
## STAT 0.26 0.067 0.93   1
## 
##                 ML1
## SS loadings    2.27
## Proportion Var 0.38
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## The degrees of freedom for the null model are  15  and the objective function was  2.87 with Chi Square of  849
## The degrees of freedom for the model are 9  and the objective function was  1.17 
## 
## The root mean square of the residuals (RMSR) is  0.24 
## The df corrected root mean square of the residuals is  0.31 
## 
## The harmonic number of observations is  300 with the empirical chi square  524  with prob <  3.3e-107 
## The total number of observations was  300  with Likelihood Chi Square =  345  with prob <  6.1e-69 
## 
## Tucker Lewis Index of factoring reliability =  0.327
## RMSEA index =  0.356  and the 90 % confidence intervals are  0.322 0.386
## BIC =  294
## Fit based upon off diagonal values = 0.68
## Measures of factor score adequacy             
##                                                    ML1
## Correlation of (regression) scores with factors   0.94
## Multiple R square of scores with factors          0.88
## Minimum correlation of possible factor scores     0.77

We see that much of the variation in BIO, GEO, and CHEM is explained by the latent factor. But only 3-7% of variation in ALG, CALC, and STAT is attributable to the factor – that is, we have low communality estimates and high uniquenesses. This suggests a poor factor solution. Let’s compare against a 2-factor solution:

f2 <- fa(data, nfactors=2, fm="ml", rotate="oblimin")
## Loading required namespace: GPArotation
print(f2)
## Factor Analysis using method =  ml
## Call: fa(r = data, nfactors = 2, rotate = "oblimin", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        ML2   ML1   h2    u2 com
## BIO   0.86  0.03 0.75 0.252 1.0
## GEO   0.78  0.04 0.63 0.375 1.0
## CHEM  0.88 -0.05 0.75 0.249 1.0
## ALG  -0.04  0.80 0.63 0.374 1.0
## CALC  0.01  0.97 0.95 0.048 1.0
## STAT  0.13  0.49 0.29 0.715 1.1
## 
##                        ML2  ML1
## SS loadings           2.14 1.84
## Proportion Var        0.36 0.31
## Cumulative Var        0.36 0.66
## Proportion Explained  0.54 0.46
## Cumulative Proportion 0.54 1.00
## 
##  With factor correlations of 
##      ML2  ML1
## ML2 1.00 0.21
## ML1 0.21 1.00
## 
## Mean item complexity =  1
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  15  and the objective function was  2.87 with Chi Square of  849
## The degrees of freedom for the model are 4  and the objective function was  0.01 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic number of observations is  300 with the empirical chi square  0.97  with prob <  0.91 
## The total number of observations was  300  with Likelihood Chi Square =  2.94  with prob <  0.57 
## 
## Tucker Lewis Index of factoring reliability =  1
## RMSEA index =  0  and the 90 % confidence intervals are  0 0.076
## BIC =  -19.9
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML2  ML1
## Correlation of (regression) scores with factors   0.94 0.98
## Multiple R square of scores with factors          0.89 0.96
## Minimum correlation of possible factor scores     0.77 0.91

Here, we see that a two-factor solution provides a reasonably parsimonious account of the data, with a factor representing ‘science’ (BIO, GEO, and CHEM) and a factor representing ‘math’ (ALG, CALC, and STAT). By default, psych applies an oblimin rotation to the factor loadings to improve their interpretability – especially to shift the solution toward simple structure. This is an oblique transformation; in this case, the factors correlate at \(r = .21\). Note that the two-factor solution explains 66% of the total variation in the data, with the ‘science’ factor accounting for slightly more variance than the ‘math’ factor.

We could also attempt a 3-factor solution:

f3 <- fa(data, nfactors=3, fm="ml")
print(f3)
## Factor Analysis using method =  ml
## Call: fa(r = data, nfactors = 3, fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        ML2   ML3   ML1   h2    u2 com
## BIO   0.85  0.08 -0.06 0.75 0.253 1.0
## GEO   0.78  0.05  0.00 0.62 0.375 1.0
## CHEM  0.89 -0.10  0.06 0.76 0.242 1.0
## ALG   0.01  0.02  0.98 1.00 0.005 1.0
## CALC -0.02  0.86  0.08 0.85 0.155 1.0
## STAT  0.08  0.61 -0.10 0.32 0.677 1.1
## 
##                        ML2  ML3  ML1
## SS loadings           2.13 1.16 1.00
## Proportion Var        0.35 0.19 0.17
## Cumulative Var        0.35 0.55 0.72
## Proportion Explained  0.50 0.27 0.23
## Cumulative Proportion 0.50 0.77 1.00
## 
##  With factor correlations of 
##      ML2  ML3  ML1
## ML2 1.00 0.25 0.11
## ML3 0.25 1.00 0.80
## ML1 0.11 0.80 1.00
## 
## Mean item complexity =  1
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  15  and the objective function was  2.87 with Chi Square of  849
## The degrees of freedom for the model are 0  and the objective function was  0 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  NA 
## 
## The harmonic number of observations is  300 with the empirical chi square  0.56  with prob <  NA 
## The total number of observations was  300  with Likelihood Chi Square =  1.32  with prob <  NA 
## 
## Tucker Lewis Index of factoring reliability =  -Inf
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML2  ML3  ML1
## Correlation of (regression) scores with factors   0.94 0.94 1.00
## Multiple R square of scores with factors          0.89 0.88 0.99
## Minimum correlation of possible factor scores     0.77 0.76 0.99

Note that the third factor appears to capture ALG specifically (loading = 0.98) and that the correlation between the 2nd and 3rd factors is \(r = 0.8\). Thus, this model does not appear to be interpretable or parsimonious. It also explains only slightly more variation than the 2-factor model (72% versus 66%).

5.1 Looking under the hood of the FA model

The object returned by the fa function from psych gives us a lot of information that we can use to diagnose and interpret the factor model. Much of this is printed in the summary, but we may wish to work with these statistics further. First, let’s examine the estimated communalities:

f2$communalities #note that these will diverge from the sum of squared loadings if we use 'minrank' factor analysis.
##   BIO   GEO  CHEM   ALG  CALC  STAT 
## 0.748 0.625 0.751 0.626 0.952 0.285

And uniquenesses:

f2$uniquenesses
##    BIO    GEO   CHEM    ALG   CALC   STAT 
## 0.2521 0.3745 0.2494 0.3738 0.0477 0.7149
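As a quick sanity check (a sketch, assuming these identities hold for this ML/oblimin solution), each item's communality and uniqueness should sum to roughly 1.0, and for an oblique solution the communalities should equal the diagonal of \(\boldsymbol{\Lambda}_y \boldsymbol{\Psi} \boldsymbol{\Lambda}_y'\) (here, the pattern loadings and the Phi matrix from the fitted object):

round(f2$communalities + f2$uniquenesses, 3)   # should be ~1 for each standardized indicator
L <- unclass(f2$loadings)                      # pattern matrix as a plain numeric matrix
round(diag(L %*% f2$Phi %*% t(L)), 3)          # should reproduce the communalities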

Next, the factor loadings. Note that by default, fa applies the ‘oblimin’ rotation, which allows the factors to correlate in its attempt to achieve simple structure. Therefore, the loadings no longer represent the correlations between items and factors. Instead, they are standardized partial regression coefficients (i.e., the unique effects of a factor on a variable).

f2$loadings
## 
## Loadings:
##      ML2    ML1   
## BIO   0.859       
## GEO   0.782       
## CHEM  0.876       
## ALG          0.799
## CALC         0.974
## STAT  0.126  0.493
## 
##                  ML2   ML1
## SS loadings    2.133 1.835
## Proportion Var 0.356 0.306
## Cumulative Var 0.356 0.661

Note that loadings below 0.1 are suppressed by default. You can change this using print(f2$loadings, cut=0.3) to suppress anything below 0.3, for example.

If an oblique rotation is performed, the Phi matrix contains the correlations among factors:

f2$Phi
##       ML2   ML1
## ML2 1.000 0.212
## ML1 0.212 1.000

When an oblique rotation is applied, the correlation between an indicator and a factor is no longer equal to the factor loading because the factors themselves correlate. One must take this into account to understand the relationship between indicators and factors. We can obtain the correlations between indicators and factors by multiplying the factor loadings \(\boldsymbol{\Lambda}_y\) by the factor correlation matrix \(\boldsymbol{\Psi}\). The result is called the factor structure matrix and is stored in the $Str element of an fa object:

f2$Str
## 
## Loadings:
##      ML2   ML1  
## BIO  0.864 0.209
## GEO  0.790 0.205
## CHEM 0.865 0.134
## ALG  0.128 0.790
## CALC 0.215 0.976
## STAT 0.231 0.519
## 
##                  ML2   ML1
## SS loadings    2.235 1.950
## Proportion Var 0.373 0.325
## Cumulative Var 0.373 0.698

Notice the modest correlations of ALG, CALC, and STAT with the ‘science’ factor (and the similar cross-correlations of BIO, GEO, and CHEM with the ‘math’ factor).
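We can verify this relationship directly (a sketch): post-multiplying the pattern loadings by the factor correlation matrix should reproduce the structure matrix above.

L <- unclass(f2$loadings)   # pattern matrix
round(L %*% f2$Phi, 3)      # compare with f2$Str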

We can examine the eigenvalues of the raw correlation matrix:

f2$e.values
## [1] 2.787 1.778 0.630 0.337 0.262 0.206

Note, however, that factor analysis is computed on the reduced correlation matrix \(\boldsymbol{R}_r\), which places the communalities along the diagonal. Unlike PCA, where the model attempts to explain the total variance in each indicator, factor analysis tries to explain only the variation that is shared among variables due to common factors. In this way, non-shared variance (i.e., the uniquenesses) is discarded from the analysis. This is exemplified by the use of the raw correlation matrix in PCA (1.0 on the diagonal) versus the reduced correlation matrix in EFA. As a refresher, here's the raw correlation matrix:

round(cor(data), 3)
##        BIO   GEO  CHEM   ALG  CALC  STAT
## BIO  1.000 0.682 0.747 0.115 0.213 0.203
## GEO  0.682 1.000 0.681 0.135 0.205 0.232
## CHEM 0.747 0.681 1.000 0.084 0.136 0.166
## ALG  0.115 0.135 0.084 1.000 0.771 0.409
## CALC 0.213 0.205 0.136 0.771 1.000 0.507
## STAT 0.203 0.232 0.166 0.409 0.507 1.000

And the reduced correlation matrix based on the 2-factor solution above:

cmat <- cor(data)
diag(cmat) <- f2$communalities
print(cmat)
##        BIO   GEO   CHEM    ALG  CALC  STAT
## BIO  0.748 0.682 0.7470 0.1153 0.213 0.203
## GEO  0.682 0.625 0.6815 0.1354 0.205 0.232
## CHEM 0.747 0.681 0.7506 0.0838 0.136 0.166
## ALG  0.115 0.135 0.0838 0.6262 0.771 0.409
## CALC 0.213 0.205 0.1364 0.7709 0.952 0.507
## STAT 0.203 0.232 0.1660 0.4094 0.507 0.285

As a result, we can also consider the eigenvalues of the reduced correlation matrix:

f2$values
## [1]  2.4855  1.5041  0.0304  0.0170 -0.0178 -0.0317

NB. As pointed out by Preacher, Kaiser’s rule (eigenvalues > 1.0) for deciding on the number of factors is based on the eigenvalues from the original (unreduced) correlation matrix. Kaiser’s rule is discussed below.

We can examine the divergence between the raw correlation matrix and the model-implied correlation matrix using the residuals function. Note that the uniquenesses (\(u^2\)) are along the diagonal.

residuals(f2)
##      BIO   GEO   CHEM  ALG   CALC  STAT 
## BIO   0.25                              
## GEO   0.00  0.37                        
## CHEM  0.00  0.00  0.25                  
## ALG  -0.02  0.00  0.01  0.37            
## CALC  0.00  0.00  0.00  0.00  0.05      
## STAT -0.01  0.03 -0.01  0.00  0.00  0.71
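These residuals can also be reconstructed by hand (a sketch): subtract the common variance implied by the factor model, \(\boldsymbol{\Lambda}_y \boldsymbol{\Psi} \boldsymbol{\Lambda}_y'\), from the observed correlations. The uniquenesses remain on the diagonal.

L <- unclass(f2$loadings)                      # pattern matrix
round(cor(data) - L %*% f2$Phi %*% t(L), 2)    # compare with residuals(f2)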

5.2 Plotting factor solutions to aid interpretation

We can plot the factor loadings to get a sense of how items relate to factors.

plot(f2)

Note that more extreme points along each factor axis are a good thing because they reflect a stronger association between the indicator and the factor. Thus, we are not looking for dispersion along the axes per se, but for evidence of separability of the indicators across factors.

For reference, things get a bit more complex as we plot 3+ factors, but we can still get some mileage from this:

plot(f3)

Another popular visualization tool for factor analysis is the biplot, which plots the factor scores as points and the factor loadings as directional vectors. As the name suggests, biplots visualize two factors at a time, but can be tiled to look at all pairwise combinations. Ideally, we would like to see vectors for indicators of one factor that are at 90 degrees to indicators of the other factor. This is evidence of the separability of the indicators (a part of simple structure). The dispersion of factor scores provides information about how correlated the factors are. For example, if the scatterplot reveals an ellipse whose major axis has a positive slope, this is evidence of correlation between factors.

biplot(f2)

biplot(f3)

6 Goal of EFA

(from Brown) The goal of EFA is to evaluate the dimensionality of a set of indicators in order to identify the smallest number of latent factors that explain the pattern of correlations. EFA tries to identify a simple model, which means a) minimizing the number of factors (parsimony), and b) minimizing non-zero factor loadings (simple structure, often achieved through rotation).

EFA is a data-driven exploratory technique with little intervention from the scientist in the estimation of the model. The primary decisions are how many factors to extract and how/whether to rotate the factors for interpretability.

Of course, once one has a set of outputs from EFAs with different dimensionality (number of factors) and rotations, there is considerable involvement of the scientist in interpreting the models. But when we think of EFA as a measurement model, it is important to remember that no restrictions are placed a priori on the number of factors to extract or the pattern of relationships between observed and latent variables. As mentioned above, all indicators load onto all factors, even if trivially.

6.1 Estimation methods in EFA

Because this is primarily an SEM class, we won't spend much time on this, but there are two particularly popular methods for estimating EFA models: maximum likelihood and principal axis factoring.

Maximum likelihood follows the same approach as in typical SEM, trying to reproduce the observed covariance matrix \(\textbf{S}_{XX}\) on the basis of the model-implied covariance: \(\hat{\boldsymbol{\Sigma}}\). This is achieved by choosing parameter values (e.g., factor loadings) that maximize the sample likelihood function. One advantage of ML for EFA is that the fit (log-likelihood) is on the same basis (i.e., sample likelihood) as other latent variable models (esp. CFA) of the same data (also fit using ML). Furthermore, fit statistics from SEM (e.g., model \(\chi^2\)) can be computed to understand and diagnose misfit. ML assumes that the indicators are distributed as multivariate normal, just like typical SEM. ML is also limited in the number of factors it can extract (see free parameters below) relative to other methods.

Principal axis factoring (PAF) does not assume that the data follow a particular distribution and is less prone to invalid solutions than ML. There is also a much higher limit on the number of factors that can be extracted compared to ML: \(p - 1\), where \(p\) is the number of indicators. PAF is based on iteratively solving for the eigenvalues and eigenvectors of the reduced correlation matrix by updating the factor loadings. That is, we try to maximize the communality estimates of the data by updating the factor loadings until no further increase in communalities can be achieved. In this way, we begin with an estimate of communality (usually the squared multiple correlation among items) and try to increase it according to the factor model. Unlike ML, PAF does not provide estimates of parameter uncertainty (e.g., SEs of factor loadings).
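As a quick comparison (a sketch; output not shown), we could refit the two-factor affinity model using principal axis factoring and compare the communalities against the ML solution. For well-behaved data, the two sets of estimates are typically very close.

f2_pa <- fa(data, nfactors = 2, fm = "pa", rotate = "oblimin")    # principal axis factoring
round(cbind(ML = f2$communalities, PA = f2_pa$communalities), 3)  # compare communality estimates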

7 Selecting an appropriate number of factors in EFA

Interpretation is king: factors should make sense from a conceptual perspective. Moreover, if only two or three items load onto a factor, either a) you may have overfactored, or b) the items may not be a coherent part of the larger test.

7.1 Kaiser’s rule: eigenvalues > 1.0

The Kaiser-Guttman rule for factor extraction is that all factors should have eigenvalues greater than 1.0. This rule of thumb reflects the intuition that a factor (latent direction through the shared correlational structure) should account for the variance of at least one indicator. Recall that because EFA is run on a correlation matrix, the variances of indicators are all 1.0, and thus the eigenvalues sum to the number of indicators.

Although Kaiser’s rule is nice because of its simplicity, it should not be used in isolation because of the risk of being too dogmatic. For example, if the eigenvalues are \(\boldsymbol{\lambda} = [10, 5, 4, 1.01, 1.01, 0.99, ... \lambda_p ]\), the fourth, fifth, and sixth factors all explain similar variance, but the sixth would be excluded. Likewise, these later factors explain much less variance than the first three, suggesting they may be relatively trivial. Finally, we would need to examine the factor loading matrix to determine the interpretability of the factors, which is a key aspect of factor models.
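Applied to the affinity example from above (a sketch), Kaiser’s rule counts the eigenvalues of the unreduced correlation matrix that exceed 1.0; recall that the first two eigenvalues there were 2.79 and 1.78.

sum(f2$e.values > 1)   # number of factors suggested by the Kaiser-Guttman rule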

7.2 Scree test

Cattell’s scree test plots the eigenvalues (from largest to smallest) in order to identify an ‘elbow’ in the function (i.e., a change in slope) that may suggest a natural cut point in selecting the number of factors. Scree is defined as ‘a mass of small loose stones that form or cover a slope on a mountain’ (Google). Here’s a plot of the eigenvalues mentioned above, taken from a correlation matrix with 25 indicators (and thus 25 eigenvalues). I have plotted only the first 10 for clarity.

library(ggplot2)  # for the scree-style plot below
df <- data.frame(ev=c(10, 5, 4, 1.01, 1.01, 0.99, .12, .1, .08, .06), factor_number=factor(1:10))
ggplot(df, aes(x=factor_number, y=ev, group=1)) + geom_point() + geom_line() + theme_bw(base_size=15) +
  xlab("Factor number") + ylab("Eigenvalue")

As this example illustrates, identifying the elbow in the scree plot is often an interpretive exercise. Here, there is clearly a strong change in slope between 3 and 4 factors, suggesting that we may wish to stop at 3 factors. However, there is a less pronounced slope change between 2 and 3 factors, so a scree test might lead us to consider 2- or 3-factor solutions. Like most factor extraction rubrics, one must examine the factor loadings to decide which solution aligns with theory and is most interpretable.
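For real data, the psych package can draw this plot directly from a correlation matrix (a sketch using the affinity data from above; output not shown):

scree(cor(data))   # plots successive eigenvalues for both components and factors by default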

7.3 Parallel analysis

Parallel analysis compares the magnitude of eigenvalues generated from random data against the eigenvalues from the sample correlation matrix. The idea is that meaningful directions in one’s data should have eigenvalues larger than would be expected by chance. Notably, eigenvalues depend on the number of variables, \(p\), and the number of observations, \(n\). Thus, in parallel analysis, one usually simulates 100+ random datasets of the same \(p\) and \(n\) as the observed data. The eigenvalues of the ‘parallel’ data are then averaged and plotted against the observed eigenvalues.

Consider the nine-variable Thurstone dataset:

describe(Thurstone)
##                   vars n  mean    sd median trimmed   mad   min max range  skew kurtosis    se
## Sentences            1 9 0.588 0.222  0.447   0.588 0.099 0.380   1 0.620 0.699   -1.300 0.074
## Vocabulary           2 9 0.598 0.217  0.493   0.598 0.090 0.358   1 0.642 0.650   -1.263 0.072
## Sent.Completion      3 9 0.575 0.222  0.460   0.575 0.110 0.359   1 0.641 0.716   -1.171 0.074
## First.Letters        4 9 0.535 0.202  0.460   0.535 0.117 0.350   1 0.650 1.234    0.310 0.067
## Four.Letter.Words    5 9 0.528 0.199  0.446   0.528 0.065 0.367   1 0.633 1.402    0.638 0.066
## Suffixes             6 9 0.494 0.216  0.447   0.494 0.181 0.288   1 0.712 1.232    0.533 0.072
## Letter.Series        7 9 0.500 0.209  0.432   0.500 0.076 0.288   1 0.712 1.369    0.795 0.070
## Pedigrees            8 9 0.517 0.203  0.534   0.517 0.122 0.320   1 0.680 1.283    0.742 0.068
## Letter.Group         9 9 0.482 0.210  0.424   0.482 0.096 0.325   1 0.675 1.572    1.176 0.070
print(Thurstone)
##                   Sentences Vocabulary Sent.Completion First.Letters
## Sentences             1.000      0.828           0.776         0.439
## Vocabulary            0.828      1.000           0.779         0.493
## Sent.Completion       0.776      0.779           1.000         0.460
## First.Letters         0.439      0.493           0.460         1.000
## Four.Letter.Words     0.432      0.464           0.425         0.674
## Suffixes              0.447      0.489           0.443         0.590
## Letter.Series         0.447      0.432           0.401         0.381
## Pedigrees             0.541      0.537           0.534         0.350
## Letter.Group          0.380      0.358           0.359         0.424
##                   Four.Letter.Words Suffixes Letter.Series Pedigrees
## Sentences                     0.432    0.447         0.447     0.541
## Vocabulary                    0.464    0.489         0.432     0.537
## Sent.Completion               0.425    0.443         0.401     0.534
## First.Letters                 0.674    0.590         0.381     0.350
## Four.Letter.Words             1.000    0.541         0.402     0.367
## Suffixes                      0.541    1.000         0.288     0.320
## Letter.Series                 0.402    0.288         1.000     0.555
## Pedigrees                     0.367    0.320         0.555     1.000
## Letter.Group                  0.446    0.325         0.598     0.452
##                   Letter.Group
## Sentences                0.380
## Vocabulary               0.358
## Sent.Completion          0.359
## First.Letters            0.424
## Four.Letter.Words        0.446
## Suffixes                 0.325
## Letter.Series            0.598
## Pedigrees                0.452
## Letter.Group             1.000

The eigenvalues of the corresponding correlation matrix are:

round(eigen(Thurstone)$values, 3)
## [1] 4.851 1.090 1.038 0.475 0.448 0.375 0.321 0.234 0.168

Note that there is some debate about whether the scree test (and thus, parallel analysis) should be conducted on the eigenvalues from the raw correlation matrix or on the correlation matrix that reflects only shared variance among the items based on the squared multiple correlation (SMC), or communality. The correlation matrix of just the shared variance among indicators is sometimes called the reduced correlation matrix.

If we want to examine the eigenvalues of the reduced correlation matrix, we typically substitute the SMC onto the diagonal of the correlation matrix. Thus, instead of each element being 1.0, it is the proportion of variance shared by that item with all other items. We can obtain the SMCs using the smc function in psych.

smc(Thurstone)
##         Sentences        Vocabulary   Sent.Completion     First.Letters 
##             0.738             0.750             0.675             0.553 
## Four.Letter.Words          Suffixes     Letter.Series         Pedigrees 
##             0.521             0.426             0.477             0.450 
##      Letter.Group 
##             0.429

We could then obtain the reduced correlation matrix as:

Thurstone_reduced <- Thurstone
diag(Thurstone_reduced) <- smc(Thurstone)
print(Thurstone_reduced)
##                   Sentences Vocabulary Sent.Completion First.Letters
## Sentences             0.738      0.828           0.776         0.439
## Vocabulary            0.828      0.750           0.779         0.493
## Sent.Completion       0.776      0.779           0.675         0.460
## First.Letters         0.439      0.493           0.460         0.553
## Four.Letter.Words     0.432      0.464           0.425         0.674
## Suffixes              0.447      0.489           0.443         0.590
## Letter.Series         0.447      0.432           0.401         0.381
## Pedigrees             0.541      0.537           0.534         0.350
## Letter.Group          0.380      0.358           0.359         0.424
##                   Four.Letter.Words Suffixes Letter.Series Pedigrees
## Sentences                     0.432    0.447         0.447     0.541
## Vocabulary                    0.464    0.489         0.432     0.537
## Sent.Completion               0.425    0.443         0.401     0.534
## First.Letters                 0.674    0.590         0.381     0.350
## Four.Letter.Words             0.521    0.541         0.402     0.367
## Suffixes                      0.541    0.426         0.288     0.320
## Letter.Series                 0.402    0.288         0.477     0.555
## Pedigrees                     0.367    0.320         0.555     0.450
## Letter.Group                  0.446    0.325         0.598     0.452
##                   Letter.Group
## Sentences                0.380
## Vocabulary               0.358
## Sent.Completion          0.359
## First.Letters            0.424
## Four.Letter.Words        0.446
## Suffixes                 0.325
## Letter.Series            0.598
## Pedigrees                0.452
## Letter.Group             0.429
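For comparison (a sketch; output not shown), we can extract the eigenvalues of this reduced matrix directly. These are the values on which an SMC-based scree test or parallel analysis would operate.

round(eigen(Thurstone_reduced)$values, 3)   # eigenvalues of the reduced correlation matrix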

Here, I generate 100 normally distributed random datasets of \(p = 9\) and \(n = 213\).

ff <- fa.parallel(Thurstone, n.iter=100, n.obs=213, fa="fa", plot = TRUE, SMC=TRUE)

## Parallel analysis suggests that the number of factors =  3  and the number of components =  NA
print(ff$fa.values)
## [1]  4.4349  0.6879  0.5145 -0.0482 -0.0655 -0.0874 -0.0962 -0.1458 -0.1750

Thus, parallel analysis suggests that 3 factors may best account for these data. Note that if raw data are provided to fa.parallel, parallel datasets are obtained using resampling (row shuffling) and multivariate normal simulation. These may give slightly different results, and are stored in $fa.sim and $fa.simr.

8 Rotation in EFA

As mentioned above, even though EFA identifies latent directions in the data, there is no guarantee that these directions are inherently interpretable. Thus, we may wish to rotate the axes of the factor solution to make the loadings easier to interpret.

Note that rotation does not change the fit or the amount of variance explained.

Rather, it redistributes the variance across factors to aid in interpretation. This may change the eigenvalues of each factor, but not their sum (i.e., the same total variance is explained). Rotations should usually be applied after one selects an appropriate number of factors.

There are two types of rotations: orthogonal and oblique. Orthogonal rotations such as ‘varimax’ try to rotate the factors to obtain simple structure while keeping the directions at 90 degrees to each other (uncorrelated). This has the advantage of keeping the standardized factor loadings in correlational units. Oblique rotations allow for the factors to correlate, but are still motivated to achieve simple structure of the loading matrix. In general, assuming that factors are uncorrelated is probably a bad, or at least questionable, idea (cf. Preacher & MacCallum, 2003).
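To see the difference in practice (a sketch; output not shown), we could refit the two-factor affinity model with an orthogonal varimax rotation and compare the loadings to the oblimin solution above. The fit and total variance explained are identical, but the loading pattern changes and no factor correlation is estimated.

f2_varimax <- fa(data, nfactors = 2, fm = "ml", rotate = "varimax")   # orthogonal rotation
print(f2_varimax$loadings, cut = 0.3)                                 # no Phi matrix for orthogonal rotations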

8.1 Unrotated solution (from Brown CFA book)

8.2 Orthogonal rotation

8.3 Oblique rotation

9 EFA versus CFA

EFA is data-driven: the scientist does not specify the measurement structure in advance, and no constraints are placed on the factor loadings. In CFA, by contrast, the scientist specifies which indicators load on which factors, and the remaining loadings are fixed to zero.

10 PCA versus EFA

PCA tries to find composites of observed variables that explain the total variation in each item.

EFA provides a structural model of shared variation based on an assumption that common latent factors underlie the data.

PCA is best for creating composites of variables without necessarily assuming interpretability of latent cause (more like formative measurement), whereas EFA is better for identifying potentially interpretable factors (more like reflective measurement).
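As a brief illustration (a sketch; output not shown), we could extract two components from the affinity data using psych's principal function and compare them to the two-factor EFA solution. Because PCA models total rather than shared variance, its ‘communalities’ will typically be larger than those from EFA.

p2 <- principal(data, nfactors = 2, rotate = "oblimin")         # two-component PCA
round(cbind(PCA = p2$communality, EFA = f2$communalities), 3)   # compare explained variance per item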

11 Free parameters in EFA

The number of free parameters estimated in an EFA model is:

\(k = p \cdot m + m(m+1)/2 + p - m^2\)

where \(p\) is the number of indicators, \(m\) is the number of factors, and \(k\) is the number of free parameters.
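For example, with \(p = 6\) indicators and \(m = 2\) factors (as in the affinity example above), this works out to \(12 + 3 + 6 - 4 = 17\) free parameters.

p <- 6; m <- 2                        # worked check of the parameter count
p * m + m * (m + 1) / 2 + p - m^2     # 17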

11.1 Factor loadings

In EFA, the \(\boldsymbol{\Lambda}_y\) matrix is \(p \times m\) in dimension (i.e., indicators x factors). Moreover, unlike CFA, where some elements of the factor matrix can be fixed at zero (a priori specification), in EFA, all elements are free parameters:

\[ \underset{p \times m}{\boldsymbol{\Lambda}_y} = \begin{bmatrix} \lambda_{11} & \lambda_{12} & \dots & \lambda_{1m} \\ \lambda_{21} & \lambda_{22} & \dots & \lambda_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{p1} & \lambda_{p2} & \dots & \lambda_{pm} \end{bmatrix} \]

11.2 Factor variances and covariances

\(\boldsymbol{\Psi}\) is a square matrix representing variances and covariances among factors. Thus, like the typical covariance matrix, the number of free parameters in \(\boldsymbol{\Psi}\) is \(m(m+1)/2\).

11.3 Residual variances

Each indicator also has a free parameter in the diagonal of the \(\boldsymbol{\Theta}_\varepsilon\) matrix that quantifies its residual variance:

\[ \underset{p \times p}{\boldsymbol{\Theta}_\varepsilon} = \begin{bmatrix} \varepsilon_{11} & 0 & \dots & 0 \\ 0 & \varepsilon_{22} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \varepsilon_{pp} \end{bmatrix} \]

Under the assumption of conditional independence in the factor model, the residual correlations among indicators are assumed to be zero. In SEM, we can free off-diagonal elements of this matrix, but this is not a typical part of EFA.

11.4 Identification constraints

Why is \(- m^2\) the last term in the number of parameters to be estimated? In essence, to identify a factor model and obtain interpretable estimates, there must be some constraint on the solution. In standard EFA, factor variances are fixed to 1, which saves \(m\) parameters, and factor covariances are fixed to zero, which saves another \(m(m-1)/2\). The remaining \(m(m-1)/2\) constraints resolve the rotational indeterminacy of the solution (any rotation of the factors fits equally well), for a total of \(m^2\). Other restrictions include fixing the loading of an anchor item onto each factor and fixing the anchor's loadings to 0 on the other factors. There are actually many ways to impose these identification constraints, as we'll get to in SEM proper.

12 Confirmatory factor analysis (CFA)

The common factor model underlies both exploratory and confirmatory factor analyses. The crucial difference in CFA, however, is that the investigator must specify the expected relationships between indicators and factors. Thus, one must have an explicit account of what factors underlie each item. This account then informs how the \(\boldsymbol{\Lambda}_y\) matrix is defined. For example, if we have 6 indicators that we believe load onto two factors (first three, second three):

\[ \underset{6 \times 2}{\boldsymbol{\Lambda}_y} = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \\ \end{bmatrix} \]

Here, instead of estimating 12 parameters as in EFA, we estimate only 6 and enforce simple structure. Whether this provides a good account of the data is a matter of model fit.

Also, recall the bifactor model in which some items may have both general and specific causes:

In addition to requiring that the scientist specify the measurement model, CFA also allows for certain assumptions of EFA to be relaxed. For example, in the six indicator model above, we could relax the assumption of conditional independence by freeing the residual association of two indicators, \(Y_1\) and \(Y_2\):

\[ \underset{6 \times 6}{\boldsymbol{\Theta}_\varepsilon} = \begin{bmatrix} \varepsilon_{1} & \varepsilon_{7} & 0 & 0 & 0 & 0 \\ \varepsilon_{7} & \varepsilon_{2} & 0 & 0 & 0 & 0 \\ 0 & 0 & \varepsilon_{3} & 0 & 0 & 0 \\ 0 & 0 & 0 & \varepsilon_{4} & 0 & 0 \\ 0 & 0 & 0 & 0 & \varepsilon_{5} & 0 \\ 0 & 0 & 0 & 0 & 0 & \varepsilon_{6} \end{bmatrix} \]
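In lavaan syntax, this residual covariance is freed with a double-tilde statement between the two indicators. Here is a sketch using generic variable and factor names (Y1-Y6, f1, f2), not the dataset analyzed below:

# Sketch (generic names): a two-factor CFA that relaxes conditional independence for Y1 and Y2
msyn_resid <- '
f1 =~ Y1 + Y2 + Y3
f2 =~ Y4 + Y5 + Y6
Y1 ~~ Y2   # free the residual covariance between Y1 and Y2
'
# (hypothetical) fit with cfa(msyn_resid, data = <your data frame>)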

12.1 CFA lavaan demo

We start with lavaan’s default, which is to fix the first factor loading of each factor to 1.0 and estimate the variance of the factor.

Note that passing standardized=TRUE to summary prints the standardized parameter estimates alongside the regular set. Std.lv refers to a solution in which the variances of the factors are fixed at 1.0, but the indicators themselves are not standardized. The Std.all column indicates that both latent variables and indicators have been standardized (mean = 0, SD = 1), which is closest to the conventional EFA model. That is, in the case of uncorrelated factors, these loadings represent the correlation between an indicator and its factor.

12.2 Model with orthogonal (uncorrelated) factors

We force the factors to be orthogonal in the first model by specifying math ~~ 0*sci.

msyn <- '
math =~ 1*CALC + ALG + STAT #putting the 1* here for clarity, but it is the lavaan default
sci =~ 1*BIO + GEO + CHEM
math ~~ 0*sci #force orthogonal (uncorrelated) factors
math ~~ math #explicitly indicate that lavaan should estimate free parameters for the factor variances
sci ~~ sci
'

mcfa <- cfa(msyn, data)
summary(mcfa, standardized=TRUE)
## lavaan (0.5-23.1097) converged normally after  23 iterations
## 
##   Number of observations                           300
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               27.024
##   Degrees of freedom                                 9
##   P-value (Chi-square)                           0.001
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   math =~                                                               
##     CALC              1.000                               1.100    0.977
##     ALG               0.841    0.076   11.116    0.000    0.925    0.789
##     STAT              0.593    0.072    8.226    0.000    0.652    0.519
##   sci =~                                                                
##     BIO               1.000                               1.060    0.865
##     GEO               0.914    0.059   15.400    0.000    0.969    0.789
##     CHEM              1.036    0.062   16.604    0.000    1.097    0.864
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   math ~~                                                               
##     sci               0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     math              1.209    0.139    8.672    0.000    1.000    1.000
##     sci               1.123    0.128    8.799    0.000    1.000    1.000
##    .CALC              0.057    0.094    0.604    0.546    0.057    0.045
##    .ALG               0.519    0.079    6.600    0.000    0.519    0.378
##    .STAT              1.154    0.100   11.560    0.000    1.154    0.731
##    .BIO               0.379    0.056    6.730    0.000    0.379    0.252
##    .GEO               0.569    0.061    9.355    0.000    0.569    0.378
##    .CHEM              0.410    0.060    6.770    0.000    0.410    0.254

12.3 Model with correlated factors

How does model fit change if we allow the factors to correlate? This is accomplished by estimating a free parameter, and we can compare to the orthogonal version using a likelihood ratio test (LRT; which tests difference in model fit).

msyn_corr <- '
math =~ 1*CALC + ALG + STAT #putting the 1* here for clarity, but it is the lavaan default
sci =~ 1*BIO + GEO + CHEM
math ~~ sci #estimate correlation between factors
math ~~ math #explicitly indicate that lavaan should estimate free parameters for the factor variances
sci ~~ sci
'

mcfa_corr <- cfa(msyn_corr, data)
summary(mcfa_corr, standardized=TRUE)
## lavaan (0.5-23.1097) converged normally after  24 iterations
## 
##   Number of observations                           300
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               14.452
##   Degrees of freedom                                 8
##   P-value (Chi-square)                           0.071
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   math =~                                                               
##     CALC              1.000                               1.102    0.980
##     ALG               0.836    0.072   11.623    0.000    0.922    0.786
##     STAT              0.591    0.070    8.399    0.000    0.652    0.519
##   sci =~                                                                
##     BIO               1.000                               1.065    0.869
##     GEO               0.912    0.059   15.504    0.000    0.971    0.791
##     CHEM              1.024    0.061   16.668    0.000    1.090    0.858
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   math ~~                                                               
##     sci               0.259    0.075    3.429    0.001    0.220    0.220
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     math              1.215    0.136    8.940    0.000    1.000    1.000
##     sci               1.133    0.128    8.871    0.000    1.000    1.000
##    .CALC              0.051    0.088    0.572    0.567    0.051    0.040
##    .ALG               0.524    0.075    6.968    0.000    0.524    0.381
##    .STAT              1.154    0.099   11.621    0.000    1.154    0.731
##    .BIO               0.368    0.056    6.617    0.000    0.368    0.245
##    .GEO               0.565    0.061    9.331    0.000    0.565    0.375
##    .CHEM              0.425    0.060    7.066    0.000    0.425    0.263
#likelihood ratio test
anova(mcfa, mcfa_corr)
##           Df  AIC  BIC Chisq Chisq diff Df diff Pr(>Chisq)
## mcfa_corr  8 4980 5029  14.5         NA      NA         NA
## mcfa       9 4991 5035  27.0       12.6       1          0

The LRT (and AIC difference) provides strong evidence that the factors are meaningfully correlated. Forcing them to be orthogonal significantly worsens fit.
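If we wanted additional fit indices beyond the LRT (a sketch; output not shown), lavaan's fitMeasures function can be applied to both models:

fitMeasures(mcfa, c("chisq", "df", "cfi", "rmsea", "bic"))        # orthogonal-factor model
fitMeasures(mcfa_corr, c("chisq", "df", "cfi", "rmsea", "bic"))   # correlated-factor model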