# 1 Overview

The common factor model represents the view that covariation among a set of observed variables reflects the influence of one or more common factors (i.e., shared latent causes), as well as unexplained variable-specific variance. In the common factor model, items are considered indicators of the latent variables that underlie their covariation (a form of a reflective latent variable model).

Thus, factor analysis partitions variation in the indicators into common variance and unique variance. Common variance reflects the shared influence of underlying factors on an indicator. Unique variances in factor models have the same interpretation as the familiar concept of a disturbance in SEM. That is, unique variance represents a) reliable variation in the item that reflects unknown latent causes, and b) random error due to unreliability or measurement error.

Each indicator has a communality (sometimes denoted $$h^2$$), which is the total variance in an indicator explained by latent factors. The remaining unexplained variation is called an indicator’s uniqueness ($$u^2$$). Factor analyses are conventionally conducted on standardized data (M = 0, SD = 1), meaning that the communality and uniqueness should sum to 1.0 for each indicator (i.e., its total variance).

Altogether, if a factor model is a good representation of the data, then the correlation among items that load onto a factor should be largely attributable to the factor. More generally, a good factor model should have high communality estimates and low uniquenesses for all items. If the uniqueness of an indicator is high (e.g., 0.5), it indicates that a substantial share of the variation in the indicator is not explained by the specified factor structure.

Returning to our discussions of latent variables, the common factor model assumes conditional independence among the indicators. That is, there should not be meaningful correlations among indicators after accounting for the factor structure. In SEM, we can relax this assumption in selected cases, but in exploratory factor analysis (EFA), it is built in.

# 2 Formal specification of the common factor model

The common factor model builds on the mechanics of linear regression, where we view realizations of a dependent variable $$Y$$ as a linear combination of multiple predictors, $$\textbf{X}$$, plus unexplained variance, $$\varepsilon$$. Unlike regression, however, the common factor model specifies that observed data reflect a linear combination of latent influences. If we consider a single item, $$\textbf{y}_i$$, the model specification is:

## 2.1 Single item form

$\textbf{y}_i = \lambda_{i1} \boldsymbol{\eta}_1 + \lambda_{i2} \boldsymbol{\eta}_2 + \dots + \lambda_{im} \boldsymbol{\eta}_m + \boldsymbol{\varepsilon}_i$

where $$\lambda_{im}$$ reflects the strength of the association between factor $$m$$ and indicator $$i$$. For example, if we estimated 3 factors and had 8 indicators, the factor loading matrix would be:

$\underset{8 \times 3}{\boldsymbol{\Lambda}_y} = \begin{bmatrix} \lambda_{11} & \lambda_{12} & \lambda_{13} \\ \lambda_{21} & \lambda_{22} & \lambda_{23} \\ \lambda_{31} & \lambda_{32} & \lambda_{33} \\ \lambda_{41} & \lambda_{42} & \lambda_{43} \\ \lambda_{51} & \lambda_{52} & \lambda_{53} \\ \lambda_{61} & \lambda_{62} & \lambda_{63} \\ \lambda_{71} & \lambda_{72} & \lambda_{73} \\ \lambda_{81} & \lambda_{82} & \lambda_{83} \\ \end{bmatrix}$

Note that in the ‘vanilla’ common factor model of EFA, each item is a weighted combination of all factors, which is often an anti-parsimonious account. In multiple-factor models, scientists often seek to achieve simple structure in which each item has a dominant loading on only one factor.

In EFA, this is often accomplished by rotating the factor solution, a procedure that tries to simplify the pattern of factor loadings by geometrically rotating the orientation of the latent directions. In confirmatory factor analysis (CFA), we often specify a sparse $$\boldsymbol{\Lambda}_y$$ matrix in which many improbable factor loadings are fixed at zero. That is, we assert that an observed variable is only a function of a small number of factors (preferably one). This assertion is testable by fitting the hypothesized confirmatory factor model and examining global and local fit.
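To make the idea of rotation concrete, here is a minimal sketch using `varimax()` from base R's `stats` package. The loading matrix is hypothetical (made-up numbers, not estimates from any dataset), and varimax is only one of several rotation criteria. The point it illustrates is that an orthogonal rotation redistributes loadings across factors without changing how much variance the factors jointly explain in each item.

```r
# Hypothetical unrotated loadings for 6 indicators on 2 orthogonal factors
# (illustrative numbers only).
Lambda <- matrix(
  c(0.70,  0.40,
    0.65,  0.45,
    0.72,  0.38,
    0.60, -0.50,
    0.66, -0.48,
    0.55, -0.35),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("y", 1:6), c("F1", "F2"))
)

# Orthogonal varimax rotation (stats package, loaded by default).
rot <- varimax(Lambda)
Lambda_rot <- unclass(rot$loadings)

# Row sums of squared loadings (communalities) before and after rotation.
h2_before <- rowSums(Lambda^2)
h2_after  <- rowSums(Lambda_rot^2)

print(round(Lambda_rot, 2))
print(round(cbind(h2_before, h2_after), 3))
```

In the rotated matrix, each item should show one dominant loading, while the two communality columns should agree: rotation changes the description of the factors, not the fit of the model.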

## 2.2 Expansion of equations for each indicator

Here is the notation for a simple case in which 5 indicators are represented by two factors:

\begin{align*} \textbf{y}_1 &= \lambda_{11} \boldsymbol{\eta}_1 + \lambda_{12} \boldsymbol{\eta}_2 + \boldsymbol{\varepsilon}_1 \\ \textbf{y}_2 &= \lambda_{21} \boldsymbol{\eta}_1 + \lambda_{22} \boldsymbol{\eta}_2 + \boldsymbol{\varepsilon}_2 \\ \textbf{y}_3 &= \lambda_{31} \boldsymbol{\eta}_1 + \lambda_{32} \boldsymbol{\eta}_2 + \boldsymbol{\varepsilon}_3 \\ \textbf{y}_4 &= \lambda_{41} \boldsymbol{\eta}_1 + \lambda_{42} \boldsymbol{\eta}_2 + \boldsymbol{\varepsilon}_4 \\ \textbf{y}_5 &= \lambda_{51} \boldsymbol{\eta}_1 + \lambda_{52} \boldsymbol{\eta}_2 + \boldsymbol{\varepsilon}_5 \\ \end{align*}

## 2.3 Graphical depiction

## 2.4 Matrix form

We can generalize the factor model to matrix form:

$\boldsymbol{y} = \boldsymbol{\Lambda}_y \boldsymbol{\eta} + \boldsymbol{\varepsilon}$

Nice and simple! :-)
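One way to build intuition for the matrix form is to simulate data from it. The sketch below assumes a hypothetical 5 x 2 loading matrix, standard normal factors, and normally distributed unique parts (normality is a convenience for simulation, not a requirement stated above). It then compares the sample covariance matrix with the covariance implied by the model under orthogonal, unit-variance factors, $$\boldsymbol{\Lambda}_y \boldsymbol{\Lambda}_y' + \boldsymbol{\Theta}$$, where $$\boldsymbol{\Theta}$$ is our label for the diagonal matrix of unique variances.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n <- 100000  # large n so sample moments sit close to the implied ones
Lambda <- matrix(
  c(0.8, 0.0,
    0.7, 0.0,
    0.6, 0.3,
    0.0, 0.8,
    0.0, 0.7),
  ncol = 2, byrow = TRUE
)  # hypothetical loadings: 5 indicators, 2 factors

# Unique variances chosen so that every indicator has unit variance.
u2 <- 1 - rowSums(Lambda^2)

# Factors: mean 0, variance 1, uncorrelated with each other.
eta <- matrix(rnorm(n * 2), nrow = n, ncol = 2)
# Unique parts: mean 0, variance u2, uncorrelated with eta and each other.
eps <- matrix(rnorm(n * 5), nrow = n, ncol = 5) %*% diag(sqrt(u2))

# y = Lambda %*% eta + eps, written with people in rows.
Y <- eta %*% t(Lambda) + eps

# Model-implied covariance matrix versus the sample covariance matrix.
Sigma_implied <- Lambda %*% t(Lambda) + diag(u2)
max_abs_diff <- max(abs(cov(Y) - Sigma_implied))
print(round(max_abs_diff, 3))
```

The largest discrepancy should be small and should shrink further as `n` grows, since the only source of mismatch here is sampling error.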

# 3 Assumptions of the common factor model

The common factor model has several key assumptions:

1. Unique variances (disturbances) have a mean of zero: $$E(\varepsilon_{i}) = 0$$.
2. Latent factors have a mean of zero: $$E(\eta_{i}) = 0$$.
3. Latent factors have a variance of one: $$\textrm{var}(\eta_i) = 1$$. (Standardized solution)
4. Unique variances are uncorrelated with each other: $$\textrm{cov}(\varepsilon_{i},\boldsymbol{\varepsilon}_{\backslash i}) = 0$$. (Conditional independence)
5. Latent factors are uncorrelated with each other: $$\textrm{cov}(\eta_{i},\boldsymbol{\eta}_{\backslash i}) = 0$$.
6. Latent factors are uncorrelated with unique variances: $$\textrm{cov}(\boldsymbol{\varepsilon} ,\boldsymbol{\eta}) = 0$$.

These assumptions are necessary to obtain unique solutions to the model parameters (i.e., identification).

Together, assumptions 1 and 2 imply that the expected value (mean) of the ith indicator is zero when we take expectations of the equation:

$y_i = \lambda_{i1} \eta_1 + \lambda_{i2} \eta_2 + \dots + \lambda_{im} \eta_m + \varepsilon_i$

In this scenario, every term on the right-hand side has an expectation of zero, so the expected value of the indicator $$y_i$$ is also zero. If the indicator had a non-zero mean, we would need to include mean structure in the model by adding intercepts to the equations (as in multiple regression):

$y_i = \tau_i + \lambda_{i1} \eta_1 + \lambda_{i2} \eta_2 + \dots + \lambda_{im} \eta_m + \varepsilon_i$
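As a short check on the role of assumptions 1 and 2, we can take expectations of the intercept form term by term. This is a sketch that uses only the symbols defined above, with $$m$$ factors:

\begin{align*}
E(y_i) &= \tau_i + \sum_{j=1}^{m} \lambda_{ij} E(\eta_j) + E(\varepsilon_i) \\
&= \tau_i + \sum_{j=1}^{m} \lambda_{ij} \cdot 0 + 0 \\
&= \tau_i
\end{align*}

So the intercept $$\tau_i$$ is the model-implied mean of indicator $$i$$, and omitting it amounts to assuming that this mean is zero.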

# 4 Partitioning variance in the common factor model

As we discussed last week, the variance explained in a given indicator i by a factor in the model (e.g., $$\eta_1$$) is given by squaring its (standardized) factor loading. This comes from the equivalence of factor loadings and correlations in the common factor model, assuming uncorrelated factors (otherwise it gets more complicated).

More generally, we can decompose the variance of any indicator into the part explained by the factor model (i.e., its communality) and residual, unexplained variation (i.e., its uniqueness). For an indicator that loads on a single factor:

$\textrm{var}(y_i) = \lambda_{i1}^2 \psi_{11} + \textrm{var}(\varepsilon_i)$

where $$\psi_{11} = 1.0$$ is the variance of the factor (i.e., the requirement that factor variances are unity). Consequently, if we want to estimate the proportion of variance explained in a variable $$y_i$$ by the mth factor, it is:

$h^2_i = r^2_i = \lambda_{im}^2$

And the proportion of unexplained variance is:

$u^2_i = \textrm{var}(\varepsilon_i) = 1 - \lambda_{im}^2$

## 4.1 Communality

These equations only hold if item i loads onto a single factor. In multiple-factor EFA, this is not the case. Thus, with uncorrelated factors, the total explained variance for item i is the sum over all factors of the squared loadings, each multiplied by the corresponding factor variance (which is 1.0 here and so drops out):

$h^2_i = r^2_i = \sum_{j=1}^{m}\lambda_{ij}^2$

## 4.2 Uniqueness

$u^2_i = \textrm{var}(\varepsilon_i) = 1 - \sum_{j=1}^{m}\lambda_{ij}^2$

Again, the factor variance term ($$\psi_{jj}$$) drops out because it is 1.0 for all factors.
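The communality and uniqueness formulas are easy to verify by hand. The sketch below uses a hypothetical matrix of standardized loadings on two orthogonal factors (the numbers are invented for illustration). For the first item, for instance, the communality is $$0.8^2 + 0.1^2 = 0.65$$, leaving a uniqueness of 0.35.

```r
# Hypothetical standardized loadings for 4 items on 2 orthogonal factors.
Lambda <- matrix(
  c(0.8, 0.1,
    0.7, 0.2,
    0.1, 0.6,
    0.3, 0.5),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("item", 1:4), c("F1", "F2"))
)

h2 <- rowSums(Lambda^2)  # communality: sum of squared loadings per item
u2 <- 1 - h2             # uniqueness: the remainder of the unit variance

print(round(cbind(h2, u2), 2))
```

Note that this shortcut relies on uncorrelated factors; with correlated factors, the cross-products of loadings and factor correlations would also contribute to the communality.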

## 4.3 Covariance

In the case of a single-factor model or a lambda matrix in which two indicators load only on the same single factor (e.g., a simple structure CFA model), their covariance can be obtained via:

$\textrm{cov}(y_1, y_2) = \sigma_{12} = \lambda_{11} \psi_{11} \lambda_{21}$

where $$\psi_{11}$$ is the variance of the factor.
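This covariance expression follows from the model equations together with assumptions 4 and 6 in Section 3. Here is a sketch for two indicators that load only on $$\eta_1$$, writing $$\psi_{11} = \textrm{var}(\eta_1)$$:

\begin{align*}
\textrm{cov}(y_1, y_2) &= \textrm{cov}(\lambda_{11} \eta_1 + \varepsilon_1, \; \lambda_{21} \eta_1 + \varepsilon_2) \\
&= \lambda_{11} \lambda_{21} \textrm{var}(\eta_1) + \lambda_{11} \textrm{cov}(\eta_1, \varepsilon_2) + \lambda_{21} \textrm{cov}(\varepsilon_1, \eta_1) + \textrm{cov}(\varepsilon_1, \varepsilon_2) \\
&= \lambda_{11} \psi_{11} \lambda_{21}
\end{align*}

The last three terms in the second line vanish because unique parts are uncorrelated with the factor and with each other. With $$\psi_{11} = 1$$, the covariance is simply the product of the two loadings; for example, hypothetical standardized loadings of 0.8 and 0.7 would imply a correlation of 0.56.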

# 5 A quick example of the common factor model in EFA

Consider a (simulated) dataset in which 300 students rated their affinity for different topics of study: biology, geology, chemistry, algebra, calculus, and statistics. Items were rated on a 1-5 scale from ‘strongly dislike’ to ‘strongly like’. Dataset courtesy of John Quick: http://rtutorialseries.blogspot.com/2011/10/r-tutorial-series-exploratory-factor.html.

First, let’s take a look at the descriptive statistics, including correlations among items:

library(psych)       # describe()
library(dplyr)       # %>% and select()
library(ggcorrplot)  # ggcorrplot()

data <- read.csv("quick_efa_data.csv")
describe(data) %>% select(-vars, -trimmed, -mad, -se)
##        n mean   sd median min max range   skew kurtosis
## BIO  300 2.35 1.23      2   1   5     4  0.496   -0.785
## GEO  300 2.17 1.23      2   1   5     4  0.654   -0.690
## CHEM 300 2.24 1.27      2   1   5     4  0.630   -0.767
## ALG  300 3.05 1.17      3   1   5     4 -0.121   -0.793
## CALC 300 3.06 1.13      3   1   5     4 -0.250   -0.564
## STAT 300 2.94 1.26      3   1   5     4 -0.092   -1.017
round(cor(data), 2)
##       BIO  GEO CHEM  ALG CALC STAT
## BIO  1.00 0.68 0.75 0.12 0.21 0.20
## GEO  0.68 1.00 0.68 0.14 0.20 0.23
## CHEM 0.75 0.68 1.00 0.08 0.14 0.17
## ALG  0.12 0.14 0.08 1.00 0.77 0.41
## CALC 0.21 0.20 0.14 0.77 1.00 0.51
## STAT 0.20 0.23 0.17 0.41 0.51 1.00
ggcorrplot(cor(data), outline.col = "black", lab=TRUE)
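The correlations suggest two clusters (science items and math items), which connects directly to the variance partitioning described earlier. As a rough sketch, the code below rebuilds the correlation matrix from the rounded values printed above and fits a two-factor model with base R's `factanal()`. Both the choice of two factors and the use of `factanal()` (maximum likelihood with varimax rotation by default) are assumptions made for this illustration. Because the input is rounded to two decimals, estimates will differ slightly from those based on the raw data.

```r
# Correlation matrix rebuilt from the rounded values printed above.
items <- c("BIO", "GEO", "CHEM", "ALG", "CALC", "STAT")
R <- matrix(
  c(1.00, 0.68, 0.75, 0.12, 0.21, 0.20,
    0.68, 1.00, 0.68, 0.14, 0.20, 0.23,
    0.75, 0.68, 1.00, 0.08, 0.14, 0.17,
    0.12, 0.14, 0.08, 1.00, 0.77, 0.41,
    0.21, 0.20, 0.14, 0.77, 1.00, 0.51,
    0.20, 0.23, 0.17, 0.41, 0.51, 1.00),
  nrow = 6, byrow = TRUE, dimnames = list(items, items)
)

# Maximum likelihood EFA with two factors (an assumption of this sketch)
# and the default varimax rotation; n.obs is the sample size reported above.
fit <- factanal(covmat = R, factors = 2, n.obs = 300)

Lambda <- unclass(fit$loadings)  # 6 x 2 matrix of rotated loadings
h2 <- rowSums(Lambda^2)          # communalities from the loadings
u2 <- fit$uniquenesses           # uniquenesses reported by factanal()

print(round(cbind(Lambda, h2, u2), 2))
```

Reading the printed table row by row shows the partition in action: each item's communality is the sum of its squared loadings, and the uniqueness is what remains of its unit variance.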