1 Purpose of SEM

Conceptually, the goal of structural equation modeling (SEM) is to test whether a theoretically motivated model of the covariance among variables provides a good approximation of the data.

More specifically, we are trying to test how well a parsimonious model (composed of measurement and/or structural components) reproduces the observed covariance matrix. Formally, we are seeking to develop a model whose model-implied covariance matrix approaches the sample (observed) covariance matrix.

\[ \mathbf{S_{XX}} \approx \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}}) \] In the case where we have positive degrees of freedom (an overidentified model), \(\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})\) is an approximation of \(\mathbf{S_{XX}}\). That is, in SEM we (usually) have more equations than unknowns, so no single set of parameter estimates will typically reproduce the sample covariance matrix exactly; instead, we search for the parameter values whose model-implied covariance matrix comes closest to it. This is a feature, not a bug: we are hoping to identify a parsimonious model whose parameters best fit the observed data.

Thus, there will always be some approximation error. One can think of this as the multivariate extension of residuals in the linear regression case: \(Y - \hat{Y}\). More specifically, as in ANOVA and regression, we can decompose sources of error into deviations from the ‘true’ covariance matrix of the variables in the population, \(\mathbf{\Sigma}\).

Let’s get technical for a moment. The population covariance matrix \(\mathbf{\Sigma}\) does not depend on estimates of each cell derived from a specific sample, nor does it depend on parameters from a statistical model. If we could observe the population covariance matrix and a plausible, but imperfect, SEM at the true population parameters, we could also derive a model-implied covariance matrix for the population, \(\boldsymbol{\Sigma}(\boldsymbol{\theta})\). The extent to which this model approximates the true population covariance matrix quantifies the error of approximation (i.e., does the model recover the covariance structure of the population):

\[ Error_\textrm{approximation} = \boldsymbol{\Sigma} - \boldsymbol{\Sigma}(\boldsymbol{\theta}) \]

We, however, only have a sample from the population on which to test our model, so our model-implied covariance matrix is based on parameters estimated in the sample, \(\hat{\boldsymbol{\theta}}\). A second source of error in SEM, therefore, is not how well the model could do in the ideal case (i.e., in the population), but how well we can estimate the population parameters using our sample.

\[ Error_\textrm{estimation} = \boldsymbol{\Sigma}(\boldsymbol{\theta}) - \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}}) \]
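
Combining the two, the overall discrepancy between the population covariance matrix and our sample-based, model-implied covariance matrix decomposes into these two sources of error:

\[ \boldsymbol{\Sigma} - \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}}) = \underbrace{\left(\boldsymbol{\Sigma} - \boldsymbol{\Sigma}(\boldsymbol{\theta})\right)}_{\textrm{approximation}} + \underbrace{\left(\boldsymbol{\Sigma}(\boldsymbol{\theta}) - \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})\right)}_{\textrm{estimation}} \]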

In SEM, we will be minimizing the overall discrepancy by searching for parameters that maximize the sample likelihood function:

\[ \mathcal{L}(\mathbf{S_{XX}}|\hat{\boldsymbol{\theta}},\textrm{model}) \]

More descriptively, we are trying to maximize the likelihood of the observed covariance matrix given a set of parameter values \(\hat{\boldsymbol{\theta}}\) and a specific model (some variant of SEM). We’ll return to the underpinnings of maximum likelihood estimation next week, but for now, the idea is that we have a quantitative basis for ascertaining how good our parameter estimates are. If we identify better parameter estimates, the sample likelihood increases, indicating that we are getting closer to the observed covariance matrix.
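
To make this concrete, here is a minimal sketch using lavaan. The two-factor model and lavaan's built-in HolzingerSwineford1939 dataset are chosen purely for illustration; the point is that the model-implied covariance matrix at the estimated parameters approximates, but does not exactly reproduce, the sample covariance matrix.

library(lavaan)

# A simple two-factor CFA on lavaan's built-in Holzinger & Swineford data
hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
hs_fit <- cfa(hs_model, data = HolzingerSwineford1939)

S         <- lavInspect(hs_fit, "sampstat")$cov  # sample (observed) covariance matrix
Sigma_hat <- fitted(hs_fit)$cov                  # model-implied covariance matrix

# The residual matrix shows the remaining (approximation) error
round(S - Sigma_hat, 3)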

2 Types of parameters

2.1 Free

Free parameters are estimated from the data. That is, they are estimated by fitting the model to the data so as to optimize some criterion (e.g., minimizing squared residuals in OLS regression).

2.2 Fixed

Fixed parameters are constants, often specified by you (the scientist).

2.3 Constrained

Constrained parameters specify a required relationship between two or more parameters. For example, one parameter could be required to equal twice another (\(x = 2y\)), or two factor loadings could be required to be equal. One can also specify that a parameter must fall within a certain range (e.g., between -2 and 2) using inequality constraints.

2.3.1 Equality constraint

When two free parameters are constrained to be equal, then in effect we are only estimating one free parameter. For example, if we posit that the association between one partner's anger and the other partner's sadness during an interaction should be equal for both members of a couple (p1 and p2), then we can estimate just one cross-partner parameter (\(b\) in the path model below).

grViz("digraph regression {
graph [ bgcolor=transparent]

forcelabels=true; splines=line;

node [shape = box, fontcolor=gray25 color=gray80]

node [fontname='Helvetica']
{rank=same; Anger_M [label=<Anger<sub>p1</sub>>]; node [fillcolor=gray90 style=filled] Sadness_M [label=<Sadness<sub>p1</sub>>]; }

{rank=same; Anger_F [label=<Anger<sub>p2</sub>>]; node [fillcolor=gray90 style=filled] Sadness_F [label=<Sadness<sub>p2</sub>>]; }

edge [color=gray50 style=filled]
{ rank=same; Anger_M -> Sadness_M [label='a'] }
{ rank=same; Anger_F -> Sadness_F [label='a'] }
Anger_M -> Sadness_F [label='b']
Anger_F -> Sadness_M [label='b']
}")

3 Types of variables

3.1 Endogenous

Endogenous variables have at least one cause (predictor) in the model. That is, their values depend on something that is modeled. In standard regression, we might call them dependent variables. In a path diagram, endogenous variables have at least one incoming arrow.

3.1.1 Disturbance

Each endogenous variable has a disturbance that represents unexplained variability due to unmeasured exogenous causes. As a result, they are considered latent variables. Disturbances reflect a combination of measurement error and unmeasured latent causes.

3.2 Exogenous

Exogenous variables are not caused (predicted) by other variables in the model.

4 Types of relationships among variables

4.1 Direct effect

Direct effects represent the relationship between a predictor and an outcome (endogenous variable) that is not mediated by any other variable (Bollen, 1987). This could mean either that potential mediating variables are ignored (i.e., not included in the model) or that they are included and adjusted for. In causal mediation analysis, the direct effect represents the effect of an independent variable (e.g., treatment) on an outcome (e.g., depression) holding a candidate mediator (e.g., ruminative thoughts) constant at the level it would take under the corresponding level of the treatment (following Imai's definition).

grViz("digraph regression {
graph [rankdir = LR bgcolor=transparent]

forcelabels=true;

node [shape = box, fontcolor=gray25 color=gray80]

node [fontname='Helvetica']
X;

node [fillcolor=gray90 style=filled]
Y;

edge [color=gray50 style=filled]
X -> Y [label='Direct']
}")

4.1.1 Direct effect in the presence of a mediator

grViz("digraph regression {
graph [rankdir = LR bgcolor=transparent]

forcelabels=true;

node [shape = box, fontcolor=gray25 color=gray80]

node [fontname='Helvetica']
X; M;

node [fillcolor=gray90 style=filled]
Y;

edge [color=gray50 style=filled]
X -> Y [label='Direct']
X -> M [label='Indirect (a)']
M -> Y [label='Indirect (b)']
}")

4.2 Indirect effect

Indirect effects represent the effect of a predictor on an outcome via one or more intervening (mediating) variables. Thus, the idea is that the outcome is related to a predictor because the predictor influences a mediating variable, which in turn influences the outcome.

4.2.1 Single mediator

grViz("digraph regression {
graph [rankdir = LR bgcolor=transparent]

forcelabels=true;

node [shape = box, fontcolor=gray25 color=gray80]

node [fontname='Helvetica']
X; M;

node [fillcolor=gray90 style=filled]
Y;

edge [color=gray50 style=filled]
X -> Y [label='Direct']
X -> M
M -> Y
}")

4.2.2 Multiple mediators

grViz("digraph regression {
graph [rankdir = LR, bgcolor=transparent, layout=dot]

forcelabels=true;

node [shape = box, fontcolor=gray25 color=gray80]

node [fontname='Helvetica']
X; M1; M2;

node [fillcolor=gray90 style=filled]
Y;

edge [color=gray50 style=filled]
X -> Y [xlabel='e']
X -> M1 [label='a']
M1 -> M2 [label='b']
M2 -> Y [label='c']
M1 -> Y [label='d']
}")

In a multiple mediation model, one considers the direct effect (\(e\)) of a predictor \(X\) on a criterion \(Y\) after accounting for all mediating variables (here, \(M1\) and \(M2\)).

4.2.3 Specific indirect effect

Multiple mediation also exposes the new concept of specific indirect effects, which summarize the association between a predictor, \(X\), and outcome, \(Y\), via one or more specific intervening pathways. For example, in the above model, we could measure how much of the relationship between \(X\) and \(Y\) is transmitted via the \(X \rightarrow M1 \rightarrow Y\) path, which would be \(a \cdot d\).

4.2.4 Total indirect effect

The total indirect effect is the sum of all indirect pathways between \(X\) and \(Y\) (i.e., excluding the \(X \rightarrow Y\) path).

4.2.5 Testing indirect effects

We will return to the topic of mediation and moderation later in the semester, so for now this cursory treatment will have to do. But when we think about the magnitude of an indirect effect, it is best to conceptualize it as a product of the pathways between the predictor and outcome. Thus, the specific indirect effect of \(X\) on \(Y\) via the \(X \rightarrow M1 \rightarrow M2 \rightarrow Y\) pathway is the product of the corresponding parameter estimates \(a \cdot b \cdot c\).
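
In lavaan, labeled paths can be combined with the := operator to define (and test) indirect effects as products of coefficients. Here is a minimal sketch for the multiple-mediator diagram above; the variable names X, M1, M2, and Y and the dataset dat are placeholders, and the path labels match the figure:

library(lavaan)

med_model <- '
  M1 ~ a*X
  M2 ~ b*M1
  Y  ~ c*M2 + d*M1 + e*X

  # specific indirect effects
  ind_via_M1    := a*d      # X -> M1 -> Y
  ind_via_M1_M2 := a*b*c    # X -> M1 -> M2 -> Y

  # total indirect effect (sum of all indirect pathways)
  total_indirect := a*d + a*b*c
'
# fit <- sem(med_model, data = dat, se = "bootstrap")  # dat is hypothetical
# summary(fit)  # the defined := parameters appear in the output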

4.3 Total effect

The total effect of \(X\) on \(Y\) is the sum of the direct effect and the total indirect effect. Note that, conversely, the total indirect effect can be conceptualized as the total effect minus the direct effect.
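
For the multiple-mediator diagram above, for example, the total effect of \(X\) on \(Y\) is the direct path plus all of the indirect pathways:

\[ Effect_\textrm{total} = e + (a \cdot d) + (a \cdot b \cdot c) \]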

4.4 Common cause

Two variables can be related to each other because they were caused by a common third variable. For example, \(Y_1\) and \(Y_2\) may be related because they were both influenced by \(X\).

grViz("digraph regression {
graph [rankdir = LR bgcolor=transparent]

forcelabels=true;

node [shape = box, fontcolor=gray25 color=gray80]

node [fontname='Helvetica']
X;

node [fillcolor=gray90 style=filled]
Y1; Y2;

edge [color=gray50 style=filled]
X -> Y1;
X -> Y2;

}")

4.5 Common outcome

As we'll return to when we discuss causality, a variable can also be a common outcome of two (potentially independent) causes. In the causal inference literature, this is also called an inverted fork (or collider).

grViz("
digraph g {
  # a 'graph' statement
  graph [overlap = true, fontsize = 12]

  # nodes for observed and latent
  node [shape = circle, fontname = Arial, width=1, height=1]
  Flu Lyme Fever

  edge [fontname = Arial]
  Flu -> Fever
  Lyme -> Fever
}")

5 Review of SEM Notation

5.1 RAM (graphical) notation

5.2 Simplified RAM notation

For simplicity, we can remove the distinction between exogenous variances and endogenous disturbances using a single inward-pointing arrow to denote both constructs:

5.3 LISREL all-y notation

In this class, we will typically adopt the ‘all-y’ LISREL notation in which the parameter matrices do not distinguish between exogenous and endogenous variables.

  • \(\mathbf{y}_i\): A vector of observed scores on the \(j\) observed variables for the \(i\)th case (observation).
  • \(\boldsymbol{\eta}_i\): A vector of scores on the \(k\) latent variables for the \(i\)th case. These are also referred to as factor scores.
  • \(\boldsymbol{\Lambda}_y\) or \(\lambda_{1,1}\): Factor loadings of the observed variables \(\mathbf{Y}\) onto factors \(\boldsymbol{\eta}\). Also conceptualized as regression slopes (observed regressed on latent). Note that if expressed by specific subscripts (e.g., \(\lambda_{1,1}\)), the first subscript is the observed variable (\(y_{1...j}\)) and the second subscript is the factor (\(\eta_{1...k}\)). Thus, \(\mathbf{\Lambda}\) is \(j \times k\) in size.
  • \(\boldsymbol{\Theta}_\varepsilon\): The variance-covariance matrix of the errors, \(\boldsymbol{\varepsilon}_i\), of the observed variables. The diagonal contains the error (residual) variances of the observed variables, and the off-diagonal elements contain covariances among errors.
  • \(\boldsymbol{\zeta}_i\): vector of disturbances in endogenous variables (i.e., residual variability due to unmeasured causes).
  • \(\boldsymbol{\Psi}\): covariance matrix among latent variables. Note that the disturbance variances for endogenous variables fall along the diagonal.
  • \(\mathbf{B}\): The matrix of regression coefficients.
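
Combining these matrices (and omitting the mean structure), the all-y model can be written as a measurement equation and a structural equation:

\[ \mathbf{y}_i = \boldsymbol{\Lambda}_y \boldsymbol{\eta}_i + \boldsymbol{\varepsilon}_i \qquad \boldsymbol{\eta}_i = \mathbf{B} \boldsymbol{\eta}_i + \boldsymbol{\zeta}_i \]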

5.4 lavaan notation for SEM matrices

The lavaan package largely uses LISREL notation in naming and structuring the underlying matrices in a SEM. More specifically, it uses the 'all-y' notation, in which there are not separate matrices for endogenous and exogenous variables (see Kline, Appendix 10.A). Thus, for example, all structural regression coefficients are represented in the \(\mathbf{B}\) matrix, rather than being split into matrices for exogenous (\(\boldsymbol{\Gamma}\)) and endogenous (\(\mathbf{B}\)) predictors. The parameter specification and notation of lavaan can be seen by examining the output of the inspect function on a fitted model object.

5.4.1 Political democracy example

Here is a model from Bollen’s 1989 SEM book (p. 332). The dataset contains various measures of political democracy and industrialization in developing countries.

  • y1: Expert ratings of the freedom of the press in 1960
  • y2: The freedom of political opposition in 1960
  • y3: The fairness of elections in 1960
  • y4: The effectiveness of the elected legislature in 1960
  • y5: Expert ratings of the freedom of the press in 1965
  • y6: The freedom of political opposition in 1965
  • y7: The fairness of elections in 1965
  • y8: The effectiveness of the elected legislature in 1965
  • x1: The gross national product (GNP) per capita in 1960
  • x2: The inanimate energy consumption per capita in 1960
  • x3: The percentage of the labor force in industry in 1960
## The industrialization and Political Democracy Example 
## Bollen (1989), page 332
library(lavaan)   # sem(), inspect(), and the built-in PoliticalDemocracy dataset
library(semPlot)  # semPaths()

model <- ' 
  # latent variable definitions
     ind60 =~ x1 + x2 + x3
     dem60 =~ y1 + a*y2 + b*y3 + c*y4
     dem65 =~ y5 + a*y6 + b*y7 + c*y8

  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60

  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8

  # regression of latent vars on observed
    ind60 ~ y4
'

fit <- sem(model, data=PoliticalDemocracy)
#summary(fit, fit.measures=TRUE)
semPaths(fit)

inspect(fit)
## 
## Note: model contains equality constraints:
## 
##   lhs op rhs
## 1   3 ==   6
## 2   4 ==   7
## 3   5 ==   8
## 
## $lambda
##    ind60 dem60 dem65 y2 y4 y6 y8
## x1     0     0     0  0  0  0  0
## x2     1     0     0  0  0  0  0
## x3     2     0     0  0  0  0  0
## y1     0     0     0  0  0  0  0
## y2     0     0     0  0  0  0  0
## y3     0     4     0  0  0  0  0
## y4     0     0     0  0  0  0  0
## y5     0     0     0  0  0  0  0
## y6     0     0     0  0  0  0  0
## y7     0     0     7  0  0  0  0
## y8     0     0     0  0  0  0  0
## 
## $theta
##    x1 x2 x3 y1 y2 y3 y4 y5 y6 y7 y8
## x1 19                              
## x2  0 20                           
## x3  0  0 21                        
## y1  0  0  0 22                     
## y2  0  0  0  0  0                  
## y3  0  0  0  0  0 24               
## y4  0  0  0  0  0  0  0            
## y5  0  0  0 12  0  0  0 26         
## y6  0  0  0  0  0  0  0  0  0      
## y7  0  0  0  0  0 15  0  0  0 28   
## y8  0  0  0  0  0  0  0  0  0  0  0
## 
## $psi
##       ind60 dem60 dem65 y2 y4 y6 y8
## ind60 30                           
## dem60  0    31                     
## dem65  0     0    32               
## y2     0     0     0    23         
## y4     0     0     0    13 25      
## y6     0     0     0    14  0 27   
## y8     0     0     0     0 16 17 29
## 
## $beta
##       ind60 dem60 dem65 y2 y4 y6 y8
## ind60     0     0     0  0 18  0  0
## dem60     9     0     0  0  0  0  0
## dem65    10    11     0  0  0  0  0
## y2        0     3     0  0  0  0  0
## y4        0     5     0  0  0  0  0
## y6        0     0     6  0  0  0  0
## y8        0     0     8  0  0  0  0

5.4.2 Common factor model

These parameter matrices can be used to express specific variants of SEM in matrix form. For example, the common factor model is a form of linear regression in which observed variables \(\mathbf{Y}\) are regressed on latent factors \(\boldsymbol{\eta}\) with factor loadings \(\mathbf{\Lambda}_y\) quantifying the magnitude of the relationship:

\[ \mathbf{y}_i = \boldsymbol{\nu} + \boldsymbol{\Lambda}_{y} \boldsymbol{\eta}_i + \boldsymbol{\varepsilon}_i \]
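
Under this model, the model-implied covariance matrix is \(\boldsymbol{\Sigma}(\boldsymbol{\theta}) = \boldsymbol{\Lambda}_y \boldsymbol{\Psi} \boldsymbol{\Lambda}_y' + \boldsymbol{\Theta}_\varepsilon\). Here is a minimal sketch verifying this correspondence in lavaan, re-using an illustrative two-factor CFA (HolzingerSwineford1939 is a built-in lavaan dataset):

library(lavaan)

hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
hs_fit <- cfa(hs_model, data = HolzingerSwineford1939)

est <- lavInspect(hs_fit, "est")  # list containing $lambda, $psi, and $theta

# Reconstruct the model-implied covariance matrix from the parameter matrices
Sigma_theta <- est$lambda %*% est$psi %*% t(est$lambda) + est$theta

# Should match lavaan's own model-implied covariance matrix (difference ~ 0)
max(abs(Sigma_theta - fitted(hs_fit)$cov))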

6 Steps in developing and testing SEMs

The typical set of steps in developing and testing SEMs is provided in the figure below:

7 Model identification

7.1 Number of observations

The number of ‘observations’ in SEM refers to the number of unique values in the covariance matrix. We’ve seen this equation before:

\[ p = \frac{k(k+1)}{2} \]

where \(p\) is the number of unique elements in the covariance matrix (the variances and covariances, i.e., the 'observations') and \(k\) is the number of observed variables. The number of observations determines how complex a SEM we can test for a given set of observed variables.

The number of free parameters in a SEM is denoted \(q\), and the degrees of freedom for the overall model is the difference between \(p\) and \(q\):

\[ df_\textrm{M} = p - q \]

Thus, as we’ve discussed, the goal is to develop a parsimonious model with fewer free parameters than observations (covariance values). If two models fit the data equally well, the model with more degrees of freedom should be preferred as more parsimonious. We will return to this issue when considering relative model evidence.
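
As a quick illustration, with \(k = 11\) observed variables (as in the political democracy example above), the covariance matrix contains 66 unique values; for any fitted lavaan model, the free-parameter count and model degrees of freedom can be read directly from the fit object:

k <- 11                # number of observed variables
p <- k * (k + 1) / 2   # unique variances and covariances ('observations'); here, 66

# For a fitted lavaan object (generically called 'fit' here), the number of free
# parameters (q) and the model degrees of freedom are available via fitMeasures():
# fitMeasures(fit, c("npar", "df"))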

7.2 Overidentified

A model with \(df_\textrm{M} > 0\) is called overidentified, meaning that there are more equations (known values) than unknowns (free parameters). Many candidate parameter sets could be considered, and none will generally reproduce the observed covariance matrix exactly. Given a model parameterization (i.e., which matrices are included and which parameters are free), the goal is to identify the parameter values that best fit the data. In an overidentified model, we adjudicate among candidate parameter values using a fit criterion; in SEM, this is usually the sample likelihood function, as mentioned above.

7.3 Underidentified

A model with \(df_\textrm{M} < 0\) is called underidentified, meaning that there are more unknown parameters than there are equations (observations), so no unique solution exists. For example:

\[ a + b = 6 \]

There are an infinite number of values for \(a\) and \(b\) that would solve this equation.

7.4 Just-identified

A model with \(df_\textrm{M} = 0\) is called just-identified, and by definition, it will fit the data perfectly. That is, there is exactly one set of parameter values in a just-identified model that reproduces the covariance matrix. There are, however, many alternative just-identified models one could specify for a given dataset, and all of them fit the data equally (and perfectly) well, making hypothesis testing difficult.

7.5 Aside: Nonrecursive Models

From Michael Clark

Recursive models have only unidirectional causal effects, and their disturbances are uncorrelated. A model is considered nonrecursive if it contains a reciprocal relationship, feedback loop, or correlated disturbance. Nonrecursive models are potentially problematic because there may not be enough information to estimate them (i.e., the model may be underidentified), which is a common concern.

A classic example of a nonrecursive relationship is marital satisfaction: the more satisfied one partner is, the more satisfied the other, and vice versa (at least that’s the theory). This can be represented by a simple model (below).

Such models are notoriously difficult to specify in terms of identification, which we will talk more about later. For now, we can simply say that the above model could not even be estimated, as there are more parameters to estimate (two paths and two disturbance variances) than there is information in the data (two variances and one covariance).

8 Path analysis

8.1 Notation

Path analysis is largely an extension of linear regression with observed variables to the multivariate case. That is, in path analysis, there are typically multiple predictors and multiple outcome variables. Furthermore, chained relationships (e.g., \(X \rightarrow M \rightarrow Y\)) lead to the possibility of indirect effects and the potential for a given variable to be endogenous (i.e., its values depend on other variables in the model) and to predict another endogenous variable.

Conceptually, we can think of path analysis in terms of a set of regression equations (Kline Appendix 6.A):

\[ \begin{align} Y_1 &= \gamma_{11} X_1 + \beta_{12} Y_2 + \zeta_1 \\ Y_2 &= \gamma_{22} X_2 + \beta_{21} Y_1 + \zeta_2 \end{align} \] We could re-express this in matrix form as:

\[ \mathbf{Y} = \mathbf{\Gamma X} + \mathbf{B Y} + \boldsymbol{\zeta} \]

In addition, there may be associations among exogenous variables (in \(\boldsymbol{\Phi}\)) or endogenous variables (in \(\boldsymbol{\Psi}\)).

\(\boldsymbol{\Phi}\) contains the variances and covariances among measured exogenous variables.

\[ \boldsymbol{\Phi}=\begin{bmatrix} \phi_{11} & \\ \phi_{21} & \phi_{22} \end{bmatrix} \]

\(\boldsymbol{\Psi}\) contains the variances and covariances among the disturbances of the endogenous variables.

\[ \boldsymbol{\Psi}=\begin{bmatrix} \psi_{11} & \\ \psi_{21} & \psi_{22} \end{bmatrix} \]

Recall that in LISREL all-y notation, we eliminate the need to distinguish between exogenous and endogenous variables in the formal matrix representation, though these are nevertheless very important conceptual distinctions.

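To connect this notation to software, here is a minimal lavaan sketch of the two-equation path model above (the variable names X1, X2, Y1, and Y2 and the dataset dat are placeholders). Note that the reciprocal relationship between Y1 and Y2 makes the model nonrecursive, so it would require additional constraints or instruments to be identified (see the earlier aside on nonrecursive models); the syntax is shown only to illustrate how the coefficient and covariance matrices map onto model statements:

library(lavaan)

path_model <- '
  # structural regressions (elements of Gamma and B)
  Y1 ~ X1 + Y2
  Y2 ~ X2 + Y1

  # covariance among exogenous variables (Phi)
  X1 ~~ X2

  # covariance among the disturbances of the endogenous variables (Psi)
  Y1 ~~ Y2
'
# fit <- sem(path_model, data = dat)  # dat is hypothetical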

9 Tracing rule

Adapted from Michael Clark

In a recursive model, implied correlations between two variables, \(A\) and \(B\), can be found using tracing rules. The bivariate correlation between two variables in a path model is equal to the sum of the product of all standardized coefficients (i.e., variances equal to 1) for the paths between them. Specifically, one traces all valid routes between \(A\) and \(B\) that do not a) enter the same variable twice, and b) enter a variable through an arrowhead and leave through an arrowhead. If unstandardized coefficients are used, the product of tracings needs to be multiplied by the corresponding variance estimates for the variables.

Consider a set of correlated variables, \(A\), \(B\), and \(C\), modeled as in the diagram below. We are interested in identifying the implied correlation between \(A\) and \(C\) by decomposing the relationship into its different components using the tracing rules. Descriptive statistics and correlations for the simulated data (n = 1000) are shown below.

    vars    n mean sd median trimmed   mad   min  max range  skew kurtosis    se
A      1 1000    0  1  0.007   0.006 0.968 -3.53 3.30  6.84 -0.071    0.296 0.032
B      2 1000    0  1  0.004  -0.007 1.013 -2.88 3.27  6.16  0.076   -0.078 0.032
C      3 1000    0  1 -0.003   0.006 0.962 -3.08 2.66  5.74 -0.081   -0.071 0.032
cor(abc)
##     A   B   C
## A 1.0 0.2 0.3
## B 0.2 1.0 0.7
## C 0.3 0.7 1.0
model = "
  C ~ B + A
  B ~ A
"

pathMod = sem(model, data=abc)
coef(pathMod)
##   C~B   C~A   B~A  C~~C  B~~B 
## 0.667 0.167 0.200 0.483 0.959

To reproduce the correlation between A and C (sometimes referred to as a 'total effect'), we sum the direct path from \(A\) to \(C\) and the indirect path through \(B\): \(0.167 + 0.200 \cdot 0.667 \approx 0.300\), which reproduces the observed correlation of 0.3.
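
The same check using the fitted coefficients (continuing the example above):

b <- coef(pathMod)
b["C~A"] + b["B~A"] * b["C~B"]  # implied correlation between A and C: ~0.30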

In SEM, it is important to consider how well the model-implied correlations correspond to the observed correlations. For overidentified models, the correlations will not be reproduced exactly, and the discrepancies can therefore serve as a measure of how well the model fits the data.

In addition, the tracing of paths is important in understanding the structural causal models approach of Judea Pearl, of which SEM and the potential outcomes framework are a part.