1 Brief intro to SEM

Structural equation modeling (SEM) can be a very useful tool for determining relationships between variables. SEM is often used in a “confirmatory” manner, to determine whether a certain model is valid (e.g., by comparing the goodness-of-fit of nested models). We can even extend SEM to study interactions across groups. SEM is sometimes referred to as causal modeling, path analysis (with latent variables), or covariance structure analysis. It subsumes many other techniques, such as multiple regression, confirmatory factor analysis, and ANOVA.

You supply the observed relationships between variables (i.e., the covariance or correlation matrix), the number of observations, and a formal model specification. SEM then estimates the parameters that give the “best” reproduction of the observed covariance matrix. The closer the model-implied covariance matrix is to the observed one, the better the model fits (hence, lower chi-squared = better fit)!
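To make the “reproduce the covariance matrix” idea concrete, here is a minimal Python/NumPy sketch of the normal-theory maximum-likelihood discrepancy function, \(F_{ML} = \log|\Sigma| + \mathrm{tr}(S\Sigma^{-1}) - \log|S| - p\). Here \(S\) is the observed covariance matrix, \(\Sigma\) is the model-implied one, and \(p\) is the number of variables. The chi-squared statistic is this discrepancy scaled by the sample size. This illustrates the idea and is not lavaan's internal code. The covariance values are made up. Software differs on whether it multiplies by \(N\) or \(N - 1\), so the `wishart` flag below is a hypothetical switch for that choice.

```python
import numpy as np

def ml_discrepancy(S, Sigma):
    """Normal-theory ML discrepancy between observed S and model-implied Sigma.

    Returns 0 when Sigma reproduces S exactly and grows as the fit worsens.
    Both matrices are assumed to be symmetric and positive definite.
    """
    S = np.asarray(S, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)  # log|Sigma|
    _, logdet_s = np.linalg.slogdet(S)          # log|S|
    trace_term = np.trace(S @ np.linalg.inv(Sigma))
    return logdet_sigma + trace_term - logdet_s - p

def chi_square(S, Sigma, n_obs, wishart=False):
    # Scale the discrepancy by N, or by N - 1 under the Wishart convention
    multiplier = n_obs - 1 if wishart else n_obs
    return multiplier * ml_discrepancy(S, Sigma)

# Hypothetical observed covariance matrix for three standardized variables
S = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])

# A model claiming the three variables are uncorrelated (diagonal Sigma)
Sigma_independence = np.diag(np.diag(S))

print(chi_square(S, S, n_obs=200))                   # perfect reproduction: ~0
print(chi_square(S, Sigma_independence, n_obs=200))  # poorer reproduction: larger
```

The independence model cannot reproduce the off-diagonal covariances, so its chi-squared is much larger than that of a model that reproduces \(S\) exactly.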

1.1 Mediation

For more information on how to conduct classic mediation with lavaan, check out the tutorial here.

1.2 Latent variables

Often we are interested in latent, abstract variables (like “intelligence”) that cannot be measured directly, so we obtain multiple observable measures of them (e.g., high school GPA, SAT and ACT scores). With SEM, we can include such latent variables directly in the model!
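In a confirmatory factor model, each observed indicator is modeled as a loading times the latent variable plus a unique error. This implies the covariance structure \(\Sigma = \Lambda \Phi \Lambda^\top + \Theta\). Here \(\Lambda\) holds the loadings, \(\Phi\) the latent variance, and \(\Theta\) the residual variances (this notation is an assumption, not taken from the text above). The sketch below builds \(\Sigma\) for the GPA/SAT/ACT example. The loadings are invented for illustration. In lavaan you would specify such a factor with the `=~` operator and let the software estimate the loadings.

```python
import numpy as np

# Hypothetical standardized loadings of three indicators on one latent factor
loadings = np.array([[0.7],   # GPA
                     [0.8],   # SAT
                     [0.9]])  # ACT

# Variance of the latent factor, fixed to 1 here to set its scale
factor_variance = np.array([[1.0]])

# Residual (unique) variances chosen so each indicator has total variance 1
residual_variances = np.diag(1.0 - (loadings ** 2).ravel())

# Model-implied covariance matrix: Sigma = Lambda Phi Lambda' + Theta
Sigma = loadings @ factor_variance @ loadings.T + residual_variances
print(np.round(Sigma, 2))
```

Each off-diagonal entry is the product of two loadings (e.g., 0.8 × 0.9 = 0.72 for SAT and ACT). The single latent variable therefore accounts for all the shared covariance among the indicators.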

1.3 Sample size


SEM requires large sample sizes! In the literature, sample sizes commonly run from 200 to 400 for models with 10 to 15 indicators. One survey of 72 SEM studies found a median sample size of 198. A sample of 150 is considered too small unless the covariance coefficients are relatively large. With more than ten variables, a sample size under 200 generally means that parameter estimates are unstable and significance tests lack power.

One rule of thumb found in the literature is that the sample size should be at least 50 more than 8 times the number of variables in the model, i.e., \(N \geq 8k + 50\), where \(k\) is the number of variables. Mitchell (1993) advances the rule of thumb that there be 10 to 20 times as many cases as variables. Another rule of thumb, based on Stevens (1996), is to have at least 15 cases per measured variable or indicator. Researchers should go beyond these minimum recommendations, particularly when data are non-normal (skewed, kurtotic) or incomplete. Note also that computing the asymptotic covariance matrix requires \(\frac{k(k+1)}{2}\) observations.
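The rules of thumb above are simple arithmetic in the number of variables \(k\), so a quick calculator helps compare them for a given model. This is a minimal sketch. The function name `sample_size_rules` is hypothetical, and it treats “variables” and “indicators” as the same count \(k\), which is an assumption.

```python
def sample_size_rules(k):
    """Minimum sample sizes suggested by the rules of thumb quoted in the text.

    k is the number of observed variables (indicators) in the model.
    """
    return {
        "8k + 50": 8 * k + 50,
        "10k to 20k (Mitchell, 1993)": (10 * k, 20 * k),
        "15k (Stevens, 1996)": 15 * k,
        "k(k+1)/2 (asymptotic covariance matrix)": k * (k + 1) // 2,
    }

# Example: a model with 12 observed variables
for rule, n in sample_size_rules(12).items():
    print(f"{rule}: {n}")
```

For a 12-variable model this gives 146, 120 to 240, 180, and 78, respectively. The heuristics disagree noticeably, which is one reason to aim above the largest of them when the data are non-normal or incomplete.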

1.4 Lavaan in action

Kievit, R. A., Davis, S. W., Mitchell, D. J., Taylor, J. R., Duncan, J., Henson, R. N., & Cam-CAN Research team. (2014). Distinct aspects of frontal lobe structure mediate age-related differences in fluid intelligence and multitasking. Nature Communications, 5.