Note that this Rmd file depends on having Mplus installed. You can obtain a fully functioning student version for $350 here: https://www.statmodel.com/orderonline/categories.php?category=Mplus-Software/Student-Pricing.

Or you can use the demo version for free here: https://www.statmodel.com/demo.shtml

The demo version has the following constraints:

  1. Maximum number of dependent variables: 6
  2. Maximum number of independent variables: 2
  3. Maximum number of between variables in two-level analysis: 2
  4. Maximum number of continuous latent variables in time series analysis: 2

At this time, Yves Rosseel, the main developer of lavaan, has a prototype of multilevel SEM working for the package, but this has not been released to the general public. I suspect that this will be released in the next 6 months and should provide the functionality to run all of the examples in this RMarkdown file. You can see some of the basic setup here: http://users.ugent.be/~yrosseel/lavaan/tubingen2017/

In the meantime, Mplus is probably the most user-friendly program for multilevel SEM, though there is similar functionality in EQS and LISREL. In addition, the OpenMx package in R is free and supports multilevel analyses, but requires a substantially different approach to syntax and specification.

This RMarkdown also leverages a package called MplusAutomation, which allows for effective cross-talk between Mplus and R. In particular, Mplus outputs all information into a text-based file, which makes it difficult to compare outputs across models or to extract specific subsections. If you’re interested in how this package works, please see Hallquist and Wiley (in press), Structural Equation Modeling.

1 What is multilevel analysis?

In the standard general linear model (GLM), we assume that observations are independent of each other. In particular, we assume that the residuals are distributed iid normal:

\[ \begin{align*} Y &= B_0 + B_1 X_1 + B_2 X_2 + ... B_p X_p + \varepsilon \\ \varepsilon &\sim \mathcal{N}(0, \sigma^2) \end{align*} \]

The assumption of independence is violated when:

  1. There are multiple observations per person (e.g., longitudinal data).
  2. There are known groups in the data that introduce similarity among group members.

Examples of multilevel/clustered data:

  1. Trials of an experimental task nested within individual (e.g., reaction times or accuracy data)
  2. A set of neurons (e.g., a population) measured in each animal, but with the goal of describing the pattern in a group of animals
  3. Longitudinal measurements of questionnaire or interview-based data
  4. Students within classrooms within schools (a so-called three-level model)
  5. Patients within therapy groups

In each case, the goal of multilevel analyses, including multilevel SEM, is to account for both within-cluster and between-cluster variability explicitly. If we were to analyze the data ignoring the group structure, we would overestimate the degrees of freedom (since our observations are non-independent). And we could not separate within- versus betewen-cluster effects; instead, our parameters would reflect an unknown mixture of the two.

1.1 L1 versus L2 effects

Multilevel analysis also allows the researcher to ask specific questions at each level of the model. Let’s consider the case of students (L1) nested within schools (L2). More specifically, let’s say we’re interested in what student- and school-level factors influence test performance. At the school level (aka L2), we hypothesize that the student:teacher ratio influences reading performance such that more students per teacher is associated with poorer performance. By definition, the student:teacher ratio is only meaningful as a school-level variable (setting aside the possibility of classroom variation) and has no variation at the person level. This might look like:

tt <- dplyr::tribble(
~student, ~school, ~stratio, ~readtest,
1, "Radio Park", 20, 15,
2, "Radio Park", 20, 12,
3, "Radio Park", 20, 19,
4, "Corl Street", 16, 24,
5, "Corl Street", 16, 27,
6, "Corl Street", 16, 21
)

kable(tt)
student school stratio readtest
1 Radio Park 20 15
2 Radio Park 20 12
3 Radio Park 20 19
4 Corl Street 16 24
5 Corl Street 16 27
6 Corl Street 16 21

Notice that there is no student-specific variation in the stratio. This is the simplest case for a predictor because there is no concern that it could represent a mixture of student and school effects. That is, stratio cannot predict any L1 variance.

We wish to represent the reading test performance for person i in school j as a function of reading speed. Some notation:

\[ \begin{align*} \textrm{readtest}_{ij} &= \beta_{0j} + r_{ij} \\ \beta_{0j} &= \gamma_{00} + \gamma_{01} \textrm{stratio}_j + \mu_{0j} \\ \end{align*} \] In this notation, \(\gamma_{01}\) represents the school-level effect of student:teacher ratio on test performance. We also allow for there to be normally distributed school-level variation in test performance, \(\mu_{0j}\) around the sample average \(\gamma_{00}\). To keep the \(\beta_{0j}\) equation easy to interpret, L2 predictors are typically centered around the grand mean. This ensures that \(\gamma_{00}\) retains the interpretation of the sample average test performance (fixed effect), and \(\mu_{0j}\) is school variation around this mean (random effect).

In addition, we might ask: what student characteristics enhance performance on standardized tests? These would be considered L1 predictors because they vary at the level of the student.

For example, students who read more words per minute may perform better on a standardized reading comprehension exam.

tt <- tt %>% add_column(readpermin=c(20, 14, 18, 25, 35, 30), .before="school")
kable(tt)
student readpermin school stratio readtest
1 20 Radio Park 20 15
2 14 Radio Park 20 12
3 18 Radio Park 20 19
4 25 Corl Street 16 24
5 35 Corl Street 16 27
6 30 Corl Street 16 21

1.2 Disaggregating within- versus between-cluster effects

Importantly, the association between an L1 predictors and the outcome could reflect student and/or school variability. Perhaps students read more quickly because of an individual-level (aka L1) factor such as gray matter volume. Or might some schools have policies that promote better reading such that the average reading speed varies by school (i.e., at L2).

Because reading speed is an L1 variable (i.e., individual/student), it may have systematic variation at both the individual (L1) and school (L2) level. In conventional multilevel modeling (MLM), one can disentangle the variance at each level using appropriate centering (Curran & Bauer, 2011). In particular, the usual approach is to center the predictor (reading speed) around the school mean such that we now have a variable that represents deviation in reading speed relative to the school average. Then, we also compute the school means and include those as a predictor at L2. In this way, the effect of per-school reading speed on test performance can be examined specifically. Likewise, individual variation in reading speed not explainable by the school average can also be examined.

More notation from the MLM world.

\[ \begin{align*} \textrm{readtest}_{ij} &= \beta_{0j} + \beta_{1j} \textrm{readspeed}_{ij} + r_{ij} \\ \beta_{0j} &= \gamma_{00} + \gamma_{01} \textrm{stratio}_j + \mu_{0j} \\ \beta_{1j} &= \gamma_{10} + \mu_{1j} \end{align*} \]

The major difficulty of this representation is that \(\textrm{readspeed}_{ij}\) may have variation at the person and school levels. Thus, the main coefficient of interest for this predictor, \(\gamma_{10}\), (fixed effect) is an unknown combination of indiviual and school effects. Here, we also allow for the association of reading speed with reading test performance to vary randomly by school, \(\mu_{1j}\).

As mentioned above, to handle the problem of \(\textrm{readspeed}\) potentially having variation at L1 and L2, we apply the L1 centering + L2 means approach.

The notation for the school-centered reading speed is:

\[ \dot{\textrm{readspeed}_{ij}} = \textrm{readspeed}_{ij} - \bar{\textrm{readspeed}_{j}} \]

That is, we subtract the school mean reading speed, \(\bar{\textrm{readspeed}_{j}}\), from each student in that school.

We then introduce both of these new variables into the model:

\[ \begin{align*} \textrm{readtest}_{ij} &= \beta_{0j} + \beta_{1j} \dot{\textrm{readspeed}_{ij}} + r_{ij} \\ \beta_{0j} &= \gamma_{00} + \gamma_{01} \textrm{stratio}_j + \gamma_{02} \bar{\textrm{readspeed}_{j}} + \mu_{0j} \\ \beta_{1j} &= \gamma_{10} + \mu_{1j} \end{align*} \]

Thus, the school-average reading speed predicts school-level variation in test performance, whereas school-deviated individual variation predicts the association of individual reading speed on performance.

This is perhaps more evident in the combined equation:

\[ \textrm{readtest}_{ij} = ( \gamma_{00} + \gamma_{01} \textrm{stratio}_j + \gamma_{02} \bar{\textrm{readspeed}_{j}} + \gamma_{10} \dot{\textrm{readspeed}_{ij}} ) + \\ ( r_{ij} + \mu_{0j} + \mu_{1j} \dot{\textrm{readspeed}_{ij}} ) \] Note how \(\textrm{readspeed}\) enters the model in two ways (individual and school level). As in previous examples, the fixed effects are denoted in the first set of parentheses, and the random effects in the second.

2 Multilevel SEM (MSEM) overview

Okay, so we’ve got our head around standard multilevel models. Where does SEM enter the picture? In multilevel SEM, we use a latent variable approach to parcellate variation between and within clusters, rather than applying a cluster-based centering approach. This is the intuition: