Note: This is just a copy of my course notes from last year. Given that snow days prevented covering the topic in more depth, I opted to keep my notes unchanged, rather than trying to compress the material, which would be tricky.

This document was adapted in part from Michael Clark’s SEM course: http://m-clark.github.io/docs/sem/latent-growth-curves.html

I’m still in the midst of adapting this for our purposes, so for now, there is more overlap (and outright copying) with the Clark course than will be the case in the near future.

1 Latent Curve Modeling (LCM)

Standard latent curve models (LCMs) instantiate the same representation of longitudinal data as multilevel models (aka hierarchical linear modeling). Both models seek to represent initial standing and rate of change in a process measured over time. Moreover, they typically quantify interindividual variation in growth parameters. For example, perhaps some individuals start out with greater motivation to influence climate policy, but their motivation diminishes quite rapidly. Others, however, might have low motivation that increases over time as they learn more about climate change.

1.1 Fixed versus random effects

Fixed effects in SEM and multilevel models represent the point estimate of a relationship intended to capture the overall association in the sample. For example, how much does cognitive functioning decline per year of age? If we say a person’s IQ on the Wechsler Adult Intelligence Test declines 0.26 points per year in a regression model, this reflects a fixed effect estimate of the average rate of change. However, perhaps some individuals experience slower or more rapid changes. Thus, the value of -0.26 may not capture each person, but instead represents the average (expected) effect.

library(ggplot2)

df <- data.frame(rate=rnorm(1000, -0.26, 0.02))
ggplot(df, aes(x=rate)) + geom_histogram(bins=20, color="blue", fill="grey80") + theme_bw(base_size=15) +
  xlab("IQ change per year") + geom_vline(xintercept = mean(df$rate), size=1.5) + 
  geom_label(aes(x=mean(df$rate) - 0.015, y=155, label="Point estimate"), fill="white") +
  geom_segment(x=mean(df$rate) + sd(df$rate), xend=mean(df$rate) - sd(df$rate), y=3, yend=3, color="red", size=1.5) + 
  geom_label(aes(x=mean(df$rate) - 0.015, y=18, label="Between-person\nvariation"), fill="white")

1.2 Random effects

Data are often clustered, e.g., students within schools or individuals measured over time. The standard linear model assumes independent observations. Recall the so-called ‘iid’ (independent and identically distributed) assumption in the standard GLM:

\[ \begin{align*} Y &= B_0 + B_1 X_1 + \varepsilon \\ \varepsilon &\sim \mathcal{N}(0, \sigma^2) \end{align*} \]

However, when there are multiple observations per person, or similarities within a cluster (e.g., employees in organizations), this violates the independence assumption. One popular way to deal with these challenges is a class of models called mixed-effects models, or simply mixed models. They are mixed because there is a combination of fixed effects and random effects.

The random effects allow each cluster to have its own unique estimate of a parameter in addition to the overall fixed effect. In growth models, we typically allow person-specific random deviations in the intercept and slope parameters that are normally distributed around the overall intercept and slope (i.e., the fixed effects). Mixed models are a balanced approach between ignoring these unique contributions, and over-contextualizing by running separate regressions for every cluster.

Thus, the idea is that rate of change in IQ due to aging may differ between persons, and this interindividual variation is often of great interest in longitudinal analyses. What variables predict higher versus lower rates of change? This would have policy implications for preventing cognitive aging, for example.

2 LCM formality

We will only discuss clustering in the context of longitudinal data (as opposed to students in classrooms or employees within organizations). The unconditional LCM represents the outcome for the ith person at the tth time (occasion) as:

\[ y_{it} = \alpha_i + \lambda_t \beta_i + \varepsilon_{it} \]

Thus, the model represents a person-specific intercept (\(\alpha_i\)) and slope (\(\beta_i\)), where \(\lambda_t\) is a fixed time score for occasion t (e.g., 0, 1, 2, 3 for four equally spaced waves). The person-specific growth factors follow these equations (note that this mirrors multilevel models):

\[ \begin{align*} \alpha_i &= \mu_\alpha + \zeta_{\alpha i} \\ \beta_i &= \mu_\beta + \zeta_{\beta i} \end{align*} \]

Here, \(\mu_\alpha\) is the mean intercept (starting point) in the sample (i.e., the point estimate or fixed effect) and \(\mu_\beta\) is the mean rate of change. The deviations from these point estimates (\(\zeta_{\alpha i}\) and \(\zeta_{\beta i}\)) are assumed to reflect a disturbance that captures person-specific variation relative to the sample. The unconditional LCM also assumes:

\[ \begin{bmatrix} \zeta_{\alpha i} \\ \zeta_{\beta i} \end{bmatrix} \sim \mathcal{N} \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \psi_{\alpha \alpha} & \\ \psi_{\alpha \beta} & \psi_{\beta \beta} \end{bmatrix} \right) \] That is, the subject-specific deviations (disturbances) are normally distributed with a zero mean and variances for intercepts and slopes. We also permit there to be covariance between the intercepts and slopes.

Note that if there is positive covariance \(\psi_{\alpha \beta}\), individuals with higher intercepts tend to have more positive slopes, so those who start higher also change more. This typically yields a ‘fanning out’ pattern in the trajectories (à la multifinality). By contrast, negative intercept-slope covariance usually reflects a ‘fanning in’ pattern (à la equifinality).

2.1 Combined expression

We can combine the trajectory equation (‘L1’ in MLM parlance) with the cluster-level model (‘L2’ in MLM parlance) to obtain the combined (aka ‘reduced form’) expression of the LCM:

\[ y_{it} = (\mu_\alpha + \lambda_t \mu_\beta) + (\zeta_{\alpha i} + \lambda_t \zeta_{\beta i} + \varepsilon_{it}) \]

This expression puts the fixed effects (point estimates in the sample) in the first parentheses and the random effects (subject or observation-specific deviations) in the second parentheses.

Note that we assume that the disturbance of \(y_{it}\) (i.e., residual variation) is uncorrelated with the disturbances in the intercepts and slopes for person i:

\[ \begin{align*} \textrm{COV}(\varepsilon_{it}, \zeta_{\alpha i}) = 0 \\ \textrm{COV}(\varepsilon_{it}, \zeta_{\beta i}) = 0 \end{align*} \]
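Together with the distributional assumptions above, these independence assumptions pin down the model-implied mean and variance at each occasion (a one-line derivation from the reduced form):

\[ \begin{align*} \mathbb{E}(y_{it}) &= \mu_\alpha + \lambda_t \mu_\beta \\ \textrm{Var}(y_{it}) &= \psi_{\alpha \alpha} + 2\lambda_t \psi_{\alpha \beta} + \lambda_t^2 \psi_{\beta \beta} + \textrm{Var}(\varepsilon_{it}) \end{align*} \]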

Recall our unconditional LCM. The point estimates for the intercepts and slopes reflect the means of the factors, and the subject-specific deviations around the point estimates are the variances of the factors. In this model, our \(\Psi\) matrix is therefore:

\[ \boldsymbol{\Psi} = \begin{bmatrix} \psi_{\alpha \alpha} & \\ \psi_{\alpha \beta} & \psi_{\beta \beta} \end{bmatrix} \]

and our \(\boldsymbol{\alpha}\) vector (means of factors) is:

\[ \boldsymbol{\alpha} = \begin{bmatrix} \mu_\alpha & \mu_\beta \end{bmatrix} \]

Also notice that we do not have paths connecting the \(\varepsilon\) disturbances (in \(\boldsymbol{\Theta}\)) with the variances or covariances of the factors in \(\boldsymbol{\Psi}\). This is consistent with the idea that measurement errors are independent of the disturbances/variances of the growth factors.

3 Understanding LCMs by simulating data

One of the best ways to understand how mixed models and LCMs work is to simulate data from a model containing both fixed and random effects for growth parameters. Recall that the simple unconditional LCM contains parameters that represent starting point (aka intercept) and rate of change (aka slope).

The LCM also asserts that interindividual variation in these parameters is normally distributed around the sample average trajectory.

Here, we will simulate balanced data (all subjects are observed at all waves) of four time-points for 500 individuals (subjects). We will only investigate the trend (‘growth’), and allow subject-specific intercepts and slopes.

set.seed(1234)
n = 500 #500 subjects
timepoints = 4
time = rep(0:3, times=n)
subject = rep(1:n, each=4)

The sample average trajectory, i.e., the ‘fixed’ effects, will be \(\mu_\alpha = .5\) and \(\mu_\beta = .25\). This means our \(\alpha\) vector, which contains the mean structure of latent factors in SEM is:

\[ \boldsymbol{\alpha} = \begin{bmatrix} .5 & .25 \end{bmatrix} \]

We will model a weak correlation between subject-specific intercepts and slopes \(\psi_{\alpha \beta} = .2\) and unit variances for the individual differences in intercept and slope, \(\psi_{\alpha \alpha} = 1\), \(\psi_{\beta \beta} = 1\). Thus our variance/covariance matrix for the growth factors is:

\[ \boldsymbol{\Psi}= \begin{bmatrix} 1 & \\ .2 & 1 \end{bmatrix} \]

First, we will simulate the between-person (interindividual) variation in the growth parameters. That is, we simulate only the individual change component of the growth trajectory. This is person-specific variation around the sample average trajectory (which captures normative change). Remember that in SEM terms, these are the \(\zeta_{\alpha i}\) and \(\zeta_{\beta i}\) parameters.

We will simulate the random effects (\(\zeta\)) from a multivariate normal distribution using the \(\boldsymbol{\Psi}\) matrix, consistent with the LCM and mixed effects representation.

library(dplyr) # for the pipe used below

randomEffectsCorr = matrix(c(1,.2,.2,1), ncol=2) # with unit variances, this is our Psi matrix
randomEffects = MASS::mvrnorm(n, mu=c(0,0), Sigma = randomEffectsCorr, empirical=T) %>% data.frame()
colnames(randomEffects) = c('Int', 'Slope')

Let’s take a look at the data thus far. These are just the person-specific deviations around the mean trajectory; we haven’t included the mean trajectory yet. Note that randomEffects has one row per subject. Once expanded across the waves and paired with a time variable, the data will have 4 rows per subject; this ‘long’ format is typical in MLMs, but not LCMs. The Int column captures person-specific deviation in the starting point: positive values indicate a starting point above the sample mean, whereas negative values indicate a starting point below the sample mean.
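Nothing new here; we are just displaying objects already created above:

head(randomEffects) # one row per subject
head(data.frame(subject, time, randomEffects[subject, ])) # long format: 4 rows per subject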

To simulate an observation of a process (e.g., happiness) for a given individual \(i\) and timepoint \(t\), we simply add the random effect for the intercept \(\zeta_{\alpha i}\) to the overall intercept \(\mu_\alpha\), and likewise for the slopes \(\mu_\beta + \zeta_{\beta i}\). We then add the so-called ‘level 1’ error to a given observation \(y_{it}\), with standard deviation equal to \(\sigma\). Note that these ‘level 1’ errors represent time-specific error: unexplained variance in \(y_{it}\) after considering the person-specific intercept \(\alpha_i\) and slope \(\beta_i\). In SEM terms, we typically refer to these as error variances, measurement error, or residual variances of the indicators. In the LCM, as in other SEMs, these are represented in the \(\boldsymbol{\Theta}\) matrix.

#these are the fixed effects
intercept = .5 # mu alpha
slope = .25 # mu beta

sigma = .5 # standard deviation of residual variances for indicators (time-specific error)

outcome = (intercept + randomEffects$Int[subject]) + # fixed intercept + (random) subject-specific intercept deviation
  (slope + randomEffects$Slope[subject])*time + # fixed slope + (random) subject-specific slope deviation
  rnorm(n*timepoints, mean=0, sd=sigma) #level-1 error aka measurement error

d = data.frame(subject, time, outcome)

Here are a few values of the outcome for subjects across time:
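For instance, we can print the first two subjects:

head(d, 8) # first two subjects, four waves each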

3.1 Fitting growth curves using a multilevel model

Let’s estimate this as a mixed model first¹ using the lme4 package. We are trying to match the true parameters from our simulation to the estimated parameters in the output. In the MLM, we allow for fixed effects for the intercept and slope (\(\mu_\alpha\) and \(\mu_\beta\) in LCM notation). These capture the sample-average trajectory (normative change). We also allow for random effects for the intercept and slope (with variances \(\psi_{\alpha \alpha}\) and \(\psi_{\beta \beta}\)), which capture interindividual variation in the intercept and slope.

#In the model formula, fixed effects are not in parentheses, whereas random effects are in parentheses
mixedModel = lmer(outcome ~ 1 + time + (1 + time|subject), data=d)  # 1 represents the intercept
summary(mixedModel, correlation=F)
## Linear mixed model fit by REML ['lmerMod']
## Formula: outcome ~ 1 + time + (1 + time | subject)
##    Data: d
## 
## REML criterion at convergence: 5833
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3621 -0.4828  0.0205  0.4751  2.8452 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  subject  (Intercept) 1.000    1.000        
##           time        0.990    0.995    0.21
##  Residual             0.238    0.488        
## Number of obs: 2000, groups:  subject, 500
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   0.4899     0.0483    10.1
## time          0.2640     0.0456     5.8

Our fixed effects are close to the values we set for the overall intercept and slope – approximately 0.5 and 0.25, respectively. The estimated random-effects variances are near 1, the correlation is near .2, and our residual standard deviation is near the .5 value we set.

3.2 Analyzing the same data with an LCM

To analyze these data in an LCM, we will fit latent factors for the intercept and slope, using a slightly odd syntax in which the factor loadings are fixed to encode a functional form of change (here, linear) rather than being estimated.

Specifically, we’ll have a latent variable representing the random intercepts, as well as one representing the random slopes. All loadings for the intercept factor are 1². The loadings for the slope factor are arbitrary, but should accurately reflect the time spacing; it is typically good to start at zero, so that the intercept has a meaningful interpretation (the expected value at the first occasion).

For the LCM, our data needs to be in wide format, where each row represents a person and we have separate columns for each time point of the target variable, as opposed to the long format we used in the previous mixed model. We can use the spread function from tidyr to help with that.

subject time outcome
      1    0  -1.580
      1    1  -0.123
      1    2  -0.340
      1    3   1.451
      2    0   1.490
      2    1  -0.516
# also change the names, as usually things don't work well if they start with a number.

dWide = d %>%  
  spread(time, outcome) %>% 
  rename_at(vars(-subject), function(x) paste0('outcome', x))
head(dWide)
subject outcome0 outcome1 outcome2 outcome3
      1   -1.580   -0.123   -0.340     1.45
      2    1.490   -0.516    0.203    -1.08
      3    0.937    2.884    4.689     6.22
      4   -2.298   -2.673   -3.137    -3.93
      5    0.085    1.234    4.363     6.01
      6    1.053    0.902    1.546     1.52
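As an aside, spread has since been superseded in tidyr; a sketch of the same reshaping with the newer pivot_wider (producing the same dWide):

dWide = d %>% 
  tidyr::pivot_wider(names_from = time, values_from = outcome, names_prefix = 'outcome')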

Note that lavaan has a specific function, growth, to use for these models. The model syntax is the same as with sem, but growth 1) enables mean structure (meanstructure=TRUE), 2) fixes the intercepts of the observed variables at zero, and 3) frees the intercepts/means of the latent factors.

model = "
    # intercept and slope with fixed coefficients
    i =~ 1*outcome0 + 1*outcome1 + 1*outcome2 + 1*outcome3
    s =~ 0*outcome0 + 1*outcome1 + 2*outcome2 + 3*outcome3
"

growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel)
## lavaan 0.6-3 ended normally after 42 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          9
## 
##   Number of observations                           500
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      10.616
##   Degrees of freedom                                 5
##   P-value (Chi-square)                           0.060
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   i =~                                                
##     outcome0          1.000                           
##     outcome1          1.000                           
##     outcome2          1.000                           
##     outcome3          1.000                           
##   s =~                                                
##     outcome0          0.000                           
##     outcome1          1.000                           
##     outcome2          2.000                           
##     outcome3          3.000                           
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   i ~~                                                
##     s                 0.226    0.050    4.512    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .outcome0          0.000                           
##    .outcome1          0.000                           
##    .outcome2          0.000                           
##    .outcome3          0.000                           
##     i                 0.487    0.048   10.072    0.000
##     s                 0.267    0.045    5.884    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .outcome0          0.287    0.041    6.924    0.000
##    .outcome1          0.219    0.021   10.501    0.000
##    .outcome2          0.185    0.027    6.748    0.000
##    .outcome3          0.357    0.065    5.485    0.000
##     i                 0.977    0.076   12.882    0.000
##     s                 0.969    0.065   14.841    0.000

Most of the output is not of interest since it reflects the peculiar fixed factor loadings of the LCM, but we do get the same five parameter values we are interested in.

Start with the ‘intercepts’:

Intercepts:
                   Estimate  Std.Err  Z-value  P(>|z|)
                       
    i                 0.487    0.048   10.072    0.000
    s                 0.267    0.045    5.884    0.000

It might be odd to call your fixed effects ‘intercepts’, but it makes sense if we are thinking of it as a multilevel model as depicted previously, where we actually broke out the random effects as a separate model. The estimates here are pretty much spot on with our mixed model estimates, which are identical to just the standard regression estimates.

fixef(mixedModel)
## (Intercept)        time 
##       0.490       0.264
lm(outcome ~ time, data=d)
## 
## Call:
## lm(formula = outcome ~ time, data = d)
## 
## Coefficients:
## (Intercept)         time  
##       0.490        0.264

Now let’s look at the variance estimates. By default, the LCM estimates a separate residual variance for each y, which distinguishes it from the mixed model – but not necessarily so: we could constrain them to be equal here, or conversely allow them to be estimated separately in the mixed-model framework. That, along with the different default estimation approaches (REML for lme4, ML for lavaan), is why the results are not identical. Again though, the variances are near one, and the correlation between the intercepts and slopes is around the .2 value.

Covariances:
                   Estimate  Std.Err  Z-value  P(>|z|)
  i ~~                                                
    s                 0.226    0.050    4.512    0.000
Variances:
                   Estimate  Std.Err  Z-value  P(>|z|)
    outcome0          0.287    0.041    6.924    0.000
    outcome1          0.219    0.021   10.501    0.000
    outcome2          0.185    0.027    6.748    0.000
    outcome3          0.357    0.065    5.485    0.000
    i                 0.977    0.076   12.882    0.000
    s                 0.969    0.065   14.841    0.000
VarCorr(mixedModel)
##  Groups   Name        Std.Dev. Corr
##  subject  (Intercept) 1.000        
##           time        0.995    0.21
##  Residual             0.488

The differences provide some insight. The LCM by default allows heterogeneous variances across time points. Mixed models by default assume the same variance for each time point, though most modeling packages allow the variances to be estimated separately, as sketched below.
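For instance, a sketch of a heterogeneous-variance mixed model using nlme (the object name and the choice of ML estimation are mine):

library(nlme)
# a separate residual variance at each wave, analogous to the LCM default
lmeHet = lme(outcome ~ time, random = ~ 1 + time | subject, data = d,
             weights = varIdent(form = ~ 1 | time), method = 'ML')
summary(lmeHet)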

As an example going the other direction, if we constrain the LCM residual variances to be equal across time points, the two models produce identical results.

model = "
    # intercept and slope with fixed coefficients
    i =~ 1*outcome0 + 1*outcome1 + 1*outcome2 + 1*outcome3
    s =~ 0*outcome0 + 1*outcome1 + 2*outcome2 + 3*outcome3

    outcome0 ~~ resvar*outcome0    
    outcome1 ~~ resvar*outcome1
    outcome2 ~~ resvar*outcome2
    outcome3 ~~ resvar*outcome3
"

growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel)
## lavaan 0.6-3 ended normally after 27 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          9
##   Number of equality constraints                     3
## 
##   Number of observations                           500
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      17.105
##   Degrees of freedom                                 8
##   P-value (Chi-square)                           0.029
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   i =~                                                
##     outcome0          1.000                           
##     outcome1          1.000                           
##     outcome2          1.000                           
##     outcome3          1.000                           
##   s =~                                                
##     outcome0          0.000                           
##     outcome1          1.000                           
##     outcome2          2.000                           
##     outcome3          3.000                           
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   i ~~                                                
##     s                 0.207    0.050    4.170    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .outcome0          0.000                           
##    .outcome1          0.000                           
##    .outcome2          0.000                           
##    .outcome3          0.000                           
##     i                 0.490    0.048   10.151    0.000
##     s                 0.264    0.046    5.802    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .outcom0 (rsvr)    0.238    0.011   22.361    0.000
##    .outcom1 (rsvr)    0.238    0.011   22.361    0.000
##    .outcom2 (rsvr)    0.238    0.011   22.361    0.000
##    .outcom3 (rsvr)    0.238    0.011   22.361    0.000
##     i                 0.998    0.074   13.478    0.000
##     s                 0.988    0.066   15.076    0.000

Compare to the lme4 output.

##  Groups   Name        Variance Corr
##  subject  (Intercept) 1.000        
##           time        0.990    0.21
##  Residual             0.238

In addition, the random coefficients estimates from the mixed model perfectly correlate with those of the latent variables.

Int_mix Slope_mix Int_LCM Slope_LCM
  -1.12      0.72   -1.12      0.72
   0.90     -0.61    0.90     -0.61
   1.08      1.72    1.08      1.72
  -1.86     -0.68   -1.86     -0.68
   0.06      1.94    0.06      1.94
   0.87      0.24    0.87      0.24
##           Int_mix Slope_mix Int_LCM Slope_LCM
## Int_mix      1.00      0.29    1.00      0.29
## Slope_mix    0.29      1.00    0.29      1.00
## Int_LCM      1.00      0.29    1.00      0.29
## Slope_LCM    0.29      1.00    0.29      1.00
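The comparison above could be reproduced along these lines (a sketch; the column names are mine, and note that coef() returns the combined fixed + random coefficients per subject):

reCoefs   = coef(mixedModel)$subject      # subject-specific intercepts and slopes (lme4)
lcmScores = lavPredict(growthCurveModel)  # factor scores for i and s (lavaan)
compare   = data.frame(Int_mix   = reCoefs[, '(Intercept)'],
                       Slope_mix = reCoefs[, 'time'],
                       Int_LCM   = lcmScores[, 'i'],
                       Slope_LCM = lcmScores[, 's'])
head(compare)
round(cor(compare), 2)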

Both approaches allow those residuals to covary, though the SEM syntax requires much more specificity. Here is the syntax for letting the residuals of the time points covary according to a first-order autoregressive structure (aka AR1). We achieve this by defining separate constraints for the one-lag, two-lag, and three-lag residual covariances among the indicators, and then imposing the parametric form \(\rho\), \(\rho^2\), and \(\rho^3\) on these lags. Note that we have also constrained the residual variances to be equal at each time point (homogeneous error).

\[ \boldsymbol{\Theta} = \begin{bmatrix} \sigma^2_\varepsilon & \rho & \rho^2 & \rho^3 \\ \rho & \sigma^2_\varepsilon & \rho & \rho^2 \\ \rho^2 & \rho & \sigma^2_\varepsilon & \rho \\ \rho^3 & \rho^2 & \rho & \sigma^2_\varepsilon \\ \end{bmatrix} \]

model = "
    # intercept and slope with fixed coefficients
    i =~ 1*outcome0 + 1*outcome1 + 1*outcome2 + 1*outcome3
    s =~ 0*outcome0 + 1*outcome1 + 2*outcome2 + 3*outcome3

    # all of the following is needed for what are essentially only two parameters 
    # to estimate- resvar and correlation (the latter defined explicitly here)
    outcome0 ~~ resvar*outcome0
    outcome1 ~~ resvar*outcome1
    outcome2 ~~ resvar*outcome2
    outcome3 ~~ resvar*outcome3

    # timepoints 1 step apart; technically the covariance is e.g. a*sqrt(outcome0)*sqrt(outcome1), 
    # but since the variances are constrained to be equal, we don't have to be so verbose.
    outcome0 ~~ a*outcome1
    outcome1 ~~ a*outcome2
    outcome2 ~~ a*outcome3
    
    # two steps apart
    outcome0 ~~ b*outcome2
    outcome1 ~~ b*outcome3
    
    # three steps apart
    outcome0 ~~ c*outcome3
    
    # fix parameters according to ar1
    b == a^2
    c == a^3
"
## lavaan 0.6-3 ended normally after 287 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         15
## 
##   Number of observations                           500
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      14.881
##   Degrees of freedom                                 7
##   P-value (Chi-square)                           0.038
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i =~                                                                  
##     outcome0          1.000                               0.980    0.884
##     outcome1          1.000                               0.980    0.603
##     outcome2          1.000                               0.980    0.399
##     outcome3          1.000                               0.980    0.291
##   s =~                                                                  
##     outcome0          0.000                               0.000    0.000
##     outcome1          1.000                               0.991    0.610
##     outcome2          2.000                               1.983    0.808
##     outcome3          3.000                               2.974    0.882
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .outcome0 ~~                                                           
##    .outcome1   (a)    0.029    0.021    1.397    0.162    0.029    0.110
##  .outcome1 ~~                                                           
##    .outcome2   (a)    0.029    0.021    1.397    0.162    0.029    0.110
##  .outcome2 ~~                                                           
##    .outcome3   (a)    0.029    0.021    1.397    0.162    0.029    0.110
##  .outcome0 ~~                                                           
##    .outcome2   (b)    0.001    0.001    0.698    0.485    0.001    0.003
##  .outcome1 ~~                                                           
##    .outcome3   (b)    0.001    0.001    0.698    0.485    0.001    0.003
##  .outcome0 ~~                                                           
##    .outcome3   (c)    0.000    0.000    0.466    0.641    0.000    0.000
##   i ~~                                                                  
##     s                 0.217    0.051    4.293    0.000    0.223    0.223
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .outcome0          0.000                               0.000    0.000
##    .outcome1          0.000                               0.000    0.000
##    .outcome2          0.000                               0.000    0.000
##    .outcome3          0.000                               0.000    0.000
##     i                 0.492    0.048   10.195    0.000    0.502    0.502
##     s                 0.263    0.046    5.765    0.000    0.265    0.265
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .outcom0 (rsvr)    0.267    0.025   10.714    0.000    0.267    0.218
##    .outcom1 (rsvr)    0.267    0.025   10.714    0.000    0.267    0.101
##    .outcom2 (rsvr)    0.267    0.025   10.714    0.000    0.267    0.044
##    .outcom3 (rsvr)    0.267    0.025   10.714    0.000    0.267    0.024
##     i                 0.960    0.079   12.155    0.000    1.000    1.000
##     s                 0.983    0.066   14.883    0.000    1.000    1.000
## 
## Constraints:
##                                                |Slack|
##     b - (a^2)                                    0.000
##     c - (a^3)                                    0.000
## 
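The corresponding mixed model uses an AR(1) residual correlation structure via nlme; a sketch of a call that would produce output like that shown below (object name mine; method='ML' matches the ‘maximum likelihood’ line):

library(nlme)
lmeAR1 = lme(outcome ~ time, random = ~ 1 + time | subject, data = d,
             correlation = corAR1(form = ~ time | subject), method = 'ML')
summary(lmeAR1)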
## Linear mixed-effects model fit by maximum likelihood
##  Data: d 
##    AIC  BIC logLik
##   5837 5876  -2911
## 
## Random effects:
##  Formula: ~1 + time | subject
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev Corr  
## (Intercept) 0.977  (Intr)
## time        0.991  0.225 
## Residual    0.522        
## 
## Correlation Structure: AR(1)
##  Formula: ~time | subject 
##  Parameter estimate(s):
##   Phi 
## 0.124 
## Fixed effects: outcome ~ time 
##             Value Std.Error   DF t-value p-value
## (Intercept) 0.492    0.0483 1499   10.19       0
## time        0.263    0.0456 1499    5.76       0
##  Correlation: 
##      (Intr)
## time 0.121 
## 
## Standardized Within-Group Residuals:
##     Min      Q1     Med      Q3     Max 
## -2.2956 -0.4593  0.0108  0.4533  2.7605 
## 
## Number of Observations: 2000
## Number of Groups: 500

If you’re curious, equivalent Bayesian multilevel models can be implemented in Stan via the brms package.
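A minimal sketch of the unconditional growth model in brms (not run here; priors and sampler settings left at their defaults):

library(brms)
bayesGrowth = brm(outcome ~ 1 + time + (1 + time | subject), data = d)
summary(bayesGrowth)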

4 Parallel process LCM example

From Michael Clark

# parallel process --------------------------------------------------------
# See EXAMPLE 6.13 in Mplus User's Guide

# let's simulate data where process 2 has a negative average slope, and where each
# process's slope is positively correlated with the other process's intercept
set.seed(1234)
n = 500
timepoints = 4
time = rep(0:3, times=n)
subject = rep(1:n, each=4)

# first we'll draw intercepts with overall mean .5 and -.5 for the two
# processes, and let them have a slight correlation. Their variance is 1.
intCorr = matrix(c(1,.2,.2,1), ncol=2) 
colnames(intCorr) = rownames(intCorr) = c('i1', 'i2')
intCorr
##     i1  i2
## i1 1.0 0.2
## i2 0.2 1.0
interceptP1 = .5
interceptP2 = -.5

ranInts = MASS::mvrnorm(n, mu=c(0,0), Sigma = intCorr, empirical=T)
ranInts = data.frame(ranInts)
head(ranInts)
     i1     i2
 -1.477  0.454
  0.639 -0.953
  0.774  1.138
 -2.197 -1.007
 -0.192  1.847
  0.537  0.034
cor(ranInts)
##     i1  i2
## i1 1.0 0.2
## i2 0.2 1.0
colMeans(ranInts)
##        i1        i2 
## -1.53e-17 -4.89e-18
# now create the slope random effects. The fixed slope means are .4 and -.4, and each
# slope has the same positive relationship with the *other* process's intercept.
# Their variances are near 1.
slopeP1 = .4
slopeP2 = -.4

s1 = .3*ranInts$i2  + rnorm(n)
s2 = .3*ranInts$i1  + rnorm(n)

ranef = data.frame(ranInts, s1, s2)
head(ranef)
     i1     i2     s1     s2
 -1.477  0.454 -1.069  1.351
  0.639 -0.953  0.016 -1.173
  0.774  1.138 -1.198 -0.475
 -2.197 -1.007  0.333 -1.215
 -0.192  1.847  1.257 -0.368
  0.537  0.034 -1.896 -0.215
# so we have slight positive correlations among all random intercepts and slopes
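# sanity check (added): the implied correlations among intercept and slope deviations
round(cor(ranef), 2)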
y1 = (interceptP1 + ranef$i1[subject]) + (slopeP1+ranef$s1[subject])*time + rnorm(n*timepoints, sd=.5)
d1 = data.frame(Subject=subject, time=time, y1)
head(d1)
 Subject time     y1
       1    0 -1.464
       1    1 -1.697
       1    2 -2.371
       1    3 -2.389
       2    0  0.311
       2    1  1.032
library(ggplot2)
ggplot(aes(x=time, y=y1), data=d1) + 
  geom_line(aes(group=Subject), alpha=.1) + 
  geom_smooth(method='lm',color='red')

y2 = (interceptP2 + ranef$i2[subject]) + (slopeP2+ranef$s2[subject])*time + rnorm(n*timepoints, sd=.5)
d2 = data.frame(Subject=subject, time=time, y2)

# process 2 shows the downward overall trend as expected
ggplot(aes(x=time, y=y2), data=d2) + 
  geom_line(aes(group=Subject), alpha=.1) + 
  geom_smooth(method='lm',color='red')

# Widen from long form for lavaan
library(tidyr)
negslopepospath1 = d1 %>% spread(time, y1)
colnames(negslopepospath1) = c('Subject', paste0('y1', 1:4))
head(negslopepospath1)
 Subject    y11    y12    y13   y14
       1 -1.464 -1.697 -2.371 -2.39
       2  0.311  1.032  1.100  2.64
       3  1.051 -0.444 -0.732 -1.62
       4 -1.549 -0.685 -0.867  1.01
       5  0.541  2.082  3.754  5.50
       6  1.523 -0.723 -1.932 -2.05
negslopepospath2 = d2 %>% spread(time, y2)
colnames(negslopepospath2) = c('Subject', paste0('y2', 1:4))

# combine
dparallel = dplyr::left_join(negslopepospath1, negslopepospath2)
## Joining, by = "Subject"
head(dparallel)
 Subject    y11    y12    y13   y14    y21    y22    y23   y24
       1 -1.464 -1.697 -2.371 -2.39 -0.528  0.498  1.758  3.76
       2  0.311  1.032  1.100  2.64 -1.102 -2.540 -4.662 -6.87
       3  1.051 -0.444 -0.732 -1.62  0.992 -0.340 -0.977 -1.51
       4 -1.549 -0.685 -0.867  1.01 -1.163 -4.155 -5.816 -5.50
       5  0.541  2.082  3.754  5.50  0.787  0.084 -0.747 -1.26
       6  1.523 -0.723 -1.932 -2.05 -0.792 -1.418 -2.184 -1.59
mainModel = "
i1 =~ 1*y11 + 1*y12 + 1*y13 + 1*y14
s1 =~ 0*y11 + 1*y12 + 2*y13 + 3*y14


i2 =~ 1*y21 + 1*y22 + 1*y23 + 1*y24
s2 =~ 0*y21 + 1*y22 + 2*y23 + 3*y24

s1 ~ i2
s2 ~ i1
"

mainRes  = growth(mainModel, data=dparallel)
summary(mainRes)
## lavaan 0.6-3 ended normally after 42 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         20
## 
##   Number of observations                           500
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      20.499
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.668
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   i1 =~                                               
##     y11               1.000                           
##     y12               1.000                           
##     y13               1.000                           
##     y14               1.000                           
##   s1 =~                                               
##     y11               0.000                           
##     y12               1.000                           
##     y13               2.000                           
##     y14               3.000                           
##   i2 =~                                               
##     y21               1.000                           
##     y22               1.000                           
##     y23               1.000                           
##     y24               1.000                           
##   s2 =~                                               
##     y21               0.000                           
##     y22               1.000                           
##     y23               2.000                           
##     y24               3.000                           
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   s1 ~                                                
##     i2                0.286    0.047    6.027    0.000
##   s2 ~                                                
##     i1                0.282    0.049    5.725    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   i1 ~~                                               
##     i2                0.225    0.055    4.101    0.000
##  .s1 ~~                                               
##    .s2               -0.012    0.046   -0.267    0.789
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .y11               0.000                           
##    .y12               0.000                           
##    .y13               0.000                           
##    .y14               0.000                           
##    .y21               0.000                           
##    .y22               0.000                           
##    .y23               0.000                           
##    .y24               0.000                           
##     i1                0.487    0.050    9.659    0.000
##    .s1                0.574    0.048   11.847    0.000
##     i2               -0.459    0.048   -9.595    0.000
##    .s2               -0.567    0.053  -10.679    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .y11               0.267    0.043    6.275    0.000
##    .y12               0.234    0.022   10.548    0.000
##    .y13               0.214    0.029    7.297    0.000
##    .y14               0.328    0.068    4.866    0.000
##    .y21               0.244    0.039    6.231    0.000
##    .y22               0.208    0.020   10.341    0.000
##    .y23               0.229    0.029    7.886    0.000
##    .y24               0.280    0.065    4.307    0.000
##     i1                1.090    0.082   13.303    0.000
##    .s1                0.865    0.059   14.631    0.000
##     i2                0.980    0.074   13.267    0.000
##    .s2                1.055    0.071   14.892    0.000
fscores = lavPredict(mainRes)
broom::tidy(data.frame(fscores))
## Warning: 'tidy.data.frame' is deprecated.
## See help("Deprecated")
 column   n   mean    sd median trimmed   mad   min  max range   skew kurtosis    se
     i1 500  0.487 0.971  0.509   0.485 0.696 -2.49 3.00  5.49 -0.050     2.94 0.043
     s1 500  0.442 0.948  0.491   0.458 0.629 -2.34 2.94  5.28 -0.192     2.92 0.042
     i2 500 -0.459 0.920 -0.458  -0.450 0.577 -3.72 2.00  5.71 -0.121     3.12 0.041
     s2 500 -0.429 1.048 -0.419  -0.426 0.688 -3.73 2.61  6.35 -0.059     2.98 0.047
lm(s2~., fscores)
## 
## Call:
## lm(formula = s2 ~ ., data = fscores)
## 
## Coefficients:
## (Intercept)           i1           s1           i2  
##      -0.492        0.309       -0.068        0.127
cor(fscores)
##       i1     s1    i2     s2
## i1 1.000 0.2318 0.253 0.3001
## s1 0.232 1.0000 0.320 0.0406
## i2 0.253 0.3203 1.000 0.1642
## s2 0.300 0.0406 0.164 1.0000
qplot(s1, i2, data=data.frame(fscores)) + geom_smooth(method='lm', se=F)

fv = lavPredict(mainRes, 'ov')
summary(mainRes, standardized=T)
## lavaan 0.6-3 ended normally after 42 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         20
## 
##   Number of observations                           500
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      20.499
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.668
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i1 =~                                                                 
##     y11               1.000                               1.044    0.896
##     y12               1.000                               1.044    0.674
##     y13               1.000                               1.044    0.452
##     y14               1.000                               1.044    0.325
##   s1 =~                                                                 
##     y11               0.000                               0.000    0.000
##     y12               1.000                               0.972    0.628
##     y13               2.000                               1.944    0.841
##     y14               3.000                               2.916    0.908
##   i2 =~                                                                 
##     y21               1.000                               0.990    0.895
##     y22               1.000                               0.990    0.632
##     y23               1.000                               0.990    0.403
##     y24               1.000                               0.990    0.287
##   s2 =~                                                                 
##     y21               0.000                               0.000    0.000
##     y22               1.000                               1.068    0.682
##     y23               2.000                               2.137    0.870
##     y24               3.000                               3.205    0.929
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   s1 ~                                                                  
##     i2                0.286    0.047    6.027    0.000    0.291    0.291
##   s2 ~                                                                  
##     i1                0.282    0.049    5.725    0.000    0.275    0.275
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i1 ~~                                                                 
##     i2                0.225    0.055    4.101    0.000    0.218    0.218
##  .s1 ~~                                                                 
##    .s2               -0.012    0.046   -0.267    0.789   -0.013   -0.013
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .y11               0.000                               0.000    0.000
##    .y12               0.000                               0.000    0.000
##    .y13               0.000                               0.000    0.000
##    .y14               0.000                               0.000    0.000
##    .y21               0.000                               0.000    0.000
##    .y22               0.000                               0.000    0.000
##    .y23               0.000                               0.000    0.000
##    .y24               0.000                               0.000    0.000
##     i1                0.487    0.050    9.659    0.000    0.467    0.467
##    .s1                0.574    0.048   11.847    0.000    0.590    0.590
##     i2               -0.459    0.048   -9.595    0.000   -0.464   -0.464
##    .s2               -0.567    0.053  -10.679    0.000   -0.531   -0.531
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .y11               0.267    0.043    6.275    0.000    0.267    0.197
##    .y12               0.234    0.022   10.548    0.000    0.234    0.098
##    .y13               0.214    0.029    7.297    0.000    0.214    0.040
##    .y14               0.328    0.068    4.866    0.000    0.328    0.032
##    .y21               0.244    0.039    6.231    0.000    0.244    0.199
##    .y22               0.208    0.020   10.341    0.000    0.208    0.085
##    .y23               0.229    0.029    7.886    0.000    0.229    0.038
##    .y24               0.280    0.065    4.307    0.000    0.280    0.024
##     i1                1.090    0.082   13.303    0.000    1.000    1.000
##    .s1                0.865    0.059   14.631    0.000    0.915    0.915
##     i2                0.980    0.074   13.267    0.000    1.000    1.000
##    .s2                1.055    0.071   14.892    0.000    0.924    0.924
d3heatmap::d3heatmap(cor(fv, fscores))
d3heatmap::d3heatmap(cor(select(dparallel, -Subject), ranef), Rowv = F, Colv = F)
process1Model = "
  i1 =~ 1*y11 + 1*y12 + 1*y13 + 1*y14
  s1 =~ 0*y11 + 1*y12 + 2*y13 + 3*y14
"
p1Res = growth(process1Model, data=dparallel)
fscoresP1 = lavPredict(p1Res)

process2Model = "
  i2 =~ 1*y21 + 1*y22 + 1*y23 + 1*y24
  s2 =~ 0*y21 + 1*y22 + 2*y23 + 3*y24
"
p2Res = growth(process2Model, data=dparallel)
fscoresP2 = lavPredict(p2Res)

fscoresSeparate = data.frame(fscoresP1, fscoresP2)

pathMod = "
  s1 ~ i2
  s2 ~ i1
  
  i1~~i2
"

pathModRes = sem(pathMod, data=fscoresSeparate, fixed.x = F)
summary(pathModRes)  # you'd have come to the same conclusions
## lavaan 0.6-3 ended normally after 16 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          8
## 
##   Number of observations                           500
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      31.590
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   s1 ~                                                
##     i2                0.287    0.044    6.489    0.000
##   s2 ~                                                
##     i1                0.293    0.047    6.227    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   i2 ~~                                               
##     i1                0.197    0.040    4.925    0.000
##  .s1 ~~                                               
##    .s2               -0.022    0.041   -0.551    0.581
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .s1                0.812    0.051   15.811    0.000
##    .s2                1.015    0.064   15.811    0.000
##     i2                0.829    0.052   15.811    0.000
##     i1                0.919    0.058   15.811    0.000
summary(mainRes)
## lavaan 0.6-3 ended normally after 42 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         20
## 
##   Number of observations                           500
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      20.499
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.668
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   i1 =~                                               
##     y11               1.000                           
##     y12               1.000                           
##     y13               1.000                           
##     y14               1.000                           
##   s1 =~                                               
##     y11               0.000                           
##     y12               1.000                           
##     y13               2.000                           
##     y14               3.000                           
##   i2 =~                                               
##     y21               1.000                           
##     y22               1.000                           
##     y23               1.000                           
##     y24               1.000                           
##   s2 =~                                               
##     y21               0.000                           
##     y22               1.000                           
##     y23               2.000                           
##     y24               3.000                           
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   s1 ~                                                
##     i2                0.286    0.047    6.027    0.000
##   s2 ~                                                
##     i1                0.282    0.049    5.725    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   i1 ~~                                               
##     i2                0.225    0.055    4.101    0.000
##  .s1 ~~                                               
##    .s2               -0.012    0.046   -0.267    0.789
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .y11               0.000                           
##    .y12               0.000                           
##    .y13               0.000                           
##    .y14               0.000                           
##    .y21               0.000                           
##    .y22               0.000                           
##    .y23               0.000                           
##    .y24               0.000                           
##     i1                0.487    0.050    9.659    0.000
##    .s1                0.574    0.048   11.847    0.000
##     i2               -0.459    0.048   -9.595    0.000
##    .s2               -0.567    0.053  -10.679    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .y11               0.267    0.043    6.275    0.000
##    .y12               0.234    0.022   10.548    0.000
##    .y13               0.214    0.029    7.297    0.000
##    .y14               0.328    0.068    4.866    0.000
##    .y21               0.244    0.039    6.231    0.000
##    .y22               0.208    0.020   10.341    0.000
##    .y23               0.229    0.029    7.886    0.000
##    .y24               0.280    0.065    4.307    0.000
##     i1                1.090    0.082   13.303    0.000
##    .s1                0.865    0.059   14.631    0.000
##     i2                0.980    0.074   13.267    0.000
##    .s2                1.055    0.071   14.892    0.000

  1. One can set REML=F so as to use standard maximum likelihood and make the results directly comparable to lavaan.

  2. Those familiar with the model matrix for regression will recognize ‘intercept as 1’.