1 Brief intro to SEM

Structural equation modeling (SEM) can be a very useful tool in determining relationships between variables. Often SEM is used in a “confirmatory” manner, to determine whether a certain model is valid (e.g., by comparing the goodness-of-fit of nested models). We can even extend SEM to study interactions across groups. SEM is sometimes referred to as causal modeling, path analysis (with latent variables), or covariance structure analysis. It subsumes a bunch of other techniques, like multiple regression, confirmatory factor analysis, ANOVA, etc.

You supply the observed relationships between variables (i.e., the covariance or correlation matrix), the number of observations, and a formal model specification, and SEM estimates parameters that give you the “best” reproduction of the covariance matrix. The better your model fit, the better your reproduction of the covariance matrix (hence, lower chi-squared = better model)!
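
In lavaan, that workflow boils down to a call along the following lines (a schematic sketch; all variable names and `my.cov` are placeholders, not part of the data used later):

library(lavaan)

# Toy model: Y regressed on two predictors (placeholder names)
model <- '
  Y ~ X1 + X2
'

# Fit from a covariance matrix plus the number of observations it was based on
fit <- sem(model, sample.cov = my.cov, sample.nobs = 200)

# Chi-square test of exact fit, plus RMSEA/CFI/SRMR if fit.measures = TRUE
summary(fit, fit.measures = TRUE)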

1.1 Mediation

For more information on how to conduct classic mediation with lavaan, check out the tutorial here.

1.2 Latent variables

Often we are interested in investigating latent, abstract variables (like “intelligence”) by obtaining multiple observable measures (e.g., high school GPA, SAT and ACT scores). Using SEM we can easily include latent variables!
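
For instance, a hypothetical lavaan specification of such a latent variable might look like the following (indicator names are made up for illustration):

# A latent "intelligence" factor measured by three observed indicators
intel.model <- '
  intelligence =~ hs_gpa + sat + act
'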

1.3 Sample size


SEM necessitates large sample sizes! In the literature, sample sizes commonly run 200 - 400 for models with 10 - 15 indicators. One survey of 72 SEM studies found the median sample size was 198. A sample of 150 is considered too small unless the covariance coefficients are relatively large. With over ten variables, sample size under 200 generally means parameter estimates are unstable and significance tests lack power.

One rule of thumb found in the literature is that sample size should be at least 50 more than 8 times the number of variables in the model. Mitchell (1993) advances the rule of thumb that there be 10 to 20 times as many cases as variables. Another rule of thumb, based on Stevens (1996), is to have at least 15 cases per measured variable or indicator. The researcher should go beyond these minimum sample size recommendations particularly when data are non-normal (skewed, kurtotic) or incomplete. Note also that to compute the asymptotic covariance matrix, one needs \(\frac{k(k+1)}{2}\) observations, where \(k\) is the number of variables.
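
As a quick back-of-the-envelope check, the rules of thumb above can be computed directly; here for a hypothetical model with 10 observed variables:

# Rough sample-size checks for a model with k = 10 observed variables
k <- 10
8 * k + 50       # "50 more than 8 times the number of variables": 130
15 * k           # Stevens (1996), 15 cases per measured variable: 150
k * (k + 1) / 2  # observations needed for the asymptotic covariance matrix: 55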

1.4 Lavaan in action

Kievit, R. A., Davis, S. W., Mitchell, D. J., Taylor, J. R., Duncan, J., Henson, R. N., & Cam-CAN Research team. (2014). Distinct aspects of frontal lobe structure mediate age-related differences in fluid intelligence and multitasking. Nature communications, 5.

“Following best practice in SEM, we first report our measurement model on the full sample of N=567 (for further details on the sample and model fitting, see Methods). For this model, we hypothesize that two latent variables (fluid intelligence and multitasking; tasks shown in Fig. 1, see Methods for more detail) capture the covariance between the six behavioural variables described in the Methods section, freely estimating every factor loading. This model fits the data well, \(\chi^2\) = 15.40, degrees of freedom (df) = 8, P = 0.052, root mean square error of approximation (RMSEA) = 0.04 (0.00-0.070), comparative fit index (CFI) = 0.993, standardized root mean square residual (SRMR)=0.023, Satorra-Bentler scaling factor=1.009. As the two latent factors are positively correlated (standardized estimate = 0.325, Z = 6.17, P<0.001), we can ask whether a more parsimonious model with only a single cognitive factor (for example, ‘executive function’) shows better fit. Such a model would be compatible with a unitary perspective on the age-related decline of higher cognitive function. However, this one-factor model fits very poorly: \(\chi^2\) =334.149, df=9, P<0.00001, RMSEA=0.252 (0.231-0.275), CFI = 0.685, SRMR = 0.121, Satorra-Bentler scaling factor = 1.109, significantly worse than the two-factor model (\(\chi^2\) = 46.224, dfdiff = 1, P<0.00001).”

“For this reason, partial least squares is often considered a pragmatic alternative to the more principled, theory-driven SEM approach, which is why we here prefer the latter to test our a priori conceptualization […]”

“SEM were fit using the package Lavaan in R, plots were generated using ggplot2 (ref. 69). We used the following guidelines for judging good fit: RMSEA<0.05 (acceptable: 0.05-0.08), CFI>0.97 (acceptable: 0.95-0.97) and SRMR<0.05 (acceptable: 0.05-0.10) and report the Satorra-Bentler scaling factor for each fitted model. All models were fit using Maximum Likelihood Estimation using robust s.e., and report overall model fit using the Satorra-Bentler scaled test statistic”

2 Lavaan steps & syntax

2.1 Brief dataset description

Here we have data on environmental, behavioral, genetic, and control variables. The main environmental risk was adolescent-onset use of cannabis (earlyon = 1/0, i.e., yes/no). The variable schizo (= 1/0) indexes whether or not a person is diagnosed with schizophreniform disorder at age 26, and psysym is a self-report of the severity of psychotic symptoms at that age. The specific gene used to define the genotypes was the COMT gene, which has two alleles, valine (V) and methionine (M), that affect the level of dopamine activity. We can treat “genotype” as a quantitative variable (COMT) with values 0 (for MM individuals), 1 (for VM), and 2 (for VV). Many control variables were measured, such as chpsyq, a quantitative measure of the severity of childhood psychotic symptoms.

2.2 Define your data!

Lavaan can take a correlation or covariance matrix (or even a dataframe) as input. Here, we have a covariance matrix. Remember to look at your data!

d = read.table("http://www.stanford.edu/class/psych253/data/comt-covar7.txt", sep = "\t")

d
V1
Data Ninput=7 Nobs=803
CMatrix
.183
.007 .500
.009 .001 .035
.556 .207 .018 13.025
.061 .010 -.007 .466 .164
.030 -.275 .096 .899 .216 25.646
.188 .134 .015 .695 .060 -.163 .321
Labels EarlyOn COMT Schizo Psysym Conduct Chpsyq GenxEnv
lower = '
.183
.007 .500
.009 .001 .035
.556 .207 .018 13.025
.061 .010 -.007 .466 .164
.030 -.275 .096 .899 .216 25.646
.188 .134 .015 .695 .060 -.163 .321'

labels1 = c("EarlyOn", "COMT", "Schizo", "Psysym", "Conduct", "Chpsyq", "GenxEnv")
comt7.cov = getCov(lower, names = labels1); comt7.cov
##         EarlyOn   COMT Schizo Psysym Conduct Chpsyq GenxEnv
## EarlyOn   0.183  0.007  0.009  0.556   0.061  0.030   0.188
## COMT      0.007  0.500  0.001  0.207   0.010 -0.275   0.134
## Schizo    0.009  0.001  0.035  0.018  -0.007  0.096   0.015
## Psysym    0.556  0.207  0.018 13.025   0.466  0.899   0.695
## Conduct   0.061  0.010 -0.007  0.466   0.164  0.216   0.060
## Chpsyq    0.030 -0.275  0.096  0.899   0.216 25.646  -0.163
## GenxEnv   0.188  0.134  0.015  0.695   0.060 -0.163   0.321
plot_matrix(comt7.cov) 

This plot shows the relative weight of the variance/covariance terms in our matrix. Clearly, the variance of chpsyq is the largest in our matrix, which is consistent with the numbers above.
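
A note on plot_matrix(): it is a helper used throughout this tutorial and is presumably defined earlier in the course materials. If it isn't available in your session, a minimal stand-in built on the corrplot package (an assumption, not necessarily the original helper) could be:

plot_matrix <- function(matrix_toplot){
  # draw the lower triangle, with circle size/shade proportional to each value
  corrplot::corrplot(matrix_toplot, is.corr = FALSE,
                     type = 'lower',
                     tl.col = 'black', tl.cex = 0.75)
}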

2.3 Plan the model

Before running SEM, it’s helpful to plan out your hypothesized “model” of how the variables are related. For instance, run lm() or glm() models to get a sense for relationships! Better yet, have a priori hypotheses before collecting data (based off of the literature, etc.), and you can generate your model before you even have data!

It’s often helpful to sketch out your model beforehand, so grab a piece of paper (or find a whiteboard) for the homework, and use that to draw out your models! Remember that circles are used to describe latent variables, and squares for observed variables. Some predictor variables can be exogenous, meaning they are of “external origin” and their causes are not included in the model (i.e., no arrows are pointing towards it). The remaining variables are endogenous (e.g., a DV), meaning that they are the effects of other variables; in SEM, endogenous variables can also predict other variables!

Here we have a model that we want to recreate:

Notice in this path diagram, unlike the one shown above, there are no latent variables (i.e., no measurement model). In this case we are only modeling the structural model (i.e., path analysis). This is review from Week 3.

2.4 Define the model

As a note, remember that the number of parameters you can estimate depends on the number of unique cells in your variance/covariance matrix. That is, the number of diagonal terms plus the number of terms in one of the off-diagonal triangles (lower or upper, but not both). This equates to \(k(k+1)/2\), where \(k\) is the number of manifest variables in your data. For these data, \(7(7+1)/2 = 28\) parameters can be estimated.

Similar to lm() and lmer(), etc., the model format has the DV on the left side of the operator, and the IV on the right side. In terms of the structural diagram, you could think of the arrow going from right to left. When the operator is a ~ this is regression, such that the DV(s) are “predicted by” the IV(s). When the operator is =~ we are defining a latent variable, reading it like L is “measured by” IVs. Variances/covariances are specified with a ~~ (e.g., DV is “correlated with” DV), and intercepts are simple regression formulas with 1 as the predictor. We include all these formulas inside single quotes.

  • Regressions
    • Y is predicted by X1 and X2: Y ~ X1 + X2
    • Y1 is predicted by X1, and Y2 is predicted by X1: Y1 + Y2 ~ X1
  • Latent Variables
    • Latent variable L is measured by variables U and V: L =~ U + V
  • Variances & covariances
    • The variance of X: X ~~ X
    • The covariance of X and Y1: X ~~ Y1
    • The covariance of X and Y2 is fixed at 0: X ~~ 0*Y2
  • Intercepts
    • The intercept of X: X ~ 1
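
Putting these operators together, a toy specification might look like the following (all variable names here are hypothetical, just to show the syntax side by side):

toy.model <- '
  # latent variable measured by three indicators
  L =~ U + V + W

  # regression
  Y ~ X1 + X2

  # variance of X1; covariance of X1 and X2 fixed at 0
  X1 ~~ X1
  X1 ~~ 0*X2

  # intercept of Y
  Y ~ 1
'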

For some practice generating the structural plot and equations, go here!

comt7.model1 = '
  Schizo ~ EarlyOn + GenxEnv + Psysym
  Psysym ~ EarlyOn + Conduct + Schizo
  Conduct ~ EarlyOn
  Chpsyq ~ COMT
'

In this model, we want to specify the following:

  1. Schizo is predicted by: EarlyOn, GenxEnv, Psysym
  2. Psysym is predicted by: EarlyOn, Conduct, Schizo
  3. Conduct is predicted by: EarlyOn
  4. Chpsyq is predicted by: COMT

Note that this model doesn’t include any definitions of latent variables, or explicit variances/covariances. However, the variances of all variables, as well as the covariances among the exogenous variables, can still be estimated, as we’ll see in a second!

2.5 Fit the model

Once we have the model specified, the covariance matrix all set, and we know the number of observations (n=803 in this case), we can actually fit the model!

Note that as long as we have fixed.x = F, the model will automatically estimate the variances of, and covariances among, your IVs (the exogenous variables) as free parameters. These parameters must be explicitly removed if you don’t want them. If our IVs were latent variables, we could drop this argument, as the variances and covariances among latent variables are estimated automatically.

comt7.fit1 = sem(comt7.model1, fixed.x = F, 
                 sample.cov = comt7.cov, 
                 sample.nobs = 803) 

#if we were working with raw data, instead of a covariance matrix:
# comt7.fit1 = sem(comt7.model1, fixed.x = F, 
#                  data=data) 

2.6 Analyze/visualize the model fit

summary(comt7.fit1)
## lavaan (0.5-23.1097) converged normally after  77 iterations
## 
##   Number of observations                           803
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               33.578
##   Degrees of freedom                                10
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Schizo ~                                            
##     EarlyOn           0.086    0.038    2.279    0.023
##     GenxEnv           0.084    0.024    3.577    0.000
##     Psysym           -0.041    0.010   -4.003    0.000
##   Psysym ~                                            
##     EarlyOn           1.438    0.433    3.322    0.001
##     Conduct           2.866    0.452    6.335    0.000
##     Schizo           13.109    3.301    3.971    0.000
##   Conduct ~                                           
##     EarlyOn           0.333    0.031   10.661    0.000
##   Chpsyq ~                                            
##     COMT             -0.550    0.252   -2.183    0.029
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   EarlyOn ~~                                          
##     GenxEnv           0.188    0.011   17.368    0.000
##     COMT              0.007    0.011    0.656    0.512
##   GenxEnv ~~                                          
##     COMT              0.134    0.015    8.989    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Schizo            0.052    0.009    5.529    0.000
##    .Psysym           16.318    2.973    5.488    0.000
##    .Conduct           0.143    0.007   20.037    0.000
##    .Chpsyq           25.463    1.271   20.037    0.000
##     EarlyOn           0.183    0.009   20.037    0.000
##     GenxEnv           0.321    0.016   20.037    0.000
##     COMT              0.499    0.025   20.037    0.000
# Just the parameter estimates
parameterEstimates(comt7.fit1)
lhs     op rhs         est    se       z pvalue ci.lower ci.upper
Schizo   ~ EarlyOn    0.086 0.038   2.279  0.023    0.012    0.160
Schizo   ~ GenxEnv    0.084 0.024   3.577  0.000    0.038    0.131
Schizo   ~ Psysym    -0.041 0.010  -4.003  0.000   -0.061   -0.021
Psysym   ~ EarlyOn    1.438 0.433   3.322  0.001    0.590    2.287
Psysym   ~ Conduct    2.866 0.452   6.335  0.000    1.979    3.753
Psysym   ~ Schizo    13.109 3.301   3.971  0.000    6.639   19.579
Conduct  ~ EarlyOn    0.333 0.031  10.661  0.000    0.272    0.395
Chpsyq   ~ COMT      -0.550 0.252  -2.183  0.029   -1.044   -0.056
Schizo  ~~ Schizo     0.052 0.009   5.529  0.000    0.033    0.070
Psysym  ~~ Psysym    16.318 2.973   5.488  0.000   10.491   22.146
Conduct ~~ Conduct    0.143 0.007  20.037  0.000    0.129    0.158
Chpsyq  ~~ Chpsyq    25.463 1.271  20.037  0.000   22.972   27.954
EarlyOn ~~ EarlyOn    0.183 0.009  20.037  0.000    0.165    0.201
EarlyOn ~~ GenxEnv    0.188 0.011  17.368  0.000    0.167    0.209
EarlyOn ~~ COMT       0.007 0.011   0.656  0.512   -0.014    0.028
GenxEnv ~~ GenxEnv    0.321 0.016  20.037  0.000    0.289    0.352
GenxEnv ~~ COMT       0.134 0.015   8.989  0.000    0.105    0.163
COMT    ~~ COMT       0.499 0.025  20.037  0.000    0.451    0.548
# Plot!
semPaths(comt7.fit1, what='std', 
         node.label.cex=5,
         edge.label.cex=1.25, curvePivot = TRUE, 
         fade=FALSE)
## Warning in qgraph(Edgelist, labels = nLab, bidirectional = Bidir, directed
## = Directed, : The following arguments are not documented and likely not
## arguments of qgraph and thus ignored: node.label.cex

Looking at the model output, we can see that this model is not doing very well: the model-implied covariance matrix differs significantly from the observed covariance matrix, \(\chi^2\)(10) = 33.578, p < 0.001.
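
Beyond the chi-square, we could also pull the approximate fit indices that the guidelines quoted earlier refer to (RMSEA, CFI, SRMR); a quick way to do so:

fitMeasures(comt7.fit1, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr"))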

We can see, however, that we’re also doing a pretty good job predicting things. As predicted from our original causal diagram, adolescent-onset use of cannabis (EarlyOn) significantly predicts whether or not a person is diagnosed with schizophreniform disorder at age 26 (Schizo), z = 2.3, p < 0.05. Schizo is also significantly predicted by GenxEnv and Psysym (\(p\)s < 0.05). So that’s good! However, interpreting parameters in the context of a poorly fitting model is inadvisable. So let’s take a better look at where we’re failing…

2.7 Assess where you’re messing up!

One way to do this is to plot the strength of the residuals. Larger values (circles) suggest greater misfit (error) of our model.

plot_matrix(residuals(comt7.fit1)$cov)

residuals(comt7.fit1)
## $type
## [1] "raw"
## 
## $cov
##         Schizo Psysym Condct Chpsyq ErlyOn GnxEnv COMT  
## Schizo   0.000                                          
## Psysym   0.001  0.045                                   
## Conduct  0.001  0.012  0.000                            
## Chpsyq   0.100  0.960  0.217  0.000                     
## EarlyOn  0.000  0.000  0.000  0.034  0.000              
## GenxEnv -0.001  0.032 -0.003 -0.089  0.000  0.000       
## COMT    -0.006  0.094  0.008  0.000  0.000  0.000  0.000
## 
## $mean
##  Schizo  Psysym Conduct  Chpsyq EarlyOn GenxEnv    COMT 
##       0       0       0       0       0       0       0

We might want to add in some relationship between Psysym and Chpsyq, or between Schizo and Chpsyq, or between Conduct and Chpsyq.

We can also evaluate misfit empirically by looking at modification indices, which tell us the amount of improvement in our model we would have if we were to estimate a given parameter.

mi = modificationIndices(comt7.fit1)
head(mi[order(-mi$mi), ], 10)  #shows the unestimated parameters with the ten highest modification indices
   lhs     op rhs       mi    epc sepc.lv sepc.all sepc.nox
26 Schizo   ~ Chpsyq  12.54  0.006   0.006    0.152    0.152
21 Schizo  ~~ Chpsyq  12.36  0.142   0.142    0.150    0.150
24 Conduct ~~ Chpsyq   9.69  0.210   0.210    0.103    0.103
38 Chpsyq   ~ Conduct  9.42  1.351   1.351    0.108    0.108
33 Conduct  ~ Chpsyq   9.25  0.008   0.008    0.100    0.100
36 Chpsyq   ~ Schizo   8.38  2.756   2.756    0.102    0.102
50 GenxEnv  ~ Chpsyq   5.04 -0.005  -0.005   -0.043   -0.043
56 COMT     ~ Chpsyq   4.95  0.019   0.019    0.136    0.136
44 EarlyOn  ~ Chpsyq   4.36  0.004   0.004    0.043    0.043
30 Psysym   ~ COMT     2.50  0.328   0.328    0.064    0.091

Here the “mi” column shows the decrease in the model chi-square that would occur if a given parameter was added.

2.8 Update the model

Let’s add in Chpsyq predicting Schizo, and a covariance between Conduct and Chpsyq.

comt7.model2 = '
  Schizo ~ EarlyOn + GenxEnv + Psysym + Chpsyq
  Psysym ~ EarlyOn + Conduct + Schizo
  Conduct ~ EarlyOn
  Chpsyq ~ COMT
  Conduct ~~ Chpsyq
'

comt7.fit2 = sem(comt7.model2, sample.cov = comt7.cov, sample.nobs = 803)
summary(comt7.fit2)
## lavaan (0.5-23.1097) converged normally after  79 iterations
## 
##   Number of observations                           803
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               11.007
##   Degrees of freedom                                 8
##   P-value (Chi-square)                           0.201
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Schizo ~                                            
##     EarlyOn           0.073    0.035    2.064    0.039
##     GenxEnv           0.091    0.023    3.961    0.000
##     Psysym           -0.039    0.009   -4.334    0.000
##     Chpsyq            0.006    0.002    3.399    0.001
##   Psysym ~                                            
##     EarlyOn           1.479    0.408    3.621    0.000
##     Conduct           2.827    0.422    6.704    0.000
##     Schizo           12.545    2.936    4.273    0.000
##   Conduct ~                                           
##     EarlyOn           0.332    0.031   10.676    0.000
##   Chpsyq ~                                            
##     COMT             -0.572    0.250   -2.286    0.022
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .Conduct ~~                                          
##    .Chpsyq            0.210    0.068    3.097    0.002
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Schizo            0.049    0.008    6.276    0.000
##    .Psysym           15.841    2.556    6.197    0.000
##    .Conduct           0.143    0.007   20.037    0.000
##    .Chpsyq           25.463    1.271   20.037    0.000

2.9 Check if you improved

anova(comt7.fit2, comt7.fit1)
           Df   AIC   BIC Chisq Chisq diff Df diff Pr(>Chisq)
comt7.fit2  8 12365 12430  11.0         NA      NA         NA
comt7.fit1 10 12395 12480  33.6       22.6       2          0

This new model is doing significantly better than our first model! We have a lower AIC value, and our chi-squared is significantly lower! That means we’re doing a better job reproducing the covariance matrix! We also have a non-significant model chi-square now, suggesting this model fits the data well. We can stop attempting to improve the model here.
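
If you’d like to visualize this final model, the same semPaths() call used for the first fit works here too (a quick sketch; node.label.cex is dropped since it only triggered a warning earlier):

semPaths(comt7.fit2, what = 'std',
         edge.label.cex = 1.25, curvePivot = TRUE,
         fade = FALSE)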

3 Including latent variables (the measurement model)

Holzinger and Swineford (1939) administered a battery of achievement tests to a sample of 145 grade-school-aged children in a Chicago school. Nine of these tests include the following:

1) Visual perception
2) Shape identification
3) Cube sorting
4) Paragraph completion
5) Sentence completion
6) Words in a minute
7) Addition
8) Dot counting 
9) Capitalization recognition

We might hypothesize that some number of latent achievement factors underlie these 9 indicators. Based on theory, we believe that a capacity for Visual Information Processing (VIS), Processing Speed (SPEED), and Verbal Capacity (VERB) are each important achievement markers that we are interested in among young students. We hypothesize that the first 3 tests measure VIS, the second 3 VERB, and the last 3 SPEED. Let’s confirm this assumption by examining the correlation matrix among our nine variables to see if these indicators seem to hang together as we hypothesized.

3.1 Reading in the correlation matrix:

cor2B <-'1.0
        .318 1.0
        .436 .419 1.0
        .335 .234 .323 1.0
        .304 .157 .283 .722 1.0
        .326 .195 .350 .714 .685 1.0
        .116 .057 .056 .203 .246 .170 1.0
        .314 .145 .229 .095 .181 .113 .585 1.0
        .489 .239 .361 .309 .345 .280 .408 .512 1.0'  
cor2B <- getCov(cor2B, names=c("vis", "cub", "shape", "para","sent", "word", "add", "count", "cap")) #create cor matrix object
cor2B #print cor matrix
##         vis   cub shape  para  sent  word   add count   cap
## vis   1.000 0.318 0.436 0.335 0.304 0.326 0.116 0.314 0.489
## cub   0.318 1.000 0.419 0.234 0.157 0.195 0.057 0.145 0.239
## shape 0.436 0.419 1.000 0.323 0.283 0.350 0.056 0.229 0.361
## para  0.335 0.234 0.323 1.000 0.722 0.714 0.203 0.095 0.309
## sent  0.304 0.157 0.283 0.722 1.000 0.685 0.246 0.181 0.345
## word  0.326 0.195 0.350 0.714 0.685 1.000 0.170 0.113 0.280
## add   0.116 0.057 0.056 0.203 0.246 0.170 1.000 0.585 0.408
## count 0.314 0.145 0.229 0.095 0.181 0.113 0.585 1.000 0.512
## cap   0.489 0.239 0.361 0.309 0.345 0.280 0.408 0.512 1.000
plot_matrix(cor2B) #visualize the cor matrix

3.2 Measurement component

It appears that so far our assumptions about the latent structure of these data are relatively accurate. However, we will want to formally test our hypotheses using factor analysis. This will be an important first step before we examine any causal paths among our constructs.

Let’s run a CFA of our data, testing this 3-factor structure.

ach.model1<-'

#measurement model
VIS=~vis+cub+shape
VERB=~para+sent+word
SPEED=~add+count+cap
'

ach.cfa1<-sem(ach.model1,sample.cov=cor2B,sample.nobs=145,std.lv=T) #std.lv=T fixes factor var to 1 (UVI)

summary(ach.cfa1)
## lavaan (0.5-23.1097) converged normally after  20 iterations
## 
##   Number of observations                           145
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               52.983
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.001
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VIS =~                                              
##     vis               0.670    0.090    7.417    0.000
##     cub               0.511    0.092    5.569    0.000
##     shape             0.682    0.090    7.540    0.000
##   VERB =~                                             
##     para              0.864    0.070   12.391    0.000
##     sent              0.827    0.071   11.650    0.000
##     word              0.823    0.071   11.565    0.000
##   SPEED =~                                            
##     add               0.659    0.085    7.790    0.000
##     count             0.794    0.084    9.464    0.000
##     cap               0.679    0.084    8.035    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VIS ~~                                              
##     VERB              0.543    0.086    6.348    0.000
##     SPEED             0.511    0.096    5.322    0.000
##   VERB ~~                                             
##     SPEED             0.320    0.093    3.462    0.001
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .vis               0.544    0.096    5.662    0.000
##    .cub               0.732    0.100    7.326    0.000
##    .shape             0.529    0.096    5.485    0.000
##    .para              0.246    0.051    4.837    0.000
##    .sent              0.309    0.053    5.806    0.000
##    .word              0.316    0.054    5.904    0.000
##    .add               0.559    0.086    6.470    0.000
##    .count             0.363    0.088    4.110    0.000
##    .cap               0.532    0.086    6.205    0.000
##     VIS               1.000                           
##     VERB              1.000                           
##     SPEED             1.000
semPaths(ach.cfa1)

(For pedagogical purposes, let’s compare this model with a model using ULI, instead of UVI:)

ach.model1b<-'

#measurement model
VIS=~vis+cub+shape
VERB=~para+sent+word
SPEED=~add+count+cap
'

ach.cfa1b<-sem(ach.model1b,sample.cov=cor2B,sample.nobs=145,std.lv=F) #std.lv=F fixes the first factor loading of each factor to 1 (ULI)

summary(ach.cfa1b)
## lavaan (0.5-23.1097) converged normally after  28 iterations
## 
##   Number of observations                           145
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               52.983
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.001
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VIS =~                                              
##     vis               1.000                           
##     cub               0.763    0.163    4.681    0.000
##     shape             1.017    0.186    5.462    0.000
##   VERB =~                                             
##     para              1.000                           
##     sent              0.957    0.084   11.461    0.000
##     word              0.952    0.084   11.400    0.000
##   SPEED =~                                            
##     add               1.000                           
##     count             1.205    0.187    6.448    0.000
##     cap               1.029    0.164    6.284    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VIS ~~                                              
##     VERB              0.315    0.076    4.145    0.000
##     SPEED             0.226    0.064    3.557    0.000
##   VERB ~~                                             
##     SPEED             0.183    0.063    2.898    0.004
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .vis               0.544    0.096    5.662    0.000
##    .cub               0.732    0.100    7.326    0.000
##    .shape             0.529    0.096    5.485    0.000
##    .para              0.246    0.051    4.837    0.000
##    .sent              0.309    0.053    5.806    0.000
##    .word              0.316    0.054    5.904    0.000
##    .add               0.559    0.086    6.470    0.000
##    .count             0.363    0.088    4.110    0.000
##    .cap               0.532    0.086    6.205    0.000
##     VIS               0.449    0.121    3.709    0.000
##     VERB              0.747    0.121    6.195    0.000
##     SPEED             0.435    0.112    3.895    0.000

(We see that the model has identical fit.)

It appears that our model does not adequately capture our data. However, examining the standardized loadings of our indicators suggests that each is doing a pretty good job of reflecting the underlying achievement constructs:

inspect(ach.cfa1,"std")
## $lambda
##         VIS  VERB SPEED
## vis   0.673 0.000 0.000
## cub   0.513 0.000 0.000
## shape 0.684 0.000 0.000
## para  0.000 0.867 0.000
## sent  0.000 0.830 0.000
## word  0.000 0.826 0.000
## add   0.000 0.000 0.662
## count 0.000 0.000 0.797
## cap   0.000 0.000 0.681
## 
## $theta
##       vis   cub   shape para  sent  word  add   count cap  
## vis   0.548                                                
## cub   0.000 0.737                                          
## shape 0.000 0.000 0.532                                    
## para  0.000 0.000 0.000 0.248                              
## sent  0.000 0.000 0.000 0.000 0.311                        
## word  0.000 0.000 0.000 0.000 0.000 0.318                  
## add   0.000 0.000 0.000 0.000 0.000 0.000 0.562            
## count 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.365      
## cap   0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.536
## 
## $psi
##       VIS   VERB  SPEED
## VIS   1.000            
## VERB  0.543 1.000      
## SPEED 0.511 0.320 1.000

Perhaps we have incorrectly assumed simple structure (i.e., no “complex indicators” which load on more than one factor). We can examine the modification indices of our model to see if we should free up any cross-loadings in this case.

mi = modificationIndices(ach.cfa1)
head(mi[order(-mi$mi), ], 10)
   lhs   op rhs      mi    epc sepc.lv sepc.all sepc.nox
76 add   ~~ count  25.13  0.611   0.611    0.615    0.615
30 VIS   =~ cap    24.82  0.564   0.564    0.566    0.566
28 VIS   =~ add    10.94 -0.372  -0.372   -0.374   -0.374
36 VERB  =~ cap     9.95  0.262   0.262    0.263    0.263
35 VERB  =~ count   9.94 -0.274  -0.274   -0.275   -0.275
50 vis   ~~ cap     9.14  0.175   0.175    0.176    0.176
78 count ~~ cap     8.64 -0.378  -0.378   -0.380   -0.380
61 shape ~~ add     4.59 -0.124  -0.124   -0.125   -0.125
51 cub   ~~ shape   4.40  0.173   0.173    0.175    0.175
48 vis   ~~ add     4.22 -0.120  -0.120   -0.121   -0.121

The first two suggested parameters appear relatively similar in terms of improving model fit. add ~~ count would estimate correlated errors between the addition and dot counting indicators. VIS =~ cap would allow the capitalization recognition indicator to load not only on Processing Speed but also on Visual Information Processing. Given the content of this test and our understanding of these constructs based on theory, we elect to free the second of these two parameters in a subsequent model. This will eliminate the simple structure of our model. (Note, if we were to have freed the correlation between errors of add and count, we would have eliminated the local independence assumption. This will produce off-diagonal elements in the theta-epsilon matrix, making this no longer simply a diagonal matrix.)
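
For contrast only (this model is not fitted in the tutorial), freeing the add ~~ count residual covariance instead of the cross-loading would have looked like this:

ach.model2b<-'

#measurement model
VIS=~vis+cub+shape
VERB=~para+sent+word
SPEED=~add+count+cap

#correlated residuals between add and count; this relaxes local independence
add~~count
'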

ach.model2<-'

#measurement model
VIS=~vis+cub+shape+cap  #adding cap to load on VIS as a complex indicator
VERB=~para+sent+word
SPEED=~add+count+cap
'

ach.cfa2<-sem(ach.model2,sample.cov=cor2B,sample.nobs=145,std.lv=T) #std.lv=T fixes factor var to 1 (UVI)

summary(ach.cfa2)
## lavaan (0.5-23.1097) converged normally after  21 iterations
## 
##   Number of observations                           145
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               29.209
##   Degrees of freedom                                23
##   P-value (Chi-square)                           0.173
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VIS =~                                              
##     vis               0.706    0.086    8.186    0.000
##     cub               0.482    0.090    5.344    0.000
##     shape             0.648    0.087    7.456    0.000
##     cap               0.455    0.088    5.171    0.000
##   VERB =~                                             
##     para              0.865    0.070   12.414    0.000
##     sent              0.827    0.071   11.650    0.000
##     word              0.822    0.071   11.554    0.000
##   SPEED =~                                            
##     add               0.679    0.088    7.706    0.000
##     count             0.856    0.091    9.405    0.000
##     cap               0.417    0.088    4.750    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VIS ~~                                              
##     VERB              0.557    0.081    6.891    0.000
##     SPEED             0.390    0.104    3.740    0.000
##   VERB ~~                                             
##     SPEED             0.223    0.096    2.330    0.020
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .vis               0.495    0.089    5.546    0.000
##    .cub               0.761    0.100    7.644    0.000
##    .shape             0.574    0.090    6.361    0.000
##    .cap               0.464    0.072    6.446    0.000
##    .para              0.245    0.051    4.825    0.000
##    .sent              0.309    0.053    5.820    0.000
##    .word              0.317    0.053    5.929    0.000
##    .add               0.533    0.092    5.778    0.000
##    .count             0.260    0.112    2.320    0.020
##     VIS               1.000                           
##     VERB              1.000                           
##     SPEED             1.000
semPaths(ach.cfa2)

Aha! Our model now provides good fit to the data. Notice also that this model is identified as it satisfies Rule 9.3 for identification of models with complex indicators.
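
Because ach.cfa2 simply frees one extra parameter relative to ach.cfa1, the two models are nested and could also be compared directly, just as we compared the structural models earlier:

anova(ach.cfa1, ach.cfa2)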

3.3 Adding in the structural component

Now, suppose that we also assume that these three underlying constructs (VIS, VERB, and SPEED) are related causally to each other. Currently we have only estimated nondirectional covariances among the three constructs (done by default in the sem() function). Rather, we might hypothesize that visual processing and processing speed contribute to verbal capacity. Thus, we would also like to include directed structural paths among our constructs in our model. In order to both estimate latent constructs and examine relations between them, we will need a full SEM analysis.

ach.model3<-'

#measurement model
VIS=~vis+cub+shape+cap
VERB=~para+sent+word
SPEED=~add+count+cap

#structural model
VERB~SPEED+VIS
'

ach.cfa3<-sem(ach.model3,sample.cov=cor2B,sample.nobs=145,std.lv=T)

summary(ach.cfa3)
## lavaan (0.5-23.1097) converged normally after  24 iterations
## 
##   Number of observations                           145
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               29.209
##   Degrees of freedom                                23
##   P-value (Chi-square)                           0.173
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VIS =~                                              
##     vis               0.706    0.086    8.186    0.000
##     cub               0.482    0.090    5.344    0.000
##     shape             0.648    0.087    7.456    0.000
##     cap               0.455    0.088    5.171    0.000
##   VERB =~                                             
##     para              0.718    0.067   10.700    0.000
##     sent              0.687    0.067   10.235    0.000
##     word              0.683    0.067   10.172    0.000
##   SPEED =~                                            
##     add               0.679    0.088    7.706    0.000
##     count             0.856    0.091    9.405    0.000
##     cap               0.417    0.088    4.750    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VERB ~                                              
##     SPEED             0.008    0.129    0.063    0.950
##     VIS               0.668    0.158    4.217    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VIS ~~                                              
##     SPEED             0.390    0.104    3.740    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .vis               0.495    0.089    5.546    0.000
##    .cub               0.761    0.100    7.644    0.000
##    .shape             0.574    0.090    6.361    0.000
##    .cap               0.464    0.072    6.446    0.000
##    .para              0.245    0.051    4.825    0.000
##    .sent              0.309    0.053    5.820    0.000
##    .word              0.317    0.053    5.929    0.000
##    .add               0.533    0.092    5.778    0.000
##    .count             0.260    0.112    2.320    0.020
##     VIS               1.000                           
##    .VERB              1.000                           
##     SPEED             1.000
semPaths(ach.cfa3)

Notice that our model has identical fit to the previous model. Why is this? Our model output shows that the covariance between our exogenous latent variables SPEED and VIS has been estimated, despite our not including this explicitly in the model. This is a result of the sem() function including, by default, the covariances among exogenous latent factors in our model. However, the consequence of this is that we (as before) have a “just identified” structural component to our model. We would prefer an overidentified structural component, as this would provide more parsimony. Let’s try forcing lavaan to remove the covariance between SPEED and VIS. (However, we may be concerned about doing so, given that this parameter is highly significant in our output.)
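
One way to see which parameters lavaan added on its own is to inspect the parameter table; in lavaan’s parTable() output, rows with user == 0 were (to my understanding) added by defaults rather than written in our model syntax:

pt <- parTable(ach.cfa3)
pt[pt$user == 0 & pt$op == "~~", c("lhs", "op", "rhs", "free")]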

ach.model4<-'

#measurement model
VIS=~vis+cub+shape+cap
VERB=~para+sent+word
SPEED=~add+count+cap

#structural model
VERB~SPEED+VIS

#removing the covariance between SPEED and VIS
VIS~~0*SPEED
'

ach.cfa4<-sem(ach.model4,sample.cov=cor2B,sample.nobs=145,std.lv=T)

summary(ach.cfa4)
## lavaan (0.5-23.1097) converged normally after  20 iterations
## 
##   Number of observations                           145
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               39.510
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.024
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VIS =~                                              
##     vis               0.686    0.088    7.776    0.000
##     cub               0.499    0.091    5.484    0.000
##     shape             0.670    0.088    7.579    0.000
##     cap               0.478    0.077    6.200    0.000
##   VERB =~                                             
##     para              0.721    0.066   10.890    0.000
##     sent              0.691    0.066   10.414    0.000
##     word              0.686    0.066   10.320    0.000
##   SPEED =~                                            
##     add               0.755    0.091    8.281    0.000
##     count             0.769    0.091    8.404    0.000
##     cap               0.493    0.077    6.440    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VERB ~                                              
##     SPEED             0.168    0.111    1.524    0.127
##     VIS               0.611    0.135    4.543    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VIS ~~                                              
##     SPEED             0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .vis               0.522    0.093    5.631    0.000
##    .cub               0.744    0.100    7.475    0.000
##    .shape             0.544    0.093    5.872    0.000
##    .cap               0.442    0.073    6.017    0.000
##    .para              0.246    0.051    4.836    0.000
##    .sent              0.307    0.053    5.780    0.000
##    .word              0.318    0.054    5.932    0.000
##    .add               0.422    0.102    4.160    0.000
##    .count             0.402    0.103    3.893    0.000
##     VIS               1.000                           
##    .VERB              1.000                           
##     SPEED             1.000
semPaths(ach.cfa4)

Unfortunately, this has caused a significant reduction in our model fit, which we can confirm statistically with an anova test:

anova(ach.cfa3,ach.cfa4)
         Df  AIC  BIC Chisq Chisq diff Df diff Pr(>Chisq)
ach.cfa3 23 3267 3333  29.2         NA      NA         NA
ach.cfa4 24 3276 3338  39.5       10.3       1      0.001

In sum, this, in combination with the nonsignificant estimate of VERB regressed onto SPEED, suggests that our initial hypothesis is not supported by the data. Perhaps we can examine an alternative model that may better represent our data. We notice that in the output the VERB~VIS and VIS~~SPEED parameters are both significant. It is possible that visual acuity is a necessary foundational capacity contributing to both processing speed and linguistic functioning. This makes sense, given the need to visually encode information prior to higher-order executive functions that utilize visual input. Let’s test a model in which VIS is a cause of both SPEED and VERB. However, to avoid a just-identified model, we also require the residual covariance between SPEED and VERB (after both are regressed on VIS) to be zero.

ach.model5<-'

#measurement model
VIS=~vis+cub+shape+cap
VERB=~para+sent+word
SPEED=~add+count+cap

#structural model
VERB+SPEED~VIS

#removing the covariance between VERB and SPEED
VERB~~0*SPEED
'

ach.cfa5<-sem(ach.model5,sample.cov=cor2B,sample.nobs=145,std.lv=T)

summary(ach.cfa5)
## lavaan (0.5-23.1097) converged normally after  22 iterations
## 
##   Number of observations                           145
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               29.212
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.212
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VIS =~                                              
##     vis               0.706    0.086    8.210    0.000
##     cub               0.481    0.090    5.341    0.000
##     shape             0.647    0.087    7.460    0.000
##     cap               0.456    0.088    5.187    0.000
##   VERB =~                                             
##     para              0.718    0.067   10.780    0.000
##     sent              0.686    0.067   10.301    0.000
##     word              0.682    0.067   10.239    0.000
##   SPEED =~                                            
##     add               0.623    0.082    7.587    0.000
##     count             0.790    0.092    8.581    0.000
##     cap               0.382    0.083    4.618    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   VERB ~                                              
##     VIS               0.673    0.139    4.847    0.000
##   SPEED ~                                             
##     VIS               0.425    0.129    3.291    0.001
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .VERB ~~                                             
##    .SPEED             0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .vis               0.495    0.089    5.579    0.000
##    .cub               0.762    0.100    7.648    0.000
##    .shape             0.574    0.090    6.379    0.000
##    .cap               0.464    0.072    6.452    0.000
##    .para              0.245    0.051    4.822    0.000
##    .sent              0.309    0.053    5.824    0.000
##    .word              0.317    0.053    5.928    0.000
##    .add               0.535    0.092    5.797    0.000
##    .count             0.257    0.113    2.276    0.023
##     VIS               1.000                           
##    .VERB              1.000                           
##    .SPEED             1.000
semPaths(ach.cfa5)

This model provides good fit to the data, suggesting the potential general causal influence of visual processing on both processing speed and verbal capacity.

3.4 Wrench in the works

It is worth noting (and complicates matters considerably) that based on the results of Model 3, we might also have hypothesized that processing speed contributes to visual information processing, which in turn contributes to verbal capacity (SPEED→VIS→VERB). If we were to test this model, we would discover identical model fit. This result points out the important caveat to SEM that there will always be at least one other testable model that provides statistically identical fit to a model of interest. In this case, both a combination of theory and the real-life properties of the data (e.g., longitudinal data) are necessary in order to determine which model may be most appropriate. However, you should always remember that another equally tenable process may be producing the data you have collected.

4 A more complicated example

Here, Sadler & Woody (2003) test an interpersonal theory of dyadic interaction: Person M’s ‘behavior’ when interacting with person F (i.e., M’s situational behavior) is influenced by (i) M’s long-term tendency to elicit the behavior (i.e., M’s behavioral trait), and (ii) F’s situational behavior. The same is assumed true for F.

The four variables, trait and behavior for each of M and F, are represented as latent variables that can be measured in multiple ways. “Reciprocal influence” is conceptualized as M’s situational behavior influencing F’s situational behavior, and vice versa. In this study, M and F were strangers, so M’s trait (which is unknown to F) couldn’t affect F’s situational behavior. However, if M and F were friends, such links would be expected to be significant and should be included in the model. Trait is measured by self-report and a friend’s report; and situational behavior is rated by self, partner and an observer. Rater biases are introduced through the use of covariance arrows.

4.1 Variable coding:

  1. Female or male (M/F)
  2. Situational or trait (S/T)
  3. Observer, partner, friend, or self rating (O/I/F/S)
  4. Rating (R)

4.2 Load in data

low.dom = '
1
.415 1
.460 .351 1
-.321 -.374 -.310 1
-.239 -.221 -.133 .626 1
-.185 -.164 -.272 .533 .345 1
.349 .307 .496 -.180 -.081 -.067 1
.308 .302 .562 -.222 -.156 -.118 .573 1
.038 -.115 -.048 .296 .167 .320 .008 -.036 1
-.072 -.160 -.124 .317 .167 .248 -.152 -.175 .414 1 
'
labels1 = c("FSOR", "FSIR", "FSSR", "MSOR", "MSIR", "MSSR", "FTFR", "FTSR", "MTSR", "MTFR")
dom.cov = getCov(low.dom, names = labels1)
plot_matrix(dom.cov)

Notice that since these covariance units are standardized (i.e., correlation matrix), they are all similar to each other (between -1 and 1) and therefore we see a lot more circles on the grid than in our first dataset, which was unstandardized and had a much wider range.

4.3 Unconstrained paths

Here, we have latent variables that are measured by observed variables. Specifically, female trait dominance (FTD) is measured by self and friend trait ratings (FTSR and FTFR). Similarly, female situational dominance (FSD) is measured by self, informant (partner), and observer ratings. The same is true for males. Thus, we’ll have to define these latent variables in the model, and then see how these latent variables are predicted by other latent variables!

We want to estimate the paths A and B (the effect of the individuals’ traits on their situational behavior) and C and D (the effects of individuals’ situational behavior on each other). Let’s first allow these paths to be unconstrained (i.e., we can get a different estimate for each path).

dyad.model1 = '
  # Latent variable definitions for trait, FTD & MTD, and situation, FSD & MSD  
    FTD =~ FTSR + FTFR
    MTD =~ MTSR + MTFR
    FSD =~ FSSR + FSIR + FSOR
    MSD =~ MSSR + MSIR + MSOR
    
    # regressions
    FSD ~ FTD + MSD
    MSD ~ MTD + FSD
     
    # variances and covariances
    # Residual correls among M ratings, MTSR, MSSR, FSIR as we assume that ratings will be similar within men
    MTSR ~~ FSIR
  MSSR ~~ FSIR
    MSSR ~~ MTSR
    
    # Residual correls among F ratings, FTSR, FSSR, MSIR as we assume that ratings will be similar within women
    FTSR ~~ MSIR
  FSSR ~~ MSIR
    FSSR ~~ FTSR
    
    # Residual correls among Observer situational ratings, MSOR, FSOR   
    MSOR ~~ FSOR    
'
dyad.fit1 = sem(dyad.model1, fixed.x = F, sample.cov = dom.cov, sample.nobs = 112)
summary(dyad.fit1) #good p value
## lavaan (0.5-23.1097) converged normally after  41 iterations
## 
##   Number of observations                           112
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               17.306
##   Degrees of freedom                                23
##   P-value (Chi-square)                           0.794
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   FTD =~                                              
##     FTSR              1.000                           
##     FTFR              1.120    0.231    4.858    0.000
##   MTD =~                                              
##     MTSR              1.000                           
##     MTFR              1.129    0.382    2.952    0.003
##   FSD =~                                              
##     FSSR              1.000                           
##     FSIR              0.833    0.167    4.984    0.000
##     FSOR              0.928    0.175    5.297    0.000
##   MSD =~                                              
##     MSSR              1.000                           
##     MSIR              1.127    0.207    5.435    0.000
##     MSOR              1.678    0.302    5.563    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   FSD ~                                               
##     FTD               0.696    0.136    5.116    0.000
##     MSD              -0.283    0.160   -1.769    0.077
##   MSD ~                                               
##     MTD               0.418    0.152    2.741    0.006
##     FSD              -0.231    0.115   -2.008    0.045
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .MTSR ~~                                             
##    .FSIR             -0.044    0.073   -0.599    0.549
##  .FSIR ~~                                             
##    .MSSR              0.058    0.069    0.838    0.402
##  .MTSR ~~                                             
##    .MSSR              0.130    0.073    1.785    0.074
##  .FTSR ~~                                             
##    .MSIR             -0.028    0.059   -0.474    0.635
##  .FSSR ~~                                             
##    .MSIR              0.074    0.061    1.213    0.225
##  .FTSR ~~                                             
##    .FSSR              0.147    0.074    1.985    0.047
##  .FSOR ~~                                             
##    .MSOR              0.008    0.057    0.144    0.886
##   FTD ~~                                              
##     MTD              -0.073    0.062   -1.187    0.235
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .FTSR              0.485    0.111    4.380    0.000
##    .FTFR              0.375    0.124    3.025    0.002
##    .MTSR              0.629    0.142    4.422    0.000
##    .MTFR              0.545    0.163    3.342    0.001
##    .FSSR              0.498    0.098    5.070    0.000
##    .FSIR              0.661    0.105    6.291    0.000
##    .FSOR              0.571    0.101    5.665    0.000
##    .MSSR              0.677    0.099    6.842    0.000
##    .MSIR              0.571    0.091    6.280    0.000
##    .MSOR              0.088    0.106    0.837    0.403
##     FTD               0.491    0.143    3.419    0.001
##     MTD               0.350    0.152    2.310    0.021
##    .FSD               0.158    0.073    2.167    0.030
##    .MSD               0.189    0.065    2.920    0.004
semPaths(dyad.fit1, what='std', 
         node.label.cex=5,
         edge.label.cex=1.25, curvePivot = TRUE, 
         fade=FALSE)
## Warning in qgraph(Edgelist, labels = nLab, bidirectional = Bidir, directed
## = Directed, : The following arguments are not documented and likely not
## arguments of qgraph and thus ignored: node.label.cex

This model is ok, but let’s see if we can get a simpler model that works! Let’s try taking out most of the covariance terms. Notice that rather than building up to a more complicated better model as in the previous example, we are trimming down to a more parsimonious better model.

dyad.model2 = '
  # Latent variable definitions for trait, FTD & MTD, and situation, FSD & MSD  
    FTD =~ FTSR + FTFR
    MTD =~ MTSR + MTFR
    FSD =~ FSSR + FSIR + FSOR
    MSD =~ MSSR + MSIR + MSOR
    
    # regressions
    FSD ~ FTD + MSD
    MSD ~ MTD + FSD
     
    # Residual correls among M ratings, MTSR, MSSR
    MSSR ~~ MTSR
    
    # Residual correls among F ratings, FTSR, FSSR
    FSSR ~~ FTSR
'
dyad.fit2 = sem(dyad.model2, fixed.x = F, sample.cov = dom.cov, sample.nobs = 112) 
summary(dyad.fit2) # Wow, even better !
## lavaan (0.5-23.1097) converged normally after  38 iterations
## 
##   Number of observations                           112
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               20.970
##   Degrees of freedom                                28
##   P-value (Chi-square)                           0.827
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   FTD =~                                              
##     FTSR              1.000                           
##     FTFR              1.122    0.231    4.862    0.000
##   MTD =~                                              
##     MTSR              1.000                           
##     MTFR              1.110    0.370    2.996    0.003
##   FSD =~                                              
##     FSSR              1.000                           
##     FSIR              0.818    0.167    4.899    0.000
##     FSOR              0.916    0.172    5.321    0.000
##   MSD =~                                              
##     MSSR              1.000                           
##     MSIR              1.170    0.220    5.308    0.000
##     MSOR              1.804    0.338    5.334    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   FSD ~                                               
##     FTD               0.713    0.137    5.194    0.000
##     MSD              -0.281    0.158   -1.778    0.075
##   MSD ~                                               
##     MTD               0.392    0.145    2.711    0.007
##     FSD              -0.209    0.107   -1.949    0.051
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .MTSR ~~                                             
##    .MSSR              0.133    0.073    1.823    0.068
##  .FTSR ~~                                             
##    .FSSR              0.145    0.075    1.937    0.053
##   FTD ~~                                              
##     MTD              -0.074    0.062   -1.196    0.232
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .FTSR              0.486    0.110    4.401    0.000
##    .FTFR              0.375    0.124    3.029    0.002
##    .MTSR              0.624    0.142    4.380    0.000
##    .MTFR              0.550    0.160    3.445    0.001
##    .FSSR              0.498    0.098    5.073    0.000
##    .FSIR              0.663    0.105    6.316    0.000
##    .FSOR              0.580    0.100    5.782    0.000
##    .MSSR              0.697    0.100    6.978    0.000
##    .MSIR              0.590    0.092    6.384    0.000
##    .MSOR              0.036    0.113    0.319    0.750
##     FTD               0.489    0.143    3.415    0.001
##     MTD               0.358    0.153    2.339    0.019
##    .FSD               0.156    0.072    2.168    0.030
##    .MSD               0.179    0.061    2.918    0.004
semPaths(dyad.fit2, what='std', 
         edge.label.cex=1.25, curvePivot = TRUE, 
         fade=FALSE)

anova(dyad.fit1,dyad.fit2)

|           | Df |  AIC |  BIC | Chisq | Chisq diff | Df diff | Pr(>Chisq) |
|-----------|---:|-----:|-----:|------:|-----------:|--------:|-----------:|
| dyad.fit1 | 23 | 2925 | 3012 |  17.3 |         NA |      NA |         NA |
| dyad.fit2 | 28 | 2919 | 2992 |  21.0 |       3.66 |       5 |      0.599 |

This simpler model is definitely the better one! It has more degrees of freedom (fewer parameters estimated), a lower AIC, and its \(\chi^2\) is not significantly worse (p = 0.599)!
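If you’d like to see the fit indices side by side rather than relying only on the \(\chi^2\) difference test, lavaan’s fitMeasures() returns whichever indices you request (the set below is just one reasonable choice):

# Compare a few common fit indices across the two models.
fitMeasures(dyad.fit1, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr", "aic", "bic"))
fitMeasures(dyad.fit2, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr", "aic", "bic"))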

4.4 Constrained paths

Let’s see whether the effect of one partner’s situational dominance on the other’s situational dominance differs depending on which partner you are. We want to constrain the paths by which M’s situational dominance (MSD) predicts F’s situational dominance (FSD), and vice versa, to be equal. Constraining them to be the same gives us one fewer parameter to estimate, so if the constrained model holds up it is the more parsimonious one! Notice that the estimates of these two paths in the previous models are nearly identical, giving us confidence that this approach may be well-founded.

Using the * operator below “names” (labels) the regression coefficient. This is useful mainly because if we give two coefficients the same label, lavaan automatically constrains them to be equal.
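An equivalent way to impose the constraint (shown as a sketch under the hypothetical name dyad.model3b) is to give each path its own label and then add an explicit equality constraint with ==; we’ll use the shared-label version below:

# Sketch: separate labels plus an explicit equality constraint, which lavaan
# treats the same as reusing a single label.
dyad.model3b = '
  # Latent variable definitions
    FTD =~ FTSR + FTFR
    MTD =~ MTSR + MTFR
    FSD =~ FSSR + FSIR + FSOR
    MSD =~ MSSR + MSIR + MSOR

  # regressions, with separately labelled partner-effect paths
    FSD ~ FTD + b1*MSD
    MSD ~ MTD + b2*FSD

  # residual correlations
    MSSR ~~ MTSR
    FSSR ~~ FTSR

  # explicit equality constraint on the two partner-effect paths
    b1 == b2
'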

dyad.model3 = '
  # Latent variable definitions for trait, FTD & MTD, and situation, FSD & MSD  
    FTD =~ FTSR + FTFR
    MTD =~ MTSR + MTFR
    FSD =~ FSSR + FSIR + FSOR
    MSD =~ MSSR + MSIR + MSOR
    
    # regressions
    FSD ~ FTD + a*MSD  #constraining paths to be equal
    MSD ~ MTD + a*FSD  #constraining paths to be equal
     
    # Residual correls among M ratings, MTSR, MSSR
    MSSR ~~ MTSR
    
    # Residual correls among F ratings, FTSR, FSSR
    FSSR ~~ FTSR
'

dyad.fit3 = sem(dyad.model3, fixed.x = T, sample.cov = dom.cov, sample.nobs = 112) 
summary(dyad.fit3)
## lavaan (0.5-23.1097) converged normally after  35 iterations
## 
##   Number of observations                           112
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               21.055
##   Degrees of freedom                                29
##   P-value (Chi-square)                           0.857
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   FTD =~                                              
##     FTSR              1.000                           
##     FTFR              1.118    0.229    4.881    0.000
##   MTD =~                                              
##     MTSR              1.000                           
##     MTFR              1.091    0.365    2.991    0.003
##   FSD =~                                              
##     FSSR              1.000                           
##     FSIR              0.823    0.167    4.918    0.000
##     FSOR              0.925    0.172    5.366    0.000
##   MSD =~                                              
##     MSSR              1.000                           
##     MSIR              1.155    0.211    5.484    0.000
##     MSOR              1.775    0.316    5.625    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   FSD ~                                               
##     FTD               0.714    0.138    5.186    0.000
##     MSD        (a)   -0.237    0.061   -3.874    0.000
##   MSD ~                                               
##     MTD               0.390    0.144    2.717    0.007
##     FSD        (a)   -0.237    0.061   -3.874    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .MTSR ~~                                             
##    .MSSR              0.132    0.073    1.819    0.069
##  .FTSR ~~                                             
##    .FSSR              0.148    0.074    1.985    0.047
##   FTD ~~                                              
##     MTD              -0.073    0.062   -1.167    0.243
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .FTSR              0.485    0.110    4.412    0.000
##    .FTFR              0.380    0.123    3.088    0.002
##    .MTSR              0.617    0.144    4.283    0.000
##    .MTFR              0.556    0.159    3.502    0.000
##    .FSSR              0.499    0.098    5.081    0.000
##    .FSIR              0.663    0.105    6.305    0.000
##    .FSOR              0.576    0.100    5.742    0.000
##    .MSSR              0.695    0.100    6.959    0.000
##    .MSIR              0.588    0.092    6.390    0.000
##    .MSOR              0.040    0.111    0.358    0.720
##     FTD               0.489    0.143    3.429    0.001
##     MTD               0.366    0.156    2.352    0.019
##    .FSD               0.161    0.072    2.237    0.025
##    .MSD               0.182    0.061    2.996    0.003
anova(dyad.fit1,dyad.fit2,dyad.fit3) 

|           | Df |  AIC |  BIC | Chisq | Chisq diff | Df diff | Pr(>Chisq) |
|-----------|---:|-----:|-----:|------:|-----------:|--------:|-----------:|
| dyad.fit1 | 23 | 2925 | 3012 |  17.3 |         NA |      NA |         NA |
| dyad.fit2 | 28 | 2919 | 2992 |  21.0 |      3.663 |       5 |      0.599 |
| dyad.fit3 | 29 | 2917 | 2988 |  21.1 |      0.085 |       1 |      0.771 |

Constraining these paths to be equal does not make the fit significantly worse (\(\chi^2\) diff = 0.085, df diff = 1, p = 0.771), and the constrained model has the lowest AIC and BIC. So the two effects probably aren’t actually different.

4.5 Wrapping up

semPaths(dyad.fit3, what='std', 
         edge.label.cex=1.25, curvePivot = TRUE, 
         fade=FALSE)
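
The diagram displays standardized estimates; if you’d rather read them from a table (handy when writing up results), lavaan’s standardizedSolution() reports the same values along with standard errors and confidence intervals:

# Standardized estimates for the final, constrained model.
standardizedSolution(dyad.fit3)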

In the end, then, we see that trait dominance predicts situational dominance for both men and women, although this relationship appears stronger in women than in men.

We also see that there is no significant association between men’s and women’s trait dominance, but there is a small-to-moderate, significant negative relationship between men’s and women’s situational dominance, suggesting that when one partner acts more dominant, the other acts less so (consistent with the interpersonal theory of complementarity).

Our model-trimming approach also suggests that men and women report on each other’s level of situational dominance in different (uncorrelated) ways; that is, informant reports and self-reports of the same person’s behavior may be unrelated in this context.

Finally, by retaining the residual correlations between each partner’s self-reported trait and situational dominance ratings, we have relaxed the local independence assumption for those indicators: beyond the effect of the latent dominance constructs themselves, these variables share variance attributable to the common self-report assessment method.