mu <- rep(0,3)
Sigma <- tribble(
~a, ~b, ~c,
1, 0.2, 0.3,
0.2, 1, 0.7,
0.3, 0.7, 1
)
abc = data.frame(MASS::mvrnorm(1000, mu=mu, Sigma=Sigma, empirical=TRUE))
colnames(abc) = LETTERS[1:3]
psych::describe(abc)
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | 1 | 1000 | 0 | 1 | -0.022 | -0.017 | 1.081 | -2.82 | 3.52 | 6.34 | 0.190 | -0.071 | 0.032 |
B | 2 | 1000 | 0 | 1 | 0.029 | 0.010 | 1.056 | -2.96 | 2.81 | 5.78 | -0.071 | -0.296 | 0.032 |
C | 3 | 1000 | 0 | 1 | 0.023 | 0.003 | 0.969 | -3.50 | 3.61 | 7.11 | -0.002 | 0.035 | 0.032 |
cov(abc)
## A B C
## A 1.0 0.2 0.3
## B 0.2 1.0 0.7
## C 0.3 0.7 1.0
syntax <- '
B ~ A
C ~ B
'
m <- sem(syntax, data=abc, meanstructure=FALSE)
summary(m)
## lavaan 0.6-3 ended normally after 11 iterations
##
## Optimization method NLMINB
## Number of free parameters 4
##
## Number of observations 1000
##
## Estimator ML
## Model Fit Test Statistic 53.704
## Degrees of freedom 1
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Information Expected
## Information saturated (h1) model Structured
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## B ~
## A 0.200 0.031 6.455 0.000
## C ~
## B 0.700 0.023 30.997 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .B 0.959 0.043 22.361 0.000
## .C 0.509 0.023 22.361 0.000
semPaths_default(m, what="std", edge.label.cex=1.5)
Note that the residual variance estimates are also estimates of \(1 - R^2\) because we have standardized variables.
parTable(m)
id | lhs | op | rhs | user | block | group | free | ustart | exo | label | plabel | start | est | se |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | B | ~ | A | 1 | 1 | 1 | 1 | NA | 0 | .p1. | 0.000 | 0.200 | 0.031 | |
2 | C | ~ | B | 1 | 1 | 1 | 2 | NA | 0 | .p2. | 0.000 | 0.700 | 0.023 | |
3 | B | ~~ | B | 0 | 1 | 1 | 3 | NA | 0 | .p3. | 0.499 | 0.959 | 0.043 | |
4 | C | ~~ | C | 0 | 1 | 1 | 4 | NA | 0 | .p4. | 0.499 | 0.509 | 0.023 | |
5 | A | ~~ | A | 0 | 1 | 1 | 0 | NA | 1 | .p5. | 0.999 | 0.999 | 0.000 |
parameterestimates(m)
lhs | op | rhs | est | se | z | pvalue | ci.lower | ci.upper |
---|---|---|---|---|---|---|---|---|
B | ~ | A | 0.200 | 0.031 | 6.46 | 0 | 0.139 | 0.261 |
C | ~ | B | 0.700 | 0.023 | 31.00 | 0 | 0.656 | 0.744 |
B | ~~ | B | 0.959 | 0.043 | 22.36 | 0 | 0.875 | 1.043 |
C | ~~ | C | 0.509 | 0.023 | 22.36 | 0 | 0.465 | 0.554 |
A | ~~ | A | 0.999 | 0.000 | NA | NA | 0.999 | 0.999 |
Note that the variance of A is exogenous, but estimated. More specifically, it will be set at the sample estimate, meaning that the variance of A is not part of the likelihood function and does not contribute to any fit statistics.
Use fitted()
to obtain the model-implied covariance matrix, \(\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})\):
fitted(m)
## $cov
## B C A
## B 0.999
## C 0.699 0.999
## A 0.200 0.140 0.999
Notice how low the covariance is between A and C compared to our groundtruth value of 0.7. That is, the association between A and C is not fully explained by the indirect association of A with C via B. Also note that the matrix order is not necessarily the same as the sample covariance matrix. This is just a trivial rearrangement of the row/column order, not something meaningful.
Use cov()
to obtain the sample covariance matrix, \(\mathbf{S_{XX}}\):
cov(abc)
## A B C
## A 1.0 0.2 0.3
## B 0.2 1.0 0.7
## C 0.3 0.7 1.0
We know from our simulation that there is a relationship between A and C that is not entirely attributable to the indirect effect of A on B via B. Thus, we have an omitted path problem in which a key covariance term is not captured by our model.
We will cover modification indices in more detail soon, but for now, know that the modificationIndices
function in lavaan
provides estimates of how much the model fit (\(\chi^2\)) would fit if a specific parameter were added to the model.
modificationIndices(m)
lhs | op | rhs | mi | epc | sepc.lv | sepc.all | sepc.nox | |
---|---|---|---|---|---|---|---|---|
6 | B | ~~ | C | 52.3 | -0.799 | -0.799 | -1.143 | -1.143 |
7 | B | ~ | C | 52.3 | -1.569 | -1.569 | -1.569 | -1.569 |
8 | C | ~ | A | 52.3 | 0.167 | 0.167 | 0.167 | 0.167 |
10 | A | ~ | C | 52.3 | 0.327 | 0.327 | 0.327 | 0.327 |
We will return to the issue of models that fit the data equally well, but it’s important to note that in this case, there is only one degree of freedom, and four equally good possibilities for capturing the relationship between A and C. This highlights the importance of having clear theoretical predictions about which variables are correlated versus which are likely to be (weak) causes or effects. As in regression, one can reverse predictor versus criterion with minimal change in syntax, but the choice to do so should be theoretically motivated. Likewise, in SEM, if one does now have clear predictions about the relationship between two variables, sometimes an undirected covariance between them is the appropriately weak parameterization.
syn2 <- '
B ~ A
C ~ B
A ~ C
'
m2 <- sem(syn2, abc, meanstructure=FALSE)
summary(m2)
## lavaan 0.6-3 ended normally after 16 iterations
##
## Optimization method NLMINB
## Number of free parameters 6
##
## Number of observations 1000
##
## Estimator ML
## Model Fit Test Statistic 0.000
## Degrees of freedom 0
## Minimum Function Value 0.0000000000000
##
## Parameter Estimates:
##
## Information Expected
## Information saturated (h1) model Structured
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## B ~
## A -0.022 0.047 -0.460 0.646
## C ~
## B 0.703 0.024 29.427 0.000
## A ~
## C 0.314 0.043 7.357 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .B 1.008 0.050 20.277 0.000
## .C 0.510 0.023 22.360 0.000
## .A 0.909 0.041 22.351 0.000
Note that we now have a just-identified model (0 degrees of freedom) because we have 3 variances and 3 covariances in \(\mathbf{S_{XX}}\) and we have 6 free parameters estimated here.
We could also conceive of a situation in which the model that lacks an \(A-C\) relationship is nevertheless reasonably good.
mu <- rep(0,3)
Sigma <- tribble(
~a, ~b, ~c,
1, 0.2, 0.16,
0.2, 1, 0.7,
0.16, 0.7, 1
)
abc = data.frame(MASS::mvrnorm(1000, mu=mu, Sigma=Sigma, empirical=TRUE))
colnames(abc) = LETTERS[1:3]
psych::describe(abc)
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | 1 | 1000 | 0 | 1 | 0.020 | 0.010 | 1.026 | -3.49 | 2.99 | 6.48 | -0.110 | 0.037 | 0.032 |
B | 2 | 1000 | 0 | 1 | -0.019 | 0.000 | 0.969 | -2.88 | 3.12 | 6.00 | 0.032 | 0.023 | 0.032 |
C | 3 | 1000 | 0 | 1 | -0.025 | -0.007 | 0.987 | -3.02 | 2.92 | 5.94 | 0.065 | -0.127 | 0.032 |
cov(abc)
## A B C
## A 1.00 0.2 0.16
## B 0.20 1.0 0.70
## C 0.16 0.7 1.00
syntax <- '
B ~ A
C ~ B
'
m <- sem(syntax, data=abc, meanstructure=FALSE)
summary(m)
## lavaan 0.6-3 ended normally after 12 iterations
##
## Optimization method NLMINB
## Number of free parameters 4
##
## Number of observations 1000
##
## Estimator ML
## Model Fit Test Statistic 0.817
## Degrees of freedom 1
## P-value (Chi-square) 0.366
##
## Parameter Estimates:
##
## Information Expected
## Information saturated (h1) model Structured
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## B ~
## A 0.200 0.031 6.455 0.000
## C ~
## B 0.700 0.023 30.997 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .B 0.959 0.043 22.361 0.000
## .C 0.509 0.023 22.361 0.000
#semPaths(m)
fitted(m)
## $cov
## B C A
## B 0.999
## C 0.699 0.999
## A 0.200 0.140 0.999
We can see that the A-C relationship is 0.14 in the output and 0.16 in the groundtruth. Thus, we are capturing their association by modeling the indirect effect alone.
sds <- c(1, 10, 100)
abc_rescale <- cor2cov(cov(abc), sds)
print(abc_rescale)
## A B C
## A 1 2 16
## B 2 100 700
## C 16 700 10000
syntax <- '
B ~ b1*A
C ~ b2*B
med := b1*b2
'
m_ind <- sem(syntax, data=abc, meanstructure=FALSE)
summary(m_ind)
## lavaan 0.6-3 ended normally after 12 iterations
##
## Optimization method NLMINB
## Number of free parameters 4
##
## Number of observations 1000
##
## Estimator ML
## Model Fit Test Statistic 0.817
## Degrees of freedom 1
## P-value (Chi-square) 0.366
##
## Parameter Estimates:
##
## Information Expected
## Information saturated (h1) model Structured
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## B ~
## A (b1) 0.200 0.031 6.455 0.000
## C ~
## B (b2) 0.700 0.023 30.997 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .B 0.959 0.043 22.361 0.000
## .C 0.509 0.023 22.361 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## med 0.140 0.022 6.319 0.000
#semPaths(m_ind)
parTable(m_ind)
id | lhs | op | rhs | user | block | group | free | ustart | exo | label | plabel | start | est | se |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | B | ~ | A | 1 | 1 | 1 | 1 | NA | 0 | b1 | .p1. | 0.000 | 0.200 | 0.031 |
2 | C | ~ | B | 1 | 1 | 1 | 2 | NA | 0 | b2 | .p2. | 0.000 | 0.700 | 0.023 |
3 | B | ~~ | B | 0 | 1 | 1 | 3 | NA | 0 | .p3. | 0.499 | 0.959 | 0.043 | |
4 | C | ~~ | C | 0 | 1 | 1 | 4 | NA | 0 | .p4. | 0.500 | 0.509 | 0.023 | |
5 | A | ~~ | A | 0 | 1 | 1 | 0 | NA | 1 | .p5. | 0.999 | 0.999 | 0.000 | |
6 | med | := | b1*b2 | 1 | 0 | 0 | 0 | NA | 0 | med | 0.000 | 0.140 | 0.022 |
Notice how there is an additional parameter estimated, but it is not free because it is a simple product of b1
and b2
.
Use inspect()
inspect(m_ind)
## $lambda
## B C A
## B 0 0 0
## C 0 0 0
## A 0 0 0
##
## $theta
## B C A
## B 0
## C 0 0
## A 0 0 0
##
## $psi
## B C A
## B 3
## C 0 4
## A 0 0 0
##
## $beta
## B C A
## B 0 0 1
## C 2 0 0
## A 0 0 0