1 Simulate data
2 Simple model
3 Under the hood: parameter table
4 Parameter estimates
5 Model-implied covariance
6 Sample covariance
7 Model modification
- 7.1 Equivalent models
- 7.2 Making the remaining covariance trivial
8 Ill conditioning
9 Indirect effects
- 9.1 parTable for indirect effects
- 9.2 Full details for the model with indirect effects

1 Simulate data

mu <- rep(0,3)
Sigma <- tribble(
  ~a, ~b, ~c,
  1, 0.2, 0.3,
  0.2, 1, 0.7,
  0.3, 0.7, 1
)

abc = data.frame(MASS::mvrnorm(1000, mu=mu, Sigma=Sigma, empirical=TRUE))
colnames(abc) = LETTERS[1:3]
psych::describe(abc)

	vars	n	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
A	1	1000	1	-0.022	-0.017	1.081	-2.82	3.52	6.34	0.190	-0.071	0.032
B	2	1000	1	0.029	0.010	1.056	-2.96	2.81	5.78	-0.071	-0.296	0.032
C	3	1000	1	0.023	0.003	0.969	-3.50	3.61	7.11	-0.002	0.035	0.032

cov(abc)

##     A   B   C
## A 1.0 0.2 0.3
## B 0.2 1.0 0.7
## C 0.3 0.7 1.0

2 Simple model

syntax <- '
B ~ A
C ~ B
'

m <- sem(syntax, data=abc, meanstructure=FALSE)
summary(m)

## lavaan 0.6-3 ended normally after 11 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          4
## 
##   Number of observations                          1000
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      53.704
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   B ~                                                 
##     A                 0.200    0.031    6.455    0.000
##   C ~                                                 
##     B                 0.700    0.023   30.997    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .B                 0.959    0.043   22.361    0.000
##    .C                 0.509    0.023   22.361    0.000

semPaths_default(m, what="std", edge.label.cex=1.5)

Note that the residual variance estimates are also estimates of \(1 - R^2\) because we have standardized variables.

3 Under the hood: parameter table

parTable(m)

id	lhs	op	rhs	user	block	group	free	ustart	exo	plabel	start	est	se
1	B	~	A	1	1	1	1	NA	0	.p1.	0.000	0.200	0.031
2	C	~	B	1	1	1	2	NA	0	.p2.	0.000	0.700	0.023
3	B	~~	B	0	1	1	3	NA	0	.p3.	0.499	0.959	0.043
4	C	~~	C	0	1	1	4	NA	0	.p4.	0.499	0.509	0.023
5	A	~~	A	0	1	1	0	NA	1	.p5.	0.999	0.999	0.000

4 Parameter estimates

parameterestimates(m)

lhs	op	rhs	est	se	z	pvalue	ci.lower	ci.upper
B	~	A	0.200	0.031	6.46	0	0.139	0.261
C	~	B	0.700	0.023	31.00	0	0.656	0.744
B	~~	B	0.959	0.043	22.36	0	0.875	1.043
C	~~	C	0.509	0.023	22.36	0	0.465	0.554
A	~~	A	0.999	0.000	NA	NA	0.999	0.999

Note that the variance of A is exogenous, but estimated. More specifically, it will be set at the sample estimate, meaning that the variance of A is not part of the likelihood function and does not contribute to any fit statistics.

5 Model-implied covariance

Use fitted() to obtain the model-implied covariance matrix, \(\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})\):

fitted(m)

## $cov
##   B     C     A    
## B 0.999            
## C 0.699 0.999      
## A 0.200 0.140 0.999

Notice how low the covariance is between A and C compared to our groundtruth value of 0.7. That is, the association between A and C is not fully explained by the indirect association of A with C via B. Also note that the matrix order is not necessarily the same as the sample covariance matrix. This is just a trivial rearrangement of the row/column order, not something meaningful.

6 Sample covariance

Use cov() to obtain the sample covariance matrix, \(\mathbf{S_{XX}}\):

cov(abc)

##     A   B   C
## A 1.0 0.2 0.3
## B 0.2 1.0 0.7
## C 0.3 0.7 1.0

7 Model modification

We know from our simulation that there is a relationship between A and C that is not entirely attributable to the indirect effect of A on B via B. Thus, we have an omitted path problem in which a key covariance term is not captured by our model.

We will cover modification indices in more detail soon, but for now, know that the modificationIndices function in lavaan provides estimates of how much the model fit (\(\chi^2\)) would fit if a specific parameter were added to the model.

modificationIndices(m)

	lhs	op	rhs	mi	epc	sepc.lv	sepc.all	sepc.nox
6	B	~~	C	52.3	-0.799	-0.799	-1.143	-1.143
7	B	~	C	52.3	-1.569	-1.569	-1.569	-1.569
8	C	~	A	52.3	0.167	0.167	0.167	0.167
10	A	~	C	52.3	0.327	0.327	0.327	0.327

7.1 Equivalent models

We will return to the issue of models that fit the data equally well, but it’s important to note that in this case, there is only one degree of freedom, and four equally good possibilities for capturing the relationship between A and C. This highlights the importance of having clear theoretical predictions about which variables are correlated versus which are likely to be (weak) causes or effects. As in regression, one can reverse predictor versus criterion with minimal change in syntax, but the choice to do so should be theoretically motivated. Likewise, in SEM, if one does now have clear predictions about the relationship between two variables, sometimes an undirected covariance between them is the appropriately weak parameterization.

syn2 <- '
B ~ A
C ~ B
A ~ C
'

m2 <- sem(syn2, abc, meanstructure=FALSE)
summary(m2)

## lavaan 0.6-3 ended normally after 16 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          6
## 
##   Number of observations                          1000
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                       0.000
##   Degrees of freedom                                 0
##   Minimum Function Value               0.0000000000000
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   B ~                                                 
##     A                -0.022    0.047   -0.460    0.646
##   C ~                                                 
##     B                 0.703    0.024   29.427    0.000
##   A ~                                                 
##     C                 0.314    0.043    7.357    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .B                 1.008    0.050   20.277    0.000
##    .C                 0.510    0.023   22.360    0.000
##    .A                 0.909    0.041   22.351    0.000

Note that we now have a just-identified model (0 degrees of freedom) because we have 3 variances and 3 covariances in \(\mathbf{S_{XX}}\) and we have 6 free parameters estimated here.

7.2 Making the remaining covariance trivial

We could also conceive of a situation in which the model that lacks an \(A-C\) relationship is nevertheless reasonably good.

mu <- rep(0,3)
Sigma <- tribble(
  ~a, ~b, ~c,
  1, 0.2, 0.16,
  0.2, 1, 0.7,
  0.16, 0.7, 1
)

abc = data.frame(MASS::mvrnorm(1000, mu=mu, Sigma=Sigma, empirical=TRUE))
colnames(abc) = LETTERS[1:3]
psych::describe(abc)

	vars	n	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
A	1	1000	1	0.020	0.010	1.026	-3.49	2.99	6.48	-0.110	0.037	0.032
B	2	1000	1	-0.019	0.000	0.969	-2.88	3.12	6.00	0.032	0.023	0.032
C	3	1000	1	-0.025	-0.007	0.987	-3.02	2.92	5.94	0.065	-0.127	0.032

cov(abc)

##      A   B    C
## A 1.00 0.2 0.16
## B 0.20 1.0 0.70
## C 0.16 0.7 1.00

syntax <- '
B ~ A
C ~ B
'

m <- sem(syntax, data=abc, meanstructure=FALSE)
summary(m)

## lavaan 0.6-3 ended normally after 12 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          4
## 
##   Number of observations                          1000
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                       0.817
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.366
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   B ~                                                 
##     A                 0.200    0.031    6.455    0.000
##   C ~                                                 
##     B                 0.700    0.023   30.997    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .B                 0.959    0.043   22.361    0.000
##    .C                 0.509    0.023   22.361    0.000

#semPaths(m)
fitted(m)

## $cov
##   B     C     A    
## B 0.999            
## C 0.699 0.999      
## A 0.200 0.140 0.999

We can see that the A-C relationship is 0.14 in the output and 0.16 in the groundtruth. Thus, we are capturing their association by modeling the indirect effect alone.

8 Ill conditioning

sds <- c(1, 10, 100)
abc_rescale <- cor2cov(cov(abc), sds)
print(abc_rescale)

##    A   B     C
## A  1   2    16
## B  2 100   700
## C 16 700 10000

9 Indirect effects

syntax <- '
B ~ b1*A
C ~ b2*B
med := b1*b2
'

m_ind <- sem(syntax, data=abc, meanstructure=FALSE)
summary(m_ind)

## lavaan 0.6-3 ended normally after 12 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          4
## 
##   Number of observations                          1000
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                       0.817
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.366
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   B ~                                                 
##     A         (b1)    0.200    0.031    6.455    0.000
##   C ~                                                 
##     B         (b2)    0.700    0.023   30.997    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .B                 0.959    0.043   22.361    0.000
##    .C                 0.509    0.023   22.361    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     med               0.140    0.022    6.319    0.000

#semPaths(m_ind)

9.1 parTable for indirect effects

parTable(m_ind)

id	lhs	op	rhs	user	block	group	free	ustart	exo	label	plabel	start	est	se
1	B	~	A	1	1	1	1	NA	0	b1	.p1.	0.000	0.200	0.031
2	C	~	B	1	1	1	2	NA	0	b2	.p2.	0.000	0.700	0.023
3	B	~~	B	0	1	1	3	NA	0		.p3.	0.499	0.959	0.043
4	C	~~	C	0	1	1	4	NA	0		.p4.	0.500	0.509	0.023
5	A	~~	A	0	1	1	0	NA	1		.p5.	0.999	0.999	0.000
6	med	:=	b1*b2	1	0	0	0	NA	0	med		0.000	0.140	0.022

Notice how there is an additional parameter estimated, but it is not free because it is a simple product of b1 and b2.

9.2 Full details for the model with indirect effects

Use inspect()

inspect(m_ind)

## $lambda
##   B C A
## B 0 0 0
## C 0 0 0
## A 0 0 0
## 
## $theta
##   B C A
## B 0    
## C 0 0  
## A 0 0 0
## 
## $psi
##   B C A
## B 3    
## C 0 4  
## A 0 0 0
## 
## $beta
##   B C A
## B 0 0 1
## C 2 0 0
## A 0 0 0

Path Analysis Demo

PSY 597 Week 3

Michael Hallquist

22 Jan 2019