The goal of this exercise is for you to become more familiar with fitting basic structural equation models using the lavaan R package. We will start with a very simple dataset consisting of SAT and ACT scores collected from 700 individuals. For additional details, see ?psych::sat.act.
Here is a quick description of the data:
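A call like the following (mirroring the one used below for the iqitems data, and assuming the psych and dplyr packages are loaded) produces this summary:
describe(sat.act) %>% select(-vars, -trimmed, -mad, -se, -range)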
## n mean sd median min max skew kurtosis
## sex* 700 1.65 0.48 2 1 2 -0.61 -1.62
## education 700 3.16 1.43 3 0 5 -0.68 -0.07
## age 700 25.59 9.50 22 13 65 1.64 2.42
## ACT 700 28.55 4.82 29 3 36 -0.66 0.53
## SATV 700 612.23 112.90 620 200 800 -0.64 0.33
## SATQ 687 610.22 115.64 620 200 800 -0.59 -0.02
As you can see, there are SAT verbal scores (SATV), quantitative scores (SATQ), and overall ACT scores for 700 people (a bit of missingness on SATQ).
What if we believe that scores on the three achievement tests reflect something like an overall measure of academic aptitude/achievement? How would you fit a one-factor CFA to these data?
#one-factor CFA here
mstring <- 'aptitude =~ ACT + SATV + SATQ'
mone <- sem(mstring, data=sat.act)
summary(mone)
## lavaan 0.6-4 ended normally after 35 iterations
##
## Optimization method NLMINB
## Number of free parameters 6
##
## Used Total
## Number of observations 687 700
##
## Estimator ML
## Model Fit Test Statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Information Expected
## Information saturated (h1) model Structured
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## aptitude =~
## ACT 1.000
## SATV 25.736 1.504 17.110 0.000
## SATQ 27.517 1.606 17.132 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .ACT 11.402 0.812 14.041 0.000
## .SATV 4933.587 440.574 11.198 0.000
## .SATQ 4340.880 464.670 9.342 0.000
## aptitude 11.902 1.218 9.772 0.000
#How can you print out the standardized factor loadings here?
standardizedsolution(mone)
## lhs op rhs est.std se z pvalue ci.lower ci.upper
## 1 aptitude =~ ACT 0.715 0.025 29.14 0 0.667 0.763
## 2 aptitude =~ SATV 0.784 0.023 34.37 0 0.740 0.829
## 3 aptitude =~ SATQ 0.822 0.022 37.17 0 0.778 0.865
## 4 ACT ~~ ACT 0.489 0.035 13.96 0 0.421 0.558
## 5 SATV ~~ SATV 0.385 0.036 10.76 0 0.315 0.455
## 6 SATQ ~~ SATQ 0.325 0.036 8.95 0 0.254 0.396
## 7 aptitude ~~ aptitude 1.000 0.000 NA NA 1.000 1.000
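An alternative is to request the standardized estimates alongside the raw ones in the summary output (they appear in the Std.lv and Std.all columns):
summary(mone, standardized=TRUE)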
How well does the model fit according to global indices including CFI, SRMR, RMSEA, and \(\chi^2\)? If you know SEM: is the model overidentified, underidentified, or just-identified? How does this qualify your interpretation of fit?
#Print out and evaluate global model fit
fitmeasures(mone)
## npar fmin chisq
## 6.00e+00 0.00e+00 0.00e+00
## df pvalue baseline.chisq
## 0.00e+00 NA 7.21e+02
## baseline.df baseline.pvalue cfi
## 3.00e+00 0.00e+00 1.00e+00
## tli nnfi rfi
## 1.00e+00 1.00e+00 1.00e+00
## nfi pnfi ifi
## 1.00e+00 0.00e+00 1.00e+00
## rni logl unrestricted.logl
## 1.00e+00 -1.02e+04 -1.02e+04
## aic bic ntotal
## 2.03e+04 2.04e+04 6.87e+02
## bic2 rmsea rmsea.ci.lower
## 2.03e+04 0.00e+00 0.00e+00
## rmsea.ci.upper rmsea.pvalue rmr
## 0.00e+00 NA 0.00e+00
## rmr_nomean srmr srmr_bentler
## 0.00e+00 0.00e+00 0.00e+00
## srmr_bentler_nomean crmr crmr_nomean
## 0.00e+00 0.00e+00 0.00e+00
## srmr_mplus srmr_mplus_nomean cn_05
## 0.00e+00 0.00e+00 NA
## cn_01 gfi agfi
## NA 1.00e+00 1.00e+00
## pgfi mfi ecvi
## 0.00e+00 1.00e+00 1.70e-02
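Note that with three indicators of a single factor, there are six unique variances and covariances among the observed variables and six free parameters, so the model has 0 degrees of freedom: it is just-identified, the \(\chi^2\) is exactly zero, and CFI = 1, RMSEA = 0, and SRMR = 0 by construction, so the global indices are uninformative here. If you want just a handful of indices rather than the full list, you can request them by name:
fitmeasures(mone, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr"))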
Assuming that SAT and ACT scores were recorded first (i.e., SAT and ACT came before post-secondary education), does higher academic achievement (our latent factor) predict higher ultimate educational level?
mstring <- 'aptitude =~ ACT + SATV + SATQ
education ~ aptitude'
mtwo <- sem(mstring, data=sat.act)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, :
## lavaan WARNING: some observed variances are (at least) a factor 1000 times
## larger than others; use varTable(fit) to investigate
summary(mtwo)
## lavaan 0.6-4 ended normally after 37 iterations
##
## Optimization method NLMINB
## Number of free parameters 8
##
## Used Total
## Number of observations 687 700
##
## Estimator ML
## Model Fit Test Statistic 17.876
## Degrees of freedom 2
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Information Expected
## Information saturated (h1) model Structured
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## aptitude =~
## ACT 1.000
## SATV 25.581 1.490 17.171 0.000
## SATQ 27.268 1.584 17.210 0.000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## education ~
## aptitude 0.037 0.017 2.126 0.033
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .ACT 11.269 0.809 13.926 0.000
## .SATV 4941.176 439.177 11.251 0.000
## .SATQ 4404.278 462.161 9.530 0.000
## .education 2.010 0.109 18.502 0.000
## aptitude 12.035 1.223 9.838 0.000
Do the lavaan model warnings worry you?
How strong is the association (think correlation or standardized regression coefficient)?
What about missing data? How is it being handled, and could you do better?
#modifications and details here
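One sketch of possible refinements (the rescaled variable names and the object name mtwo_fiml are illustrative, not part of the exercise): dividing the SAT scores by 100 puts their observed variances on a scale comparable to ACT, which addresses the lavaan warning about very different variances, and requesting full-information maximum likelihood (FIML) retains the 13 cases with missing SATQ rather than dropping them listwise.
#rescale SAT scores so the observed variances are comparable to ACT's
sat.act$SATV100 <- sat.act$SATV/100
sat.act$SATQ100 <- sat.act$SATQ/100
mstring <- 'aptitude =~ ACT + SATV100 + SATQ100
education ~ aptitude'
#missing="ml" requests FIML estimation so incomplete cases are retained
mtwo_fiml <- sem(mstring, data=sat.act, missing="ml")
#the standardized solution puts the aptitude -> education path on a correlation-like metric
standardizedsolution(mtwo_fiml)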
Are there significant mean differences in achievement scores between men and women? If so, interpret the direction of the effect.
#recode to make the direction and name clearer in the output
sat.act$female <- as.numeric(sat.act$sex=="female")
mstring <- 'aptitude =~ ACT + SATV + SATQ
aptitude ~ female'
msex <- sem(mstring, data=sat.act)
summary(msex)
## lavaan 0.6-4 ended normally after 40 iterations
##
## Optimization method NLMINB
## Number of free parameters 7
##
## Used Total
## Number of observations 687 700
##
## Estimator ML
## Model Fit Test Statistic 21.993
## Degrees of freedom 2
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Information Expected
## Information saturated (h1) model Structured
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## aptitude =~
## ACT 1.000
## SATV 25.611 1.501 17.061 0.000
## SATQ 27.956 1.635 17.097 0.000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## aptitude ~
## female -0.814 0.301 -2.705 0.007
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .ACT 11.510 0.814 14.145 0.000
## .SATV 5080.366 439.970 11.547 0.000
## .SATQ 4134.721 466.480 8.864 0.000
## .aptitude 11.643 1.199 9.714 0.000
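To express the sex difference in standard-deviation units of the latent factor, you can request a standardized solution; type="std.nox" standardizes the aptitude factor but leaves the binary female predictor on its 0/1 scale, so the coefficient can be read (roughly) as the female-male difference in latent-SD units. A minimal sketch using the model fitted just above (msex):
standardizedsolution(msex, type="std.nox")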
If you like, examine whether the association between achievement scores and ultimate educational attainment differs between men and women. Note that this is best done as a multiple-groups SEM using the group= argument in lavaan (specifically, when calling the sem function). Also note that you may wish to control which parameters are free to differ between the sexes and which must be held equal using the group.equal argument; see ?lavOptions for details. Also examine the modification indices to see whether model fit could be improved.
mstring <- 'aptitude =~ ACT + SATV + SATQ
education ~ c(mb, fb)*aptitude'
mgroups <- sem(mstring, data=sat.act, group="female", group.equal=c("loadings"))
## Warning in lav_data_full(data = data, group = group, cluster = cluster, :
## lavaan WARNING: some observed variances are (at least) a factor 1000 times
## larger than others; use varTable(fit) to investigate
summary(mgroups)
## lavaan 0.6-4 ended normally after 66 iterations
##
## Optimization method NLMINB
## Number of free parameters 24
## Number of equality constraints 2
## Row rank of the constraints matrix 2
##
## Used Total
## Number of observations per group
## 1 442 453
## 0 245 247
##
## Estimator ML
## Model Fit Test Statistic 20.794
## Degrees of freedom 6
## P-value (Chi-square) 0.002
##
## Chi-square for each group:
##
## 1 12.783
## 0 8.011
##
## Parameter Estimates:
##
## Information Expected
## Information saturated (h1) model Structured
## Standard Errors Standard
##
##
## Group 1 [1]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## aptitude =~
## ACT 1.000
## SATV (.p2.) 25.747 1.488 17.308 0.000
## SATQ (.p3.) 27.089 1.562 17.339 0.000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## education ~
## aptitude (mb) 0.040 0.021 1.898 0.058
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .ACT 28.400 0.225 126.269 0.000
## .SATV 610.658 5.379 113.534 0.000
## .SATQ 595.995 5.333 111.761 0.000
## .education 3.265 0.064 50.890 0.000
## aptitude 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .ACT 11.000 0.966 11.382 0.000
## .SATV 5256.358 534.625 9.832 0.000
## .SATQ 4233.311 523.009 8.094 0.000
## .education 1.801 0.121 14.832 0.000
## aptitude 11.360 1.282 8.859 0.000
##
##
## Group 2 [0]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## aptitude =~
## ACT 1.000
## SATV (.p2.) 25.747 1.488 17.308 0.000
## SATQ (.p3.) 27.089 1.562 17.339 0.000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## education ~
## aptitude (fb) 0.040 0.029 1.358 0.174
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .ACT 28.820 0.319 90.485 0.000
## .SATV 615.359 7.253 84.842 0.000
## .SATQ 635.873 7.488 84.923 0.000
## .education 3.004 0.098 30.640 0.000
## aptitude 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .ACT 11.788 1.324 8.903 0.000
## .SATV 4226.206 624.947 6.763 0.000
## .SATQ 4146.679 661.092 6.272 0.000
## .education 2.334 0.211 11.050 0.000
## aptitude 13.067 1.748 7.474 0.000
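Because the two group-specific regression coefficients were labeled mb and fb, a natural follow-up is a Wald test of their equality (a sketch using lavaan's lavTestWald). One caution: lavaan assigns the labels in c(mb, fb) in the order it encounters the groups, and the output above shows mb attached to the group coded 1 (women) and fb to the group coded 0 (men), so the mnemonic names may not mean what you intended; check the group headers before interpreting.
#Wald test of whether the aptitude -> education slope differs between the groups
lavTestWald(mgroups, constraints = "mb == fb")
An alternative is to fit a second model in which the regression carries the same label in both groups (an explicit equality constraint) and compare the two fits with anova().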
Next, let’s look at a more detailed dataset that allows for more complex modeling of latent structure. We’ll just be dealing with measurement models here – that is, variants of CFA. This is the iqitems
dataset from the psych
package, which has questions from different subscales of an IQ test.
describe(iqitems) %>% select(-vars, -trimmed, -mad, -se, -range)
## n mean sd median min max skew kurtosis
## reason.4 1523 3.39 1.24 4 0 6 -1.28 1.25
## reason.16 1524 3.39 1.14 4 0 6 -1.61 1.70
## reason.17 1523 3.80 1.34 4 0 6 -1.17 2.08
## reason.19 1523 4.79 1.84 6 0 6 -1.28 0.24
## letter.7 1524 4.95 1.70 6 0 6 -1.72 1.98
## letter.33 1523 2.79 1.25 3 0 6 -0.06 0.64
## letter.34 1523 3.38 1.30 4 0 6 -1.15 0.65
## letter.58 1525 3.26 1.52 4 0 6 -0.71 -0.61
## matrix.45 1523 4.14 1.36 5 0 6 -1.44 1.71
## matrix.46 1524 2.47 1.36 2 0 6 0.97 0.56
## matrix.47 1523 2.59 1.37 2 0 6 1.01 0.63
## matrix.55 1524 3.69 1.56 4 0 6 -0.31 -0.30
## rotate.3 1523 4.65 2.16 4 0 8 -0.04 -0.64
## rotate.4 1523 4.77 2.47 4 0 8 -0.18 -1.32
## rotate.6 1523 4.40 2.56 5 0 8 -0.26 -1.26
## rotate.8 1524 4.63 2.40 4 0 8 -0.14 -1.25
Note that the names of the items denote the respective IQ subscale from which they came. Thus, we are first interested in corroborating that there are four components of intelligence: mental rotation, matrix reasoning, letter sequences, and basic verbal reasoning. There are four items from each domain.
ggcorrplot(lavCor(iqitems))
First, fit the four-factor model based on the expected constructs, printing out the standardized loadings in the summary.
mstring <- '
reasoning =~ reason.4 + reason.16 + reason.17 + reason.19
letter =~ letter.7 + letter.33 + letter.34 + letter.58
matrix =~ matrix.45 + matrix.46 + matrix.47 + matrix.55
rotate =~ rotate.3 + rotate.4 + rotate.6 + rotate.8
'
mfour <- sem(mstring, data=iqitems, estimator="mlr")
## Warning in lav_object_post_check(object): lavaan WARNING: covariance matrix of latent variables
## is not positive definite;
## use lavInspect(fit, "cov.lv") to investigate.
summary(mfour, standardized=TRUE, fit.measures=TRUE)
## lavaan 0.6-4 ended normally after 53 iterations
##
## Optimization method NLMINB
## Number of free parameters 38
##
## Used Total
## Number of observations 1523 1525
##
## Estimator ML Robust
## Model Fit Test Statistic 675.711 557.729
## Degrees of freedom 98 98
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.212
## for the Yuan-Bentler correction (Mplus variant)
##
## Model test baseline model:
##
## Minimum Function Test Statistic 3958.154 3101.905
## Degrees of freedom 120 120
## P-value 0.000 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.849 0.846
## Tucker-Lewis Index (TLI) 0.816 0.811
##
## Robust Comparative Fit Index (CFI) 0.854
## Robust Tucker-Lewis Index (TLI) 0.821
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -44411.981 -44411.981
## Scaling correction factor 1.407
## for the MLR correction
## Loglikelihood unrestricted model (H1) -44074.126 -44074.126
## Scaling correction factor 1.266
## for the MLR correction
##
## Number of free parameters 38 38
## Akaike (AIC) 88899.963 88899.963
## Bayesian (BIC) 89102.443 89102.443
## Sample-size adjusted Bayesian (BIC) 88981.727 88981.727
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.062 0.055
## 90 Percent Confidence Interval 0.058 0.067 0.051 0.060
## P-value RMSEA <= 0.05 0.000 0.013
##
## Robust RMSEA 0.061
## 90 Percent Confidence Interval 0.056 0.066
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.055 0.055
##
## Parameter Estimates:
##
## Information Observed
## Observed information based on Hessian
## Standard Errors Robust.huber.white
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## reasoning =~
## reason.4 1.000 0.674 0.546
## reason.16 1.055 0.068 15.496 0.000 0.711 0.622
## reason.17 0.885 0.075 11.807 0.000 0.597 0.445
## reason.19 1.581 0.106 14.878 0.000 1.066 0.579
## letter =~
## letter.7 1.000 1.068 0.630
## letter.33 0.459 0.039 11.694 0.000 0.490 0.393
## letter.34 0.677 0.048 14.179 0.000 0.723 0.558
## letter.58 0.593 0.048 12.328 0.000 0.633 0.417
## matrix =~
## matrix.45 1.000 0.860 0.635
## matrix.46 0.226 0.073 3.108 0.002 0.194 0.143
## matrix.47 0.307 0.070 4.389 0.000 0.264 0.192
## matrix.55 0.718 0.077 9.275 0.000 0.617 0.397
## rotate =~
## rotate.3 1.000 0.978 0.452
## rotate.4 0.637 0.097 6.585 0.000 0.624 0.252
## rotate.6 1.400 0.253 5.528 0.000 1.370 0.535
## rotate.8 1.695 0.304 5.573 0.000 1.658 0.691
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## reasoning ~~
## letter 0.729 0.075 9.738 0.000 1.013 1.013
## matrix 0.524 0.059 8.871 0.000 0.904 0.904
## rotate 0.326 0.084 3.889 0.000 0.494 0.494
## letter ~~
## matrix 0.798 0.086 9.258 0.000 0.870 0.870
## rotate 0.485 0.121 4.012 0.000 0.465 0.465
## matrix ~~
## rotate 0.484 0.124 3.912 0.000 0.576 0.576
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .reason.4 1.072 0.062 17.429 0.000 1.072 0.702
## .reason.16 0.803 0.049 16.448 0.000 0.803 0.614
## .reason.17 1.439 0.078 18.433 0.000 1.439 0.802
## .reason.19 2.251 0.107 21.019 0.000 2.251 0.665
## .letter.7 1.735 0.123 14.121 0.000 1.735 0.603
## .letter.33 1.315 0.062 21.057 0.000 1.315 0.846
## .letter.34 1.154 0.064 18.007 0.000 1.154 0.689
## .letter.58 1.906 0.072 26.296 0.000 1.906 0.826
## .matrix.45 1.096 0.086 12.705 0.000 1.096 0.597
## .matrix.46 1.813 0.075 24.339 0.000 1.813 0.980
## .matrix.47 1.814 0.076 23.813 0.000 1.814 0.963
## .matrix.55 2.044 0.081 25.268 0.000 2.044 0.843
## .rotate.3 3.720 0.217 17.114 0.000 3.720 0.795
## .rotate.4 5.714 0.186 30.763 0.000 5.714 0.936
## .rotate.6 4.672 0.278 16.797 0.000 4.672 0.713
## .rotate.8 3.009 0.357 8.431 0.000 3.009 0.522
## reasoning 0.455 0.059 7.674 0.000 1.000 1.000
## letter 1.140 0.136 8.384 0.000 1.000 1.000
## matrix 0.739 0.104 7.128 0.000 1.000 1.000
## rotate 0.957 0.257 3.722 0.000 1.000 1.000
Does the warning provided by lavaan give you reason for concern? If so, what's the diagnosis? (Hint: use a variant of inspect, aka lavInspect.)
inspect(mfour, "cor.lv")
## resnng letter matrix rotate
## reasoning 1.000
## letter 1.013 1.000
## matrix 0.904 0.870 1.000
## rotate 0.494 0.465 0.576 1.000
How would you interpret the quality of global fit based on fit indices?
Based on the pattern of observed correlations and the results of the initial four-factor model (note in particular the estimated latent correlation greater than 1 between reasoning and letter), conceptualize and fit a simpler model with fewer factors.
mstring <- '
reasoning =~ reason.4 + reason.16 + reason.17 + reason.19 +
letter.7 + letter.33 + letter.34 + letter.58
matrix =~ matrix.45 + matrix.46 + matrix.47 + matrix.55
rotate =~ rotate.3 + rotate.4 + rotate.6 + rotate.8
'
mthree <- sem(mstring, data=iqitems, estimator="mlr")
summary(mthree, standardized=TRUE, fit.measures=TRUE)
## lavaan 0.6-4 ended normally after 58 iterations
##
## Optimization method NLMINB
## Number of free parameters 35
##
## Used Total
## Number of observations 1523 1525
##
## Estimator ML Robust
## Model Fit Test Statistic 677.456 556.552
## Degrees of freedom 101 101
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.217
## for the Yuan-Bentler correction (Mplus variant)
##
## Model test baseline model:
##
## Minimum Function Test Statistic 3958.154 3101.905
## Degrees of freedom 120 120
## P-value 0.000 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.850 0.847
## Tucker-Lewis Index (TLI) 0.822 0.818
##
## Robust Comparative Fit Index (CFI) 0.854
## Robust Tucker-Lewis Index (TLI) 0.827
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -44412.854 -44412.854
## Scaling correction factor 1.407
## for the MLR correction
## Loglikelihood unrestricted model (H1) -44074.126 -44074.126
## Scaling correction factor 1.266
## for the MLR correction
##
## Number of free parameters 35 35
## Akaike (AIC) 88895.707 88895.707
## Bayesian (BIC) 89082.202 89082.202
## Sample-size adjusted Bayesian (BIC) 88971.016 88971.016
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.061 0.054
## 90 Percent Confidence Interval 0.057 0.066 0.050 0.058
## P-value RMSEA <= 0.05 0.000 0.034
##
## Robust RMSEA 0.060
## 90 Percent Confidence Interval 0.055 0.065
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.055 0.055
##
## Parameter Estimates:
##
## Information Observed
## Observed information based on Hessian
## Standard Errors Robust.huber.white
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## reasoning =~
## reason.4 1.000 0.680 0.550
## reason.16 1.056 0.069 15.396 0.000 0.718 0.627
## reason.17 0.881 0.075 11.745 0.000 0.599 0.447
## reason.19 1.580 0.107 14.764 0.000 1.074 0.583
## letter.7 1.566 0.103 15.132 0.000 1.064 0.628
## letter.33 0.724 0.059 12.213 0.000 0.492 0.394
## letter.34 1.061 0.076 13.915 0.000 0.721 0.557
## letter.58 0.933 0.078 11.952 0.000 0.634 0.418
## matrix =~
## matrix.45 1.000 0.860 0.635
## matrix.46 0.225 0.073 3.085 0.002 0.193 0.142
## matrix.47 0.308 0.070 4.415 0.000 0.265 0.193
## matrix.55 0.717 0.077 9.265 0.000 0.617 0.396
## rotate =~
## rotate.3 1.000 0.976 0.451
## rotate.4 0.637 0.097 6.577 0.000 0.621 0.252
## rotate.6 1.403 0.252 5.557 0.000 1.369 0.535
## rotate.8 1.703 0.304 5.593 0.000 1.662 0.693
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## reasoning ~~
## matrix 0.517 0.058 8.957 0.000 0.885 0.885
## rotate 0.317 0.082 3.892 0.000 0.479 0.479
## matrix ~~
## rotate 0.483 0.123 3.911 0.000 0.575 0.575
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .reason.4 1.065 0.061 17.396 0.000 1.065 0.697
## .reason.16 0.794 0.048 16.649 0.000 0.794 0.606
## .reason.17 1.436 0.078 18.415 0.000 1.436 0.800
## .reason.19 2.234 0.106 21.090 0.000 2.234 0.660
## .letter.7 1.742 0.118 14.794 0.000 1.742 0.606
## .letter.33 1.313 0.062 21.134 0.000 1.313 0.844
## .letter.34 1.156 0.063 18.417 0.000 1.156 0.690
## .letter.58 1.904 0.072 26.545 0.000 1.904 0.826
## .matrix.45 1.096 0.086 12.687 0.000 1.096 0.597
## .matrix.46 1.814 0.074 24.361 0.000 1.814 0.980
## .matrix.47 1.813 0.076 23.801 0.000 1.813 0.963
## .matrix.55 2.044 0.081 25.274 0.000 2.044 0.843
## .rotate.3 3.724 0.216 17.257 0.000 3.724 0.796
## .rotate.4 5.717 0.184 31.008 0.000 5.717 0.937
## .rotate.6 4.674 0.276 16.950 0.000 4.674 0.714
## .rotate.8 2.995 0.356 8.415 0.000 2.995 0.520
## reasoning 0.462 0.060 7.748 0.000 1.000 1.000
## matrix 0.739 0.104 7.128 0.000 1.000 1.000
## rotate 0.953 0.256 3.727 0.000 1.000 1.000
Does this model yield a better fit? Hint: you can use the anova()
function to compare alternative factor models using a likelihood ratio test.
anova(mfour, mthree)
## Scaled Chi Square Difference Test (method = "satorra.bentler.2001")
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## mfour 98 88900 89102 676
## mthree 101 88896 89082 677 1.24 3 0.74
Look at the residual correlations of the model. Which correlations are poorly reproduced? Does this inform your assessment of how to improve the model?
resid(mthree, "cor")
## $type
## [1] "cor.bollen"
##
## $cov
## resn.4 rsn.16 rsn.17 rsn.19 lttr.7 ltt.33 ltt.34 ltt.58 mtr.45
## reason.4 0.000
## reason.16 0.017 0.000
## reason.17 -0.021 -0.016 0.000
## reason.19 -0.007 -0.009 -0.001 0.000
## letter.7 -0.006 0.025 -0.004 0.008 0.000
## letter.33 -0.017 -0.037 0.032 0.003 -0.026 0.000
## letter.34 0.006 0.014 -0.009 0.009 0.032 -0.049 0.000
## letter.58 -0.003 0.008 0.000 -0.032 -0.012 0.116 -0.038 0.000
## matrix.45 0.028 -0.005 0.024 0.034 0.012 -0.009 0.018 0.018 0.000
## matrix.46 -0.004 -0.056 0.011 -0.048 -0.096 0.058 -0.078 -0.007 -0.030
## matrix.47 -0.032 -0.033 -0.009 -0.040 -0.060 0.081 -0.013 0.003 -0.038
## matrix.55 -0.021 -0.018 0.025 -0.001 -0.021 0.007 -0.021 -0.041 -0.002
## rotate.3 -0.007 0.008 0.065 0.001 -0.036 0.070 -0.027 0.024 -0.021
## rotate.4 0.008 0.031 0.114 -0.012 0.030 0.052 -0.002 -0.006 0.015
## rotate.6 0.062 0.031 -0.023 0.057 -0.009 0.043 0.000 0.041 -0.033
## rotate.8 -0.021 -0.027 -0.016 -0.027 -0.061 0.057 -0.027 0.024 -0.063
## mtr.46 mtr.47 mtr.55 rott.3 rott.4 rott.6 rott.8
## reason.4
## reason.16
## reason.17
## reason.19
## letter.7
## letter.33
## letter.34
## letter.58
## matrix.45
## matrix.46 0.000
## matrix.47 0.195 0.000
## matrix.55 0.078 0.031 0.000
## rotate.3 0.149 0.107 0.046 0.000
## rotate.4 0.138 0.143 0.095 0.298 0.000
## rotate.6 0.028 0.046 -0.009 -0.082 -0.173 0.000
## rotate.8 0.092 0.068 0.032 -0.024 -0.059 0.063 0.000
ggcorrplot(resid(mthree, "cor")$cov, type="lower")
rotate.3 and rotate.4 have a very high residual correlation (0.298)!
Do the modification indices suggest any plausible parameters that are omitted from the model? Alternatively, do the modification indices suggest problems with particular items?
modificationindices(mthree, minimum.value=20)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 39 reasoning =~ matrix.45 70.0 4.369 2.969 2.192 2.192
## 40 reasoning =~ matrix.46 38.9 -1.577 -1.072 -0.788 -0.788
## 67 rotate =~ matrix.45 44.2 -0.635 -0.619 -0.457 -0.457
## 68 rotate =~ matrix.46 36.5 0.381 0.372 0.273 0.273
## 69 rotate =~ matrix.47 26.8 0.330 0.322 0.234 0.234
## 137 letter.33 ~~ letter.58 32.8 0.245 0.245 0.155 0.155
## 170 matrix.46 ~~ matrix.47 63.0 0.373 0.373 0.206 0.206
## 172 matrix.46 ~~ rotate.3 25.3 0.353 0.353 0.136 0.136
## 173 matrix.46 ~~ rotate.4 20.6 0.380 0.380 0.118 0.118
## 178 matrix.47 ~~ rotate.4 23.4 0.406 0.406 0.126 0.126
## 185 rotate.3 ~~ rotate.4 224.0 1.964 1.964 0.426 0.426
## 186 rotate.3 ~~ rotate.6 35.1 -0.888 -0.888 -0.213 -0.213
## 188 rotate.4 ~~ rotate.6 94.3 -1.515 -1.515 -0.293 -0.293
## 189 rotate.4 ~~ rotate.8 25.1 -0.801 -0.801 -0.194 -0.194
## 190 rotate.6 ~~ rotate.8 109.4 2.593 2.593 0.693 0.693
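By far the largest modification index (about 224) is the residual covariance between rotate.3 and rotate.4, consistent with the residual correlations above. If a residual correlation between these two items seems substantively defensible (e.g., shared item content beyond general mental rotation ability), one possible refinement is sketched below; the object names mstring_mod and mthree_mod are illustrative, and whether the modification is justified is a theoretical judgment, not something the indices alone can settle.
mstring_mod <- '
reasoning =~ reason.4 + reason.16 + reason.17 + reason.19 +
  letter.7 + letter.33 + letter.34 + letter.58
matrix =~ matrix.45 + matrix.46 + matrix.47 + matrix.55
rotate =~ rotate.3 + rotate.4 + rotate.6 + rotate.8
rotate.3 ~~ rotate.4   #residual covariance suggested by the largest MI
'
mthree_mod <- sem(mstring_mod, data=iqitems, estimator="mlr")
anova(mthree, mthree_mod)   #does freeing this parameter significantly improve fit?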
Can you improve the model further? Consider the following:
In principle, there may be a general factor of intelligence that explains the correlations among the subscales here. Fit a hierarchical factor model in which ‘g’ is a superordinate factor that explains the lower-order factors. Does this model fit significantly worse than a model in which all lower-order factors simply correlate, but there is no ‘g’?
mstring <- '
reasoning =~ reason.4 + reason.16 + reason.17 + reason.19
letter =~ letter.7 + letter.33 + letter.34 + letter.58
matrix =~ matrix.45 + matrix.46 + matrix.47 + matrix.55
rotate =~ rotate.3 + rotate.4 + rotate.6 + rotate.8
g =~ reasoning + letter + matrix + rotate
'
mhier <- sem(mstring, data=iqitems, estimator="mlr")
## Warning in lav_object_post_check(object): lavaan WARNING: some estimated lv
## variances are negative
summary(mhier, standardized=TRUE, fit.measures=TRUE)
## lavaan 0.6-4 ended normally after 50 iterations
##
## Optimization method NLMINB
## Number of free parameters 36
##
## Used Total
## Number of observations 1523 1525
##
## Estimator ML Robust
## Model Fit Test Statistic 690.268 567.665
## Degrees of freedom 100 100
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.216
## for the Yuan-Bentler correction (Mplus variant)
##
## Model test baseline model:
##
## Minimum Function Test Statistic 3958.154 3101.905
## Degrees of freedom 120 120
## P-value 0.000 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.846 0.843
## Tucker-Lewis Index (TLI) 0.815 0.812
##
## Robust Comparative Fit Index (CFI) 0.851
## Robust Tucker-Lewis Index (TLI) 0.821
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -44419.260 -44419.260
## Scaling correction factor 1.405
## for the MLR correction
## Loglikelihood unrestricted model (H1) -44074.126 -44074.126
## Scaling correction factor 1.266
## for the MLR correction
##
## Number of free parameters 36 36
## Akaike (AIC) 88910.520 88910.520
## Bayesian (BIC) 89102.343 89102.343
## Sample-size adjusted Bayesian (BIC) 88987.981 88987.981
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.062 0.055
## 90 Percent Confidence Interval 0.058 0.067 0.051 0.059
## P-value RMSEA <= 0.05 0.000 0.013
##
## Robust RMSEA 0.061
## 90 Percent Confidence Interval 0.056 0.066
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.057 0.057
##
## Parameter Estimates:
##
## Information Observed
## Observed information based on Hessian
## Standard Errors Robust.huber.white
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## reasoning =~
## reason.4 1.000 0.675 0.546
## reason.16 1.052 0.068 15.496 0.000 0.710 0.620
## reason.17 0.884 0.075 11.803 0.000 0.597 0.445
## reason.19 1.581 0.106 14.865 0.000 1.067 0.580
## letter =~
## letter.7 1.000 1.064 0.627
## letter.33 0.464 0.039 11.766 0.000 0.493 0.395
## letter.34 0.678 0.048 14.164 0.000 0.721 0.557
## letter.58 0.598 0.048 12.349 0.000 0.636 0.419
## matrix =~
## matrix.45 1.000 0.891 0.658
## matrix.46 0.186 0.064 2.905 0.004 0.166 0.122
## matrix.47 0.267 0.061 4.354 0.000 0.238 0.173
## matrix.55 0.680 0.072 9.404 0.000 0.606 0.389
## rotate =~
## rotate.3 1.000 0.954 0.441
## rotate.4 0.616 0.093 6.614 0.000 0.588 0.238
## rotate.6 1.466 0.239 6.146 0.000 1.398 0.546
## rotate.8 1.753 0.285 6.150 0.000 1.672 0.697
## g =~
## reasoning 1.000 1.019 1.019
## letter 1.523 0.108 14.087 0.000 0.985 0.985
## matrix 1.149 0.088 13.046 0.000 0.887 0.887
## rotate 0.693 0.127 5.455 0.000 0.500 0.500
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .reason.4 1.071 0.061 17.439 0.000 1.071 0.702
## .reason.16 0.805 0.049 16.456 0.000 0.805 0.615
## .reason.17 1.439 0.078 18.445 0.000 1.439 0.802
## .reason.19 2.248 0.107 20.993 0.000 2.248 0.664
## .letter.7 1.744 0.123 14.180 0.000 1.744 0.606
## .letter.33 1.312 0.062 21.043 0.000 1.312 0.844
## .letter.34 1.156 0.064 17.985 0.000 1.156 0.690
## .letter.58 1.902 0.073 26.228 0.000 1.902 0.825
## .matrix.45 1.041 0.084 12.354 0.000 1.041 0.567
## .matrix.46 1.824 0.075 24.443 0.000 1.824 0.985
## .matrix.47 1.827 0.076 24.020 0.000 1.827 0.970
## .matrix.55 2.058 0.080 25.752 0.000 2.058 0.849
## .rotate.3 3.768 0.194 19.404 0.000 3.768 0.805
## .rotate.4 5.758 0.163 35.406 0.000 5.758 0.943
## .rotate.6 4.593 0.255 18.039 0.000 4.593 0.701
## .rotate.8 2.964 0.322 9.194 0.000 2.964 0.515
## .reasoning -0.018 0.022 -0.801 0.423 -0.039 -0.039
## .letter 0.034 0.064 0.528 0.598 0.030 0.030
## .matrix 0.170 0.071 2.390 0.017 0.214 0.214
## .rotate 0.683 0.151 4.535 0.000 0.750 0.750
## g 0.473 0.062 7.642 0.000 1.000 1.000
anova(mfour, mhier)
## Scaled Chi Square Difference Test (method = "satorra.bentler.2001")
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## mfour 98 88900 89102 676
## mhier 100 88911 89102 690 10.2 2 0.0062 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Keep in mind that even the four-factor model fits poorly in absolute terms.) BIC is essentially indifferent between the two alternatives (about 89102 for both), the LRT says model fit is significantly worse for the hierarchical model (though with this much data we have a lot of power to reject the null), and AIC is about 11 points lower for the four-factor model. Altogether, this is equivocal support for the four-factor (correlated-factors) model. We could debate which one is 'better,' but there is not a clear winner.
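A final note on the lavaan warning for the hierarchical model: the residual variance of the reasoning factor is estimated as slightly negative (-0.018, not significantly different from zero), a Heywood case, consistent with reasoning being nearly indistinguishable from g in this solution. One common remedy, sketched below under the assumption that fixing the offending variance is theoretically acceptable (the object names are illustrative), is to constrain that residual variance to zero and refit:
mstring_hier0 <- '
reasoning =~ reason.4 + reason.16 + reason.17 + reason.19
letter =~ letter.7 + letter.33 + letter.34 + letter.58
matrix =~ matrix.45 + matrix.46 + matrix.47 + matrix.55
rotate =~ rotate.3 + rotate.4 + rotate.6 + rotate.8
g =~ reasoning + letter + matrix + rotate
reasoning ~~ 0*reasoning   #fix the slightly negative residual variance to zero
'
mhier0 <- sem(mstring_hier0, data=iqitems, estimator="mlr")
summary(mhier0, standardized=TRUE, fit.measures=TRUE)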