Overview

The goal of this exercise is for you to become more familiar with fitting basic structural equation models using the lavaan R package. We will start first with a very simple dataset consisting of SAT and ACT scores collected in 700 individuals. To learn additional details, see: ?psych::sat.act.

Here is a quick description of the data:

##             n   mean     sd median min max  skew kurtosis
## sex*      700   1.65   0.48      2   1   2 -0.61    -1.62
## education 700   3.16   1.43      3   0   5 -0.68    -0.07
## age       700  25.59   9.50     22  13  65  1.64     2.42
## ACT       700  28.55   4.82     29   3  36 -0.66     0.53
## SATV      700 612.23 112.90    620 200 800 -0.64     0.33
## SATQ      687 610.22 115.64    620 200 800 -0.59    -0.02

As you can see, there are SAT verbal scores (SATV), quantitative scores (SATQ), and overall ACT scores for 700 people (a bit of missingness on SATQ).

Test a single-factor of achievement

What if we believe that scores on the three achievement tests reflect something like an overall measure of academic aptitude/achievement? How would you fit a one-factor CFA to these data?

#one-factor CFA here
mstring <- 'aptitude =~ ACT + SATV + SATQ'
mone <- sem(mstring, data=sat.act)
summary(mone)
## lavaan 0.6-4 ended normally after 35 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          6
## 
##                                                   Used       Total
##   Number of observations                           687         700
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                       0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   aptitude =~                                         
##     ACT               1.000                           
##     SATV             25.736    1.504   17.110    0.000
##     SATQ             27.517    1.606   17.132    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .ACT              11.402    0.812   14.041    0.000
##    .SATV           4933.587  440.574   11.198    0.000
##    .SATQ           4340.880  464.670    9.342    0.000
##     aptitude         11.902    1.218    9.772    0.000

Standardized estimates

#How can you print out the standardized factor loadings here?
standardizedsolution(mone)
##        lhs op      rhs est.std    se     z pvalue ci.lower ci.upper
## 1 aptitude =~      ACT   0.715 0.025 29.14      0    0.667    0.763
## 2 aptitude =~     SATV   0.784 0.023 34.37      0    0.740    0.829
## 3 aptitude =~     SATQ   0.822 0.022 37.17      0    0.778    0.865
## 4      ACT ~~      ACT   0.489 0.035 13.96      0    0.421    0.558
## 5     SATV ~~     SATV   0.385 0.036 10.76      0    0.315    0.455
## 6     SATQ ~~     SATQ   0.325 0.036  8.95      0    0.254    0.396
## 7 aptitude ~~ aptitude   1.000 0.000    NA     NA    1.000    1.000

Fit diagnosis

How well does the model fit according to global indices including CFA, SRMR, RMSEA, and \(\chi^2\)? If you know SEM: is the model overidentified, underidentified, or just-identified? How does this qualify your interpretation of fit?

#Print out and evaluate global model fit
fitmeasures(mone)
##                npar                fmin               chisq 
##            6.00e+00            0.00e+00            0.00e+00 
##                  df              pvalue      baseline.chisq 
##            0.00e+00                  NA            7.21e+02 
##         baseline.df     baseline.pvalue                 cfi 
##            3.00e+00            0.00e+00            1.00e+00 
##                 tli                nnfi                 rfi 
##            1.00e+00            1.00e+00            1.00e+00 
##                 nfi                pnfi                 ifi 
##            1.00e+00            0.00e+00            1.00e+00 
##                 rni                logl   unrestricted.logl 
##            1.00e+00           -1.02e+04           -1.02e+04 
##                 aic                 bic              ntotal 
##            2.03e+04            2.04e+04            6.87e+02 
##                bic2               rmsea      rmsea.ci.lower 
##            2.03e+04            0.00e+00            0.00e+00 
##      rmsea.ci.upper        rmsea.pvalue                 rmr 
##            0.00e+00                  NA            0.00e+00 
##          rmr_nomean                srmr        srmr_bentler 
##            0.00e+00            0.00e+00            0.00e+00 
## srmr_bentler_nomean                crmr         crmr_nomean 
##            0.00e+00            0.00e+00            0.00e+00 
##          srmr_mplus   srmr_mplus_nomean               cn_05 
##            0.00e+00            0.00e+00                  NA 
##               cn_01                 gfi                agfi 
##                  NA            1.00e+00            1.00e+00 
##                pgfi                 mfi                ecvi 
##            0.00e+00            1.00e+00            1.70e-02

Predictors of achivement

Assuming that SAT and ACT scores were recorded first (i.e., SAT and ACT came before post-secondary education), does higher academic achievement (our latent factor) predict higher ultimate educational level?

mstring <- 'aptitude =~ ACT + SATV + SATQ
education ~ aptitude'
mtwo <- sem(mstring, data=sat.act)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, :
## lavaan WARNING: some observed variances are (at least) a factor 1000 times
## larger than others; use varTable(fit) to investigate
summary(mtwo)
## lavaan 0.6-4 ended normally after 37 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          8
## 
##                                                   Used       Total
##   Number of observations                           687         700
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      17.876
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   aptitude =~                                         
##     ACT               1.000                           
##     SATV             25.581    1.490   17.171    0.000
##     SATQ             27.268    1.584   17.210    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   education ~                                         
##     aptitude          0.037    0.017    2.126    0.033
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .ACT              11.269    0.809   13.926    0.000
##    .SATV           4941.176  439.177   11.251    0.000
##    .SATQ           4404.278  462.161    9.530    0.000
##    .education         2.010    0.109   18.502    0.000
##     aptitude         12.035    1.223    9.838    0.000

Additional questions

Do the lavaan model warnings worry you?

How strong is the association (think correlation or standardized regression coefficient)?

What about missing data? How is it being handled, and could you do better?

#modifications and details here

Sex differences in achievement

Are there significant mean differences in achievement scores between men and women? If so, interpret the direction of the effect.

#recode to make the direction and name clearer in the output
sat.act$female <- as.numeric(sat.act$sex=="female")
mstring <- 'aptitude =~ ACT + SATV + SATQ
aptitude ~ female'
mtwo <- sem(mstring, data=sat.act)
summary(mtwo)
## lavaan 0.6-4 ended normally after 40 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          7
## 
##                                                   Used       Total
##   Number of observations                           687         700
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      21.993
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   aptitude =~                                         
##     ACT               1.000                           
##     SATV             25.611    1.501   17.061    0.000
##     SATQ             27.956    1.635   17.097    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   aptitude ~                                          
##     female           -0.814    0.301   -2.705    0.007
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .ACT              11.510    0.814   14.145    0.000
##    .SATV           5080.366  439.970   11.547    0.000
##    .SATQ           4134.721  466.480    8.864    0.000
##    .aptitude         11.643    1.199    9.714    0.000

Bonus question: moderation by sex

If you like, examine whether the association between achievement scores and ultimate educational attainment differs between men and women. Note that this is best done as a multiple-groups SEM using the group= argument in lavaan (specifically, when calling the sem function). Also note that you may wish to control which parameters are free to differ between sexes and which must be equal using group.equal. See ?lavOptions for details. Also examine the modification indices to see if model fit could be improved.

mstring <- 'aptitude =~ ACT + SATV + SATQ
education ~ c(mb, fb)*aptitude'
mtwo <- sem(mstring, data=sat.act, group="female", group.equal=c("loadings"))
## Warning in lav_data_full(data = data, group = group, cluster = cluster, :
## lavaan WARNING: some observed variances are (at least) a factor 1000 times
## larger than others; use varTable(fit) to investigate
summary(mtwo)
## lavaan 0.6-4 ended normally after 66 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         24
##   Number of equality constraints                     2
##   Row rank of the constraints matrix                 2
## 
##                                                   Used       Total
##   Number of observations per group         
##   1                                                442         453
##   0                                                245         247
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      20.794
##   Degrees of freedom                                 6
##   P-value (Chi-square)                           0.002
## 
## Chi-square for each group:
## 
##   1                                             12.783
##   0                                              8.011
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## 
## Group 1 [1]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   aptitude =~                                         
##     ACT               1.000                           
##     SATV    (.p2.)   25.747    1.488   17.308    0.000
##     SATQ    (.p3.)   27.089    1.562   17.339    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   education ~                                         
##     aptitude  (mb)    0.040    0.021    1.898    0.058
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .ACT              28.400    0.225  126.269    0.000
##    .SATV            610.658    5.379  113.534    0.000
##    .SATQ            595.995    5.333  111.761    0.000
##    .education         3.265    0.064   50.890    0.000
##     aptitude          0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .ACT              11.000    0.966   11.382    0.000
##    .SATV           5256.358  534.625    9.832    0.000
##    .SATQ           4233.311  523.009    8.094    0.000
##    .education         1.801    0.121   14.832    0.000
##     aptitude         11.360    1.282    8.859    0.000
## 
## 
## Group 2 [0]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   aptitude =~                                         
##     ACT               1.000                           
##     SATV    (.p2.)   25.747    1.488   17.308    0.000
##     SATQ    (.p3.)   27.089    1.562   17.339    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   education ~                                         
##     aptitude  (fb)    0.040    0.029    1.358    0.174
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .ACT              28.820    0.319   90.485    0.000
##    .SATV            615.359    7.253   84.842    0.000
##    .SATQ            635.873    7.488   84.923    0.000
##    .education         3.004    0.098   30.640    0.000
##     aptitude          0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .ACT              11.788    1.324    8.903    0.000
##    .SATV           4226.206  624.947    6.763    0.000
##    .SATQ           4146.679  661.092    6.272    0.000
##    .education         2.334    0.211   11.050    0.000
##     aptitude         13.067    1.748    7.474    0.000

A more complex model

Next, let’s look at a more detailed dataset that allows for more complex modeling of latent structure. We’ll just be dealing with measurement models here – that is, variants of CFA. This is the iqitems dataset from the psych package, which has questions from different subscales of an IQ test.

describe(iqitems) %>% select(-vars, -trimmed, -mad, -se, -range)
##              n mean   sd median min max  skew kurtosis
## reason.4  1523 3.39 1.24      4   0   6 -1.28     1.25
## reason.16 1524 3.39 1.14      4   0   6 -1.61     1.70
## reason.17 1523 3.80 1.34      4   0   6 -1.17     2.08
## reason.19 1523 4.79 1.84      6   0   6 -1.28     0.24
## letter.7  1524 4.95 1.70      6   0   6 -1.72     1.98
## letter.33 1523 2.79 1.25      3   0   6 -0.06     0.64
## letter.34 1523 3.38 1.30      4   0   6 -1.15     0.65
## letter.58 1525 3.26 1.52      4   0   6 -0.71    -0.61
## matrix.45 1523 4.14 1.36      5   0   6 -1.44     1.71
## matrix.46 1524 2.47 1.36      2   0   6  0.97     0.56
## matrix.47 1523 2.59 1.37      2   0   6  1.01     0.63
## matrix.55 1524 3.69 1.56      4   0   6 -0.31    -0.30
## rotate.3  1523 4.65 2.16      4   0   8 -0.04    -0.64
## rotate.4  1523 4.77 2.47      4   0   8 -0.18    -1.32
## rotate.6  1523 4.40 2.56      5   0   8 -0.26    -1.26
## rotate.8  1524 4.63 2.40      4   0   8 -0.14    -1.25

Note that the names of the items denote the respective IQ subscale from which they came. Thus, we are interested in whether we can first corroborate that there are four components of intelligence: mental rotation, matrix reasoning, letter sequences, and basic verbal reasoning. There are four items from each domain.

ggcorrplot(lavCor(iqitems))

Fit the a priori four-factor CFA

First, fit the four-factor model based on the expected constructs, printing out the standardized loading in the summary.

mstring <- '
reasoning =~ reason.4 + reason.16 + reason.17 + reason.19
letter =~ letter.7 + letter.33 + letter.34 + letter.58
matrix =~ matrix.45 + matrix.46 + matrix.47 + matrix.55
rotate =~ rotate.3 + rotate.4 + rotate.6 + rotate.8
'
mfour <- sem(mstring, data=iqitems, estimator="mlr")
## Warning in lav_object_post_check(object): lavaan WARNING: covariance matrix of latent variables
##                 is not positive definite;
##                 use lavInspect(fit, "cov.lv") to investigate.
summary(mfour, standardized=TRUE, fit.measures=TRUE)
## lavaan 0.6-4 ended normally after 53 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         38
## 
##                                                   Used       Total
##   Number of observations                          1523        1525
## 
##   Estimator                                         ML      Robust
##   Model Fit Test Statistic                     675.711     557.729
##   Degrees of freedom                                98          98
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.212
##     for the Yuan-Bentler correction (Mplus variant)
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic             3958.154    3101.905
##   Degrees of freedom                               120         120
##   P-value                                        0.000       0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.849       0.846
##   Tucker-Lewis Index (TLI)                       0.816       0.811
## 
##   Robust Comparative Fit Index (CFI)                         0.854
##   Robust Tucker-Lewis Index (TLI)                            0.821
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -44411.981  -44411.981
##   Scaling correction factor                                  1.407
##     for the MLR correction
##   Loglikelihood unrestricted model (H1)     -44074.126  -44074.126
##   Scaling correction factor                                  1.266
##     for the MLR correction
## 
##   Number of free parameters                         38          38
##   Akaike (AIC)                               88899.963   88899.963
##   Bayesian (BIC)                             89102.443   89102.443
##   Sample-size adjusted Bayesian (BIC)        88981.727   88981.727
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.062       0.055
##   90 Percent Confidence Interval          0.058  0.067       0.051  0.060
##   P-value RMSEA <= 0.05                          0.000       0.013
## 
##   Robust RMSEA                                               0.061
##   90 Percent Confidence Interval                             0.056  0.066
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.055       0.055
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Observed information based on                Hessian
##   Standard Errors                   Robust.huber.white
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   reasoning =~                                                          
##     reason.4          1.000                               0.674    0.546
##     reason.16         1.055    0.068   15.496    0.000    0.711    0.622
##     reason.17         0.885    0.075   11.807    0.000    0.597    0.445
##     reason.19         1.581    0.106   14.878    0.000    1.066    0.579
##   letter =~                                                             
##     letter.7          1.000                               1.068    0.630
##     letter.33         0.459    0.039   11.694    0.000    0.490    0.393
##     letter.34         0.677    0.048   14.179    0.000    0.723    0.558
##     letter.58         0.593    0.048   12.328    0.000    0.633    0.417
##   matrix =~                                                             
##     matrix.45         1.000                               0.860    0.635
##     matrix.46         0.226    0.073    3.108    0.002    0.194    0.143
##     matrix.47         0.307    0.070    4.389    0.000    0.264    0.192
##     matrix.55         0.718    0.077    9.275    0.000    0.617    0.397
##   rotate =~                                                             
##     rotate.3          1.000                               0.978    0.452
##     rotate.4          0.637    0.097    6.585    0.000    0.624    0.252
##     rotate.6          1.400    0.253    5.528    0.000    1.370    0.535
##     rotate.8          1.695    0.304    5.573    0.000    1.658    0.691
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   reasoning ~~                                                          
##     letter            0.729    0.075    9.738    0.000    1.013    1.013
##     matrix            0.524    0.059    8.871    0.000    0.904    0.904
##     rotate            0.326    0.084    3.889    0.000    0.494    0.494
##   letter ~~                                                             
##     matrix            0.798    0.086    9.258    0.000    0.870    0.870
##     rotate            0.485    0.121    4.012    0.000    0.465    0.465
##   matrix ~~                                                             
##     rotate            0.484    0.124    3.912    0.000    0.576    0.576
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .reason.4          1.072    0.062   17.429    0.000    1.072    0.702
##    .reason.16         0.803    0.049   16.448    0.000    0.803    0.614
##    .reason.17         1.439    0.078   18.433    0.000    1.439    0.802
##    .reason.19         2.251    0.107   21.019    0.000    2.251    0.665
##    .letter.7          1.735    0.123   14.121    0.000    1.735    0.603
##    .letter.33         1.315    0.062   21.057    0.000    1.315    0.846
##    .letter.34         1.154    0.064   18.007    0.000    1.154    0.689
##    .letter.58         1.906    0.072   26.296    0.000    1.906    0.826
##    .matrix.45         1.096    0.086   12.705    0.000    1.096    0.597
##    .matrix.46         1.813    0.075   24.339    0.000    1.813    0.980
##    .matrix.47         1.814    0.076   23.813    0.000    1.814    0.963
##    .matrix.55         2.044    0.081   25.268    0.000    2.044    0.843
##    .rotate.3          3.720    0.217   17.114    0.000    3.720    0.795
##    .rotate.4          5.714    0.186   30.763    0.000    5.714    0.936
##    .rotate.6          4.672    0.278   16.797    0.000    4.672    0.713
##    .rotate.8          3.009    0.357    8.431    0.000    3.009    0.522
##     reasoning         0.455    0.059    7.674    0.000    1.000    1.000
##     letter            1.140    0.136    8.384    0.000    1.000    1.000
##     matrix            0.739    0.104    7.128    0.000    1.000    1.000
##     rotate            0.957    0.257    3.722    0.000    1.000    1.000

Diagnosing problems

Does the warning provided by lavaan give you reason for concern? If so, what’s the diagnosis? (Hint: use a variant of inspect aka lavInspect.)

inspect(mfour, "cor.lv") 
##           resnng letter matrix rotate
## reasoning 1.000                      
## letter    1.013  1.000               
## matrix    0.904  0.870  1.000        
## rotate    0.494  0.465  0.576  1.000

How well does the model fit?

How would you interpret the quality of global fit based on fit indices?

Fit a simpler model

Based on the pattern of observed correlations and the results of the initial four-factor model, conceptualize and fit a simpler model with fewer factors.

mstring <- '
reasoning =~ reason.4 + reason.16 + reason.17 + reason.19 + 
  letter.7 + letter.33 + letter.34 + letter.58
matrix =~ matrix.45 + matrix.46 + matrix.47 + matrix.55
rotate =~ rotate.3 + rotate.4 + rotate.6 + rotate.8
'
mthree <- sem(mstring, data=iqitems, estimator="mlr")
summary(mthree, standardized=TRUE, fit.measures=TRUE)
## lavaan 0.6-4 ended normally after 58 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         35
## 
##                                                   Used       Total
##   Number of observations                          1523        1525
## 
##   Estimator                                         ML      Robust
##   Model Fit Test Statistic                     677.456     556.552
##   Degrees of freedom                               101         101
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.217
##     for the Yuan-Bentler correction (Mplus variant)
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic             3958.154    3101.905
##   Degrees of freedom                               120         120
##   P-value                                        0.000       0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.850       0.847
##   Tucker-Lewis Index (TLI)                       0.822       0.818
## 
##   Robust Comparative Fit Index (CFI)                         0.854
##   Robust Tucker-Lewis Index (TLI)                            0.827
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -44412.854  -44412.854
##   Scaling correction factor                                  1.407
##     for the MLR correction
##   Loglikelihood unrestricted model (H1)     -44074.126  -44074.126
##   Scaling correction factor                                  1.266
##     for the MLR correction
## 
##   Number of free parameters                         35          35
##   Akaike (AIC)                               88895.707   88895.707
##   Bayesian (BIC)                             89082.202   89082.202
##   Sample-size adjusted Bayesian (BIC)        88971.016   88971.016
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.061       0.054
##   90 Percent Confidence Interval          0.057  0.066       0.050  0.058
##   P-value RMSEA <= 0.05                          0.000       0.034
## 
##   Robust RMSEA                                               0.060
##   90 Percent Confidence Interval                             0.055  0.065
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.055       0.055
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Observed information based on                Hessian
##   Standard Errors                   Robust.huber.white
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   reasoning =~                                                          
##     reason.4          1.000                               0.680    0.550
##     reason.16         1.056    0.069   15.396    0.000    0.718    0.627
##     reason.17         0.881    0.075   11.745    0.000    0.599    0.447
##     reason.19         1.580    0.107   14.764    0.000    1.074    0.583
##     letter.7          1.566    0.103   15.132    0.000    1.064    0.628
##     letter.33         0.724    0.059   12.213    0.000    0.492    0.394
##     letter.34         1.061    0.076   13.915    0.000    0.721    0.557
##     letter.58         0.933    0.078   11.952    0.000    0.634    0.418
##   matrix =~                                                             
##     matrix.45         1.000                               0.860    0.635
##     matrix.46         0.225    0.073    3.085    0.002    0.193    0.142
##     matrix.47         0.308    0.070    4.415    0.000    0.265    0.193
##     matrix.55         0.717    0.077    9.265    0.000    0.617    0.396
##   rotate =~                                                             
##     rotate.3          1.000                               0.976    0.451
##     rotate.4          0.637    0.097    6.577    0.000    0.621    0.252
##     rotate.6          1.403    0.252    5.557    0.000    1.369    0.535
##     rotate.8          1.703    0.304    5.593    0.000    1.662    0.693
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   reasoning ~~                                                          
##     matrix            0.517    0.058    8.957    0.000    0.885    0.885
##     rotate            0.317    0.082    3.892    0.000    0.479    0.479
##   matrix ~~                                                             
##     rotate            0.483    0.123    3.911    0.000    0.575    0.575
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .reason.4          1.065    0.061   17.396    0.000    1.065    0.697
##    .reason.16         0.794    0.048   16.649    0.000    0.794    0.606
##    .reason.17         1.436    0.078   18.415    0.000    1.436    0.800
##    .reason.19         2.234    0.106   21.090    0.000    2.234    0.660
##    .letter.7          1.742    0.118   14.794    0.000    1.742    0.606
##    .letter.33         1.313    0.062   21.134    0.000    1.313    0.844
##    .letter.34         1.156    0.063   18.417    0.000    1.156    0.690
##    .letter.58         1.904    0.072   26.545    0.000    1.904    0.826
##    .matrix.45         1.096    0.086   12.687    0.000    1.096    0.597
##    .matrix.46         1.814    0.074   24.361    0.000    1.814    0.980
##    .matrix.47         1.813    0.076   23.801    0.000    1.813    0.963
##    .matrix.55         2.044    0.081   25.274    0.000    2.044    0.843
##    .rotate.3          3.724    0.216   17.257    0.000    3.724    0.796
##    .rotate.4          5.717    0.184   31.008    0.000    5.717    0.937
##    .rotate.6          4.674    0.276   16.950    0.000    4.674    0.714
##    .rotate.8          2.995    0.356    8.415    0.000    2.995    0.520
##     reasoning         0.462    0.060    7.748    0.000    1.000    1.000
##     matrix            0.739    0.104    7.128    0.000    1.000    1.000
##     rotate            0.953    0.256    3.727    0.000    1.000    1.000

Does this model yield a better fit? Hint: you can use the anova() function to compare alternative factor models using a likelihood ratio test.

anova(mfour, mthree)
## Scaled Chi Square Difference Test (method = "satorra.bentler.2001")
## 
##         Df   AIC   BIC Chisq Chisq diff Df diff Pr(>Chisq)
## mfour   98 88900 89102   676                              
## mthree 101 88896 89082   677       1.24       3       0.74

Finding an even better model

Residuals

Look at the residual correlations of the model. What correlations are being poorly estimated? Does this inform your assessment of how to improve the model?

resid(mthree, "cor")
## $type
## [1] "cor.bollen"
## 
## $cov
##           resn.4 rsn.16 rsn.17 rsn.19 lttr.7 ltt.33 ltt.34 ltt.58 mtr.45
## reason.4   0.000                                                        
## reason.16  0.017  0.000                                                 
## reason.17 -0.021 -0.016  0.000                                          
## reason.19 -0.007 -0.009 -0.001  0.000                                   
## letter.7  -0.006  0.025 -0.004  0.008  0.000                            
## letter.33 -0.017 -0.037  0.032  0.003 -0.026  0.000                     
## letter.34  0.006  0.014 -0.009  0.009  0.032 -0.049  0.000              
## letter.58 -0.003  0.008  0.000 -0.032 -0.012  0.116 -0.038  0.000       
## matrix.45  0.028 -0.005  0.024  0.034  0.012 -0.009  0.018  0.018  0.000
## matrix.46 -0.004 -0.056  0.011 -0.048 -0.096  0.058 -0.078 -0.007 -0.030
## matrix.47 -0.032 -0.033 -0.009 -0.040 -0.060  0.081 -0.013  0.003 -0.038
## matrix.55 -0.021 -0.018  0.025 -0.001 -0.021  0.007 -0.021 -0.041 -0.002
## rotate.3  -0.007  0.008  0.065  0.001 -0.036  0.070 -0.027  0.024 -0.021
## rotate.4   0.008  0.031  0.114 -0.012  0.030  0.052 -0.002 -0.006  0.015
## rotate.6   0.062  0.031 -0.023  0.057 -0.009  0.043  0.000  0.041 -0.033
## rotate.8  -0.021 -0.027 -0.016 -0.027 -0.061  0.057 -0.027  0.024 -0.063
##           mtr.46 mtr.47 mtr.55 rott.3 rott.4 rott.6 rott.8
## reason.4                                                  
## reason.16                                                 
## reason.17                                                 
## reason.19                                                 
## letter.7                                                  
## letter.33                                                 
## letter.34                                                 
## letter.58                                                 
## matrix.45                                                 
## matrix.46  0.000                                          
## matrix.47  0.195  0.000                                   
## matrix.55  0.078  0.031  0.000                            
## rotate.3   0.149  0.107  0.046  0.000                     
## rotate.4   0.138  0.143  0.095  0.298  0.000              
## rotate.6   0.028  0.046 -0.009 -0.082 -0.173  0.000       
## rotate.8   0.092  0.068  0.032 -0.024 -0.059  0.063  0.000
ggcorrplot(resid(mthree, "cor")$cov, type="lower")

Rotate 3 and 4 have a very high residual correlation!

Modification indices

Do the modification indices suggest any plausible parameters that are omitted from the model? Alternatively, do the modification indices suggest problems with particular items?

modificationindices(mthree, minimum.value=20)
##           lhs op       rhs    mi    epc sepc.lv sepc.all sepc.nox
## 39  reasoning =~ matrix.45  70.0  4.369   2.969    2.192    2.192
## 40  reasoning =~ matrix.46  38.9 -1.577  -1.072   -0.788   -0.788
## 67     rotate =~ matrix.45  44.2 -0.635  -0.619   -0.457   -0.457
## 68     rotate =~ matrix.46  36.5  0.381   0.372    0.273    0.273
## 69     rotate =~ matrix.47  26.8  0.330   0.322    0.234    0.234
## 137 letter.33 ~~ letter.58  32.8  0.245   0.245    0.155    0.155
## 170 matrix.46 ~~ matrix.47  63.0  0.373   0.373    0.206    0.206
## 172 matrix.46 ~~  rotate.3  25.3  0.353   0.353    0.136    0.136
## 173 matrix.46 ~~  rotate.4  20.6  0.380   0.380    0.118    0.118
## 178 matrix.47 ~~  rotate.4  23.4  0.406   0.406    0.126    0.126
## 185  rotate.3 ~~  rotate.4 224.0  1.964   1.964    0.426    0.426
## 186  rotate.3 ~~  rotate.6  35.1 -0.888  -0.888   -0.213   -0.213
## 188  rotate.4 ~~  rotate.6  94.3 -1.515  -1.515   -0.293   -0.293
## 189  rotate.4 ~~  rotate.8  25.1 -0.801  -0.801   -0.194   -0.194
## 190  rotate.6 ~~  rotate.8 109.4  2.593   2.593    0.693    0.693

Can you improve the model further? Consider:

  • dropping items that aren’t loading well on any factor
  • shifting items to another factor if they don’t load where we expected
  • aggregating items that seem highly and uniquely overlapping

Bonus question: fit a hierarchical factor model

In principle, there may be a general factor of intelligence that explains the correlations among the subscales here. Fit a hierarchical factor model in which ‘g’ is a superordinate factor that explains the lower-order factors. Does this model fit significantly worse than a model in which all lower-order factors simply correlate, but there is no ‘g’?

mstring <- '
reasoning =~ reason.4 + reason.16 + reason.17 + reason.19
letter =~ letter.7 + letter.33 + letter.34 + letter.58
matrix =~ matrix.45 + matrix.46 + matrix.47 + matrix.55
rotate =~ rotate.3 + rotate.4 + rotate.6 + rotate.8
g =~ reasoning + letter + matrix + rotate
'
mhier <- sem(mstring, data=iqitems, estimator="mlr")
## Warning in lav_object_post_check(object): lavaan WARNING: some estimated lv
## variances are negative
summary(mhier, standardized=TRUE, fit.measures=TRUE)
## lavaan 0.6-4 ended normally after 50 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         36
## 
##                                                   Used       Total
##   Number of observations                          1523        1525
## 
##   Estimator                                         ML      Robust
##   Model Fit Test Statistic                     690.268     567.665
##   Degrees of freedom                               100         100
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.216
##     for the Yuan-Bentler correction (Mplus variant)
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic             3958.154    3101.905
##   Degrees of freedom                               120         120
##   P-value                                        0.000       0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.846       0.843
##   Tucker-Lewis Index (TLI)                       0.815       0.812
## 
##   Robust Comparative Fit Index (CFI)                         0.851
##   Robust Tucker-Lewis Index (TLI)                            0.821
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -44419.260  -44419.260
##   Scaling correction factor                                  1.405
##     for the MLR correction
##   Loglikelihood unrestricted model (H1)     -44074.126  -44074.126
##   Scaling correction factor                                  1.266
##     for the MLR correction
## 
##   Number of free parameters                         36          36
##   Akaike (AIC)                               88910.520   88910.520
##   Bayesian (BIC)                             89102.343   89102.343
##   Sample-size adjusted Bayesian (BIC)        88987.981   88987.981
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.062       0.055
##   90 Percent Confidence Interval          0.058  0.067       0.051  0.059
##   P-value RMSEA <= 0.05                          0.000       0.013
## 
##   Robust RMSEA                                               0.061
##   90 Percent Confidence Interval                             0.056  0.066
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.057       0.057
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Observed information based on                Hessian
##   Standard Errors                   Robust.huber.white
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   reasoning =~                                                          
##     reason.4          1.000                               0.675    0.546
##     reason.16         1.052    0.068   15.496    0.000    0.710    0.620
##     reason.17         0.884    0.075   11.803    0.000    0.597    0.445
##     reason.19         1.581    0.106   14.865    0.000    1.067    0.580
##   letter =~                                                             
##     letter.7          1.000                               1.064    0.627
##     letter.33         0.464    0.039   11.766    0.000    0.493    0.395
##     letter.34         0.678    0.048   14.164    0.000    0.721    0.557
##     letter.58         0.598    0.048   12.349    0.000    0.636    0.419
##   matrix =~                                                             
##     matrix.45         1.000                               0.891    0.658
##     matrix.46         0.186    0.064    2.905    0.004    0.166    0.122
##     matrix.47         0.267    0.061    4.354    0.000    0.238    0.173
##     matrix.55         0.680    0.072    9.404    0.000    0.606    0.389
##   rotate =~                                                             
##     rotate.3          1.000                               0.954    0.441
##     rotate.4          0.616    0.093    6.614    0.000    0.588    0.238
##     rotate.6          1.466    0.239    6.146    0.000    1.398    0.546
##     rotate.8          1.753    0.285    6.150    0.000    1.672    0.697
##   g =~                                                                  
##     reasoning         1.000                               1.019    1.019
##     letter            1.523    0.108   14.087    0.000    0.985    0.985
##     matrix            1.149    0.088   13.046    0.000    0.887    0.887
##     rotate            0.693    0.127    5.455    0.000    0.500    0.500
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .reason.4          1.071    0.061   17.439    0.000    1.071    0.702
##    .reason.16         0.805    0.049   16.456    0.000    0.805    0.615
##    .reason.17         1.439    0.078   18.445    0.000    1.439    0.802
##    .reason.19         2.248    0.107   20.993    0.000    2.248    0.664
##    .letter.7          1.744    0.123   14.180    0.000    1.744    0.606
##    .letter.33         1.312    0.062   21.043    0.000    1.312    0.844
##    .letter.34         1.156    0.064   17.985    0.000    1.156    0.690
##    .letter.58         1.902    0.073   26.228    0.000    1.902    0.825
##    .matrix.45         1.041    0.084   12.354    0.000    1.041    0.567
##    .matrix.46         1.824    0.075   24.443    0.000    1.824    0.985
##    .matrix.47         1.827    0.076   24.020    0.000    1.827    0.970
##    .matrix.55         2.058    0.080   25.752    0.000    2.058    0.849
##    .rotate.3          3.768    0.194   19.404    0.000    3.768    0.805
##    .rotate.4          5.758    0.163   35.406    0.000    5.758    0.943
##    .rotate.6          4.593    0.255   18.039    0.000    4.593    0.701
##    .rotate.8          2.964    0.322    9.194    0.000    2.964    0.515
##    .reasoning        -0.018    0.022   -0.801    0.423   -0.039   -0.039
##    .letter            0.034    0.064    0.528    0.598    0.030    0.030
##    .matrix            0.170    0.071    2.390    0.017    0.214    0.214
##    .rotate            0.683    0.151    4.535    0.000    0.750    0.750
##     g                 0.473    0.062    7.642    0.000    1.000    1.000
anova(mfour, mhier)
## Scaled Chi Square Difference Test (method = "satorra.bentler.2001")
## 
##        Df   AIC   BIC Chisq Chisq diff Df diff Pr(>Chisq)   
## mfour  98 88900 89102   676                                 
## mhier 100 88911 89102   690       10.2       2     0.0062 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Noting that the four-factor model is bad.) BIC is indifferent between the alternatives, the LRT says model fit is significantly worse for hierarchical (but we have a lot of data, so a lot of power to reject the null). AIC is 11 points lower for the four-factor model. Altogether, this is equivocal support for a four-factor model. We could debate which one is ‘better,’ but there is not a clear winner.