Tomarken and Waller (2003) note the following problems with models that appear to fit well according to global statistics:
They also offer the following advice for avoiding overreliance on global fit statistics:
Equivalent models have the same degrees of freedom but specify different relationships among variables. They fit the data equally well, yielding identical fit statistics and an identical sample likelihood. Equivalent models pose a problem in SEM when our core conclusions rest on the directionality of a relationship, because reversing cause and effect can produce a model with exactly the same fit.
In general, directed (causal) relations cannot be inferred from covariance data without strong assumptions about the processes that generated the data. For example, if one variable causes another, the cause must precede the effect in time. Likewise, causal relationships are more plausible if one variable is manipulated experimentally and its effects on dependent variables are measured. Thus, the problem of equivalent models largely points to a limitation of relying on observational techniques and correlational measures.
Within a block of variables at the beginning of a structural model that is just identified and with unidirectional relations to subsequent variables, direct effects, correlated disturbances, and equality-constrained reciprocal effects are interchangeable. This means we can replace \(Y1 \rightarrow Y2\) with \(Y1 \leftarrow Y2\), \(Y2 \leftrightharpoons Y1\), or \(D1 \leftrightarrow D2\).
At subsequent places in the model where two endogenous variables have the same causes and their relations are unidirectional, all of the following may be substituted for one another: \(Y1 \rightarrow Y2\), \(Y1 \leftarrow Y2\), \(Y2 \leftrightharpoons Y1\) (equality-constrained), or \(D1 \leftrightarrow D2\).
Limitations of replacing rules:
From Mulaik (2009)
Model A: \(Y1\) causes \(Y2\)
Model B: \(Y2\) causes \(Y1\)
Model C: no causal path between \(Y1\) and \(Y2\). Note that this seems like a conceptually distinct position from Models A and B.
From Hershberger (2006)
Use an index for judging model quality that is not based on global goodness-of-fit or likelihood. For example, the information complexity criterion (ICOMP) approach quantifies model misfit (residual covariances) relative to model complexity (correlations among parameter estimates, where greater correlation indicates greater complexity). See the semGOF package. Among equivalent models, the model with the lower ICOMP would be preferred.
Eliminate equivalent models that have problems during estimation, such as many iterations, negative variance estimates, or parameters that have to be constrained to avoid improper solutions. This is a heuristic approach.
Combine \(R^2\) measures for endogenous variables into a single measure (e.g., mean) and prefer the model with the highest value. This is also heuristic and relatively untested.
Examine extended individual case residuals (EICR). Although the residual covariance matrix is identical among equivalent models, the residuals in terms of predicted versus observed scores on manifest variables are not. Thus, according to Raykov and Penev (2001), a model that minimizes individual case residuals should be preferred over equivalents.
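Here is a rough sketch of the EICR idea (not the exact Raykov and Penev formula), assuming a hypothetical fitted lavaan object called fit in which every observed variable is an indicator of a latent factor and the model was estimated on raw data:
#rough per-case residual summary in the spirit of EICR (illustrative only)
yhat <- lavPredict(fit, type="ov")                #model-predicted values of the observed indicators
obs <- lavInspect(fit, "data")[, colnames(yhat)]  #raw data, columns aligned with the predictions
eicr <- rowSums((obs - yhat)^2)                   #crude per-case residual magnitude
mean(eicr)                                        #compare across equivalent models; smaller is preferred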
Returning to the replacing rules, consider this example:
m <- '
Happiness ~ 0.3*Money + 0.3*Love
Travel ~ 0.4*Happiness + 0.3*Money
'
d <- simulateData(m, sample.nobs=500)
#groundtruth
f1.syn <- '
Happiness ~ Money + Love
Travel ~ Happiness + Money
'
f1 <- sem(f1.syn, d)
semPaths_default(f1)
fitmeasures(f1)["chisq"]
## chisq
## 0.87
is not equivalent to this:
#not equivalent
f2.syn <- '
Happiness ~ Money + Love + Travel
Travel ~ Money
'
f2 <- sem(f2.syn, d)
semPaths_default(f2)
fitmeasures(f2)["chisq"]
## chisq
## 4.89
Why not? Love is not a cause of Travel in either model, so the causes of the endogenous variables are not the same (does not satisfy rule 2).
This, on the other hand, would be equivalent according to rule 1. The relations among exogenous variables are just identified and have unidirectional effects on other variables. Thus, we can shift the \(Money \leftrightarrow Love\) relationship from an unanalyzed association to a directed effect.
#equivalent
f3.syn <- '
Love ~ Money
Happiness ~ Money + Love
Travel ~ Happiness + Money
'
f3 <- sem(f3.syn, d, conditional.x=FALSE)
semPaths_default(f3)
fitmeasures(f3)["chisq"]
## chisq
## 0.87
#intended as equivalent, but note that the chi-square below differs from f1: reversing Money -> Happiness makes Money endogenous, which falls outside the replacing rules
f4.syn <- '
Happiness ~ Love
Travel ~ Happiness + Money
Money ~ Happiness
'
f4 <- sem(f4.syn, d, conditional.x=FALSE)
semPaths_default(f4)
fitmeasures(f4)["chisq"]
## chisq
## 3.53
The closer a model is to saturated, the larger the number of equivalent models. When possible, test a relatively sparse structural model: in saturated blocks, there is little information with which to adjudicate directionality. It can therefore be useful to include structural variables that are thought to be unrelated to the others.
Models with many paths suffer from many equivalents. Can you test your key hypotheses in models with fewer variables?
Consider including ancillary variables that are not related to the original set in the model. In particular, consider adding antecedents (predictors) of variables in a source block such that they do not have any relationship to the focal (destination) block.
Bentler and Satorra (2010) defined a useful nested model test:
Conduct typical SEM on Model \(M1\). Save the \(df_{M1}\) and model-implied covariance \(\hat{\boldsymbol{\Sigma}}(\Theta)\).
Conduct a SEM on an alternative model \(M2\) using \(\hat{\boldsymbol{\Sigma}}(\Theta)\) as the input data (rather than the raw data). Save the \(\chi^2_{M2}\) and \(df_{M2}\).
Compute \(d = df_{M1} - df_{M2}\) and set a small criterion, \(\epsilon\), on the \(\chi^2\) for declaring equivalence (e.g., .001).
The primary idea is that when there are more degrees of freedom in \(M1\) than in \(M2\) and \(M2\) can perfectly reproduce the model-implied covariance matrix of \(M1\) (i.e., \(\chi^2_{M2} < \epsilon\)), then \(M1\) is nested in \(M2\). If the degrees of freedom are equal and the \(\chi^2\) is still essentially zero, the models are equivalent.
“All models are wrong but some are useful” - George Box
The goal of most statistical modeling is to approximate the key factors that generated our empirical data. The data generation processes are usually complex and largely unknown, so our models will almost always be deficient in many ways. Nevertheless, we may wish to know how close one model is to the unknown truth relative to a comparator model. This is the substance of model selection theory – i.e., how one selects among models based on relative evidence.
Kullback-Leibler (K-L) information quantifies the information lost when a simpler model, g, is used to approximate the true model, f:
\[ I(f,g) = \int{f(x) \textrm{log} \left( \frac{f(x)}{g(x|\theta)} \right) dx} \] This can also be thought of as the distance between our model, g, and the unknown population model, f.
In the simplest case, we can think of trying to identify a probability distribution (e.g., Gaussian) that best describes a given empirical sample. Recall from the maximum likelihood lecture our attempt to fit Gaussian-distributed data using a Gamma distribution. The sample likelihood under the Gamma, even at its optimized (fitted) parameters, can never be as high as that under a Gaussian fit.
ML Gamma fit
X <- rnorm(5000, mean=50, sd=sqrt(50)) #simulate data from the 'true' Gaussian model
gfit <- suppressWarnings(fitdistr(X, "gamma")) #fitdistr() is from the MASS package
gfit$loglik
## [1] -17011
datafromgam <- rgamma(n=5000, shape=gfit$estimate["shape"], rate=gfit$estimate["rate"])
ML Gaussian fit
gfit <- suppressWarnings(fitdistr(X, "normal"))
gfit$loglik
## [1] -16973
datafromgauss <- rnorm(n=5000, mean=gfit$estimate["mean"], sd=gfit$estimate["sd"])
df <- data.frame(model=rep(c("groundtruth", "gamma", "gauss"), each=5000),
v=c(X, datafromgam, datafromgauss))
ggplot(df, aes(x=v, color=model)) + geom_density()
The sample LL is an approximation of a K-L distance between a model and ‘truth’, but read on to AIC below…
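To make that link concrete, here is a minimal Monte Carlo sketch of \(I(f,g)\) for the example above. In this simulation we know the truth, so we can evaluate it directly (this assumes X is still in the workspace; the Gamma is refit here because gfit was overwritten by the Gaussian fit):
#Monte Carlo approximation of the K-L integral: mean log density ratio of the true
#Normal(50, sqrt(50)) over the fitted Gamma, evaluated at draws from the truth
gamfit <- suppressWarnings(fitdistr(X, "gamma"))
kl_hat <- mean(dnorm(X, mean=50, sd=sqrt(50), log=TRUE) -
               dgamma(X, shape=gamfit$estimate["shape"], rate=gamfit$estimate["rate"], log=TRUE))
kl_hat #small but positive: information is lost by approximating the Normal with a Gamma
#note: this equals (LL under the truth - LL under the Gamma)/n, which is why the sample LL
#tracks relative K-L distance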
In general, we do not know the true model, which means we cannot compute the exact K-L distance. Fortunately, however, we are often interested in relative distance – that is, how much better is model A than model B? In this circumstance, the K-L distance derivation includes two terms: one is based on the expectation under the ‘true’ model (can’t compute that!), the other is based on the relative distance between the model and the truth.
Since we can’t compute the first term, it means we do not have a measure of absolute fit using likelihood-based and K-L methods. But, because we can compute the relative distance, we can also compute the relative distance between two or more models. As a result, we can compare relative evidence across a set of candidate models. All may be bad, or all may be good, by some absolute criterion, but we can ascertain which is relatively best. Global fit criteria in SEM often give some information about the absolute fit (esp. residual-based methods). Likelihood methods (K-L distance) are more about relative evidence.
K-L distances are on a true ratio scale (i.e., a meaningful zero point and equal interval spacing). As a result, differences in distances can be compared, and a zero K-L difference implies identical relative fit. Moreover, estimated K-L discrepancies, and by analogy AIC differences among models, tend to become larger as the sample size increases. This does not mean that the scaling or interpretation changes! Rather, it underscores that discriminating among models becomes easier in large samples.
The derivation of the AIC was a major breakthrough in information theory and statistics because it provides a statistical link between K-L distance (a more theoretical notion) and maximum likelihood-based methods.
The AIC provides an approximation of the relative distance of the fitted model and the unknown (true) mechanisms that generated the data. Importantly, it is a large sample criterion (based on asymptotic properties, so 1000s, ideally) and is also not especially meaningful if the models in question are simply inappropriate or poor (e.g., fitting a binary outcome with a continuous model).
\[ AIC = -2 \textrm{log} (\mathcal{L}(\hat{\theta}|y)) + 2k \] where \(k\) is the number of free parameters, and \(\mathcal{L}(\hat{\theta}|y)\) is the sample likelihood.
Note that by multiplying the LL by -2, the criterion is now scaled such that lower values are better. By contrast, for LL, higher values indicate better fit (greater likelihood of the data given the model).
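As a quick check on the formula, here is a minimal sketch using the Gaussian fitdistr object from the example above (MASS supplies a logLik method for fitdistr objects, so AIC() should agree):
#AIC by hand for the Gaussian fit (gfit) from the fitdistr example above
k <- length(gfit$estimate)   #two free parameters: mean and sd
-2*gfit$loglik + 2*k         #AIC from the formula
AIC(gfit)                    #should match, via the logLik method from MASS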
Because AIC is based on a large sample derivation (i.e., its statistical properties are ideal as the sample size goes to infinity), it may select overly complex models in smaller samples (think hundreds, not thousands). This led to the derivation of a variant of AIC, usually denoted \(AIC_C\) (Sugiura, 1978), that includes an additional penalty for sample size:
\[ AIC_C = -2 \textrm{log} (\mathcal{L}(\hat{\theta}|y)) + 2k + \frac{2k(k+1)}{n - k -1} \] When your cases:parameters ratio is below 40 (as it usually is!), Burnham and Anderson recommend \(AIC_C\) over AIC. Note, however, that this correction was derived for univariate (single-DV) regression, not SEM. There is no exact SEM equivalent (that I know of), but there is also no specific reason to think the correction above is not approximately correct.
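If you nonetheless want to apply this correction to a lavaan model, here is an illustrative sketch (aicc_lavaan is a hypothetical helper, not a lavaan function); semTools::moreFitIndices(), shown near the end of this section, reports a similar quantity as aic.smallN:
#small-sample corrected AIC for a lavaan fit, applying the univariate-regression
#correction to SEM as discussed above (an assumption, not an established result)
aicc_lavaan <- function(fit) {
  fm <- fitmeasures(fit, c("aic", "npar", "ntotal"))
  unname(fm["aic"] + (2*fm["npar"]*(fm["npar"] + 1))/(fm["ntotal"] - fm["npar"] - 1))
}
aicc_lavaan(f1) #e.g., for the path model fit earlier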
One of the most important considerations when comparing non-nested models is that the set of endogenous variables must be the same and on the same scale! This is often misunderstood in applied papers and can lead to bizarre situations where AICs differ by thousands of points. That is not impossible, but it more likely reflects differences in the likelihood function that render the models non-comparable using LL-based methods (including AIC/BIC).
Recall that the sample likelihood is computed on the endogenous variables. Thus, if the endogenous set changes (e.g., by changing the direction of an arrow), LL-based methods cannot necessarily be compared. In lavaan, this relates to the conditional.x option (see ?lavOptions). If it is FALSE, then model-implied statistics, including the LL, reflect both exogenous and endogenous variables. This is likely what you'd want when testing alternative models with path direction changes that would make an endogenous variable exogenous, or vice versa.
If you're not sure, check the unrestricted (saturated) LL for the two candidate models. If they are not identical, you cannot use AIC/BIC to compare them. You might also check your setting of conditional.x in lavaan. Fortunately, lavaan will usually provide a warning: “WARNING: some models are based on a different set of observed variables.” If you see that, watch out! Your models are almost certainly not comparable, either in terms of the LRT (nested) or AIC/BIC (non-nested) comparisons, since both are based on the sample likelihood.
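For example, a quick check using two of the path models fit earlier:
#comparability check: the unrestricted (saturated) log-likelihoods should be identical
#if two models are fit to the same set of variables under the same likelihood
fitmeasures(f1)["unrestricted.logl"]
fitmeasures(f4)["unrestricted.logl"] #if these differ, AIC/BIC comparisons are not meaningful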
See this paper from Ed Merkle: https://arxiv.org/pdf/1402.6720.pdf, and the corresponding nonnest2 package in R.
Briefly, the Vuong test is often a useful way to compare evidence among models, including non-nested models. The package plays well with lavaan.
(Some quotes from Hallquist & Pilkonis, 2010)
AIC is an asymptotically efficient criterion for model selection, which means that it tends to select the model that minimizes prediction error as sample size increases. More generally, AIC is an information criterion primarily concerned with minimizing the relative Kullback-Leibler (K-L) distance between a statistical model and the unknown mechanisms giving rise to the observed data (Claeskens & Hjort, 2008).
An underlying assumption of AIC is that the “truth” of the data is very complex (i.e., many causal factors of varying strengths) and that no one “true model” exists (Burnham & Anderson, 2002). Thus, AIC is useful for selecting the best approximate model using a finite number of parameters when “truth” is countably infinite.
By contrast, the BIC originates from the Bayesian tradition in statistics and is concerned with the statistical property of consistency, which refers to the one “true model” being selected with increasing probability as sample size increases (Yang, 2005). Underlying the property of consistency are several assumptions worth examining: (1) a single “true” model exists that best represents the mechanisms that gave rise to the observed data, (2) one of the models in the set of models tested is the “true” model, and (3) the focus of model selection is to identify the “true” model rather than to provide a close approximation to the truth. Given these assumptions, Burnham and Anderson (2002) have argued against the use of consistent model selection criteria in the biological, medical, and social sciences because such data are highly complex, there is a low likelihood of including the “true model” in any set of models considered, and it is unlikely that a single “true model” exists.
Summarizing extensive simulation work, Burnham and Anderson (2002) offer useful rules-of-thumb for inferring the degree of certainty afforded to a particular model relative to the best fitting model in the set:
Moreover, AIC values for a set of models can be subtracted from the model with the lowest AIC value and these difference scores can be transformed into a set of Akaike weights that represent the evidential weight of a particular model relative to all other models in the set (Burnham & Anderson, 2002). The important general point is that conclusions drawn from latent variable models would benefit from being contextualized in terms of model selection uncertainty and the degree of support for one model over alternatives.
For details on Akaike weights or evidence ratios, I would recommend checking out the AICcmodavg package. See demo below.
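For intuition, Akaike weights can also be computed by hand from a set of AIC values. A minimal sketch using the two CFA models fit in the demo below:
#Akaike weights: delta-AIC relative to the best model, then normalized evidence weights
aics <- c(l1fac=AIC(l1fac_m), hcfa=AIC(hcfa_m)) #these models are fit in the demo below
delta <- aics - min(aics)
exp(-delta/2)/sum(exp(-delta/2)) #weights sum to 1; larger = more relative support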
From Brown Ch. 8, Fig 8.1. This is a measure of coping styles administered to 275 college students. There are four first-order latent variables, which reflect different ways of coping with stress: Problem Solving, Cognitive Restructuring, Express Emotions, and Social Support. The authors also anticipated that there are two higher-order factors that account for correlations among the first-order factors: Problem-Focused Coping and Emotion-Focused Coping.
rawdat <- '
1.00
0.78 1.00
0.80 0.77 1.00
0.56 0.51 0.48 1.00
0.52 0.51 0.46 0.78 1.00
0.59 0.51 0.51 0.80 0.79 1.00
0.16 0.15 0.17 0.14 0.18 0.16 1.00
0.19 0.13 0.18 0.14 0.16 0.16 0.81 1.00
0.12 0.17 0.17 0.17 0.20 0.16 0.75 0.80 1.00
0.16 0.13 0.17 0.15 0.16 0.18 0.56 0.52 0.50 1.00
0.16 0.14 0.18 0.15 0.16 0.18 0.51 0.58 0.51 0.81 1.00
0.16 0.15 0.14 0.16 0.16 0.14 0.52 0.57 0.52 0.80 0.79 1.00
'
#standard deviations of variables
sds <- as.numeric(strsplit('1.40 2.10 1.30 2.30 2.40 1.90 2.00 2.90 2.30 3.10 2.20 1.20', '\\s+', perl=TRUE)[[1]])
cormat <- getCov(rawdat)
#this is the covariance matrix to be analyzed
covmat <- lavaan::cor2cov(cormat, sds)
rownames(covmat) <- colnames(covmat) <- c("P1", "P2", "P3", "C1", "C2", "C3", "E1", "E2", "E3", "S1", "S2", "S3")
#to get Vuong test to work as expected, need raw data, not just a covmat
df <- mvrnorm(n=275, mu=rep(0, 12), Sigma=covmat, empirical = TRUE)
bmodel <- '
ProbSol =~ P1 + P2 + P3
CogRest =~ C1 + C2 + C3
ExpEmo =~ E1 + E2 + E3
SocSupp =~ S1 + S2 + S3
#explicitly name factor correlations
ProbSol ~~ CogRest
ProbSol ~~ ExpEmo
ProbSol ~~ SocSupp
CogRest ~~ ExpEmo
CogRest ~~ SocSupp
ExpEmo ~~ SocSupp
'
#l1fac_m <- sem(bmodel, sample.cov=covmat, sample.nobs = 275)
l1fac_m <- sem(bmodel, df)
summary(l1fac_m, fit.measures=TRUE, standardized=TRUE)
## lavaan (0.5-23.1097) converged normally after 65 iterations
##
## Number of observations 275
##
## Estimator ML
## Minimum Function Test Statistic 82.970
## Degrees of freedom 48
## P-value (Chi-square) 0.001
##
## Model test baseline model:
##
## Minimum Function Test Statistic 2761.514
## Degrees of freedom 66
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.987
## Tucker-Lewis Index (TLI) 0.982
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -5642.801
## Loglikelihood unrestricted model (H1) -5601.316
##
## Number of free parameters 30
## Akaike (AIC) 11345.602
## Bayesian (BIC) 11454.106
## Sample-size adjusted Bayesian (BIC) 11358.981
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.051
## 90 Percent Confidence Interval 0.032 0.070
## P-value RMSEA <= 0.05 0.426
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.017
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## ProbSol =~
## P1 1.000 1.273 0.911
## P2 1.424 0.071 20.173 0.000 1.812 0.864
## P3 0.896 0.043 20.778 0.000 1.140 0.879
## CogRest =~
## C1 1.000 2.038 0.888
## C2 1.027 0.051 19.964 0.000 2.094 0.874
## C3 0.842 0.040 21.117 0.000 1.715 0.904
## ExpEmo =~
## E1 1.000 1.743 0.873
## E2 1.542 0.072 21.395 0.000 2.688 0.929
## E3 1.133 0.059 19.043 0.000 1.974 0.860
## SocSupp =~
## S1 1.000 2.787 0.901
## S2 0.706 0.033 21.679 0.000 1.969 0.896
## S3 0.381 0.018 21.230 0.000 1.062 0.886
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## ProbSol ~~
## CogRest 1.718 0.209 8.217 0.000 0.662 0.662
## ExpEmo 0.453 0.149 3.042 0.002 0.204 0.204
## SocSupp 0.695 0.237 2.928 0.003 0.196 0.196
## CogRest ~~
## ExpEmo 0.712 0.238 2.991 0.003 0.201 0.201
## SocSupp 1.146 0.381 3.011 0.003 0.202 0.202
## ExpEmo ~~
## SocSupp 3.255 0.396 8.218 0.000 0.670 0.670
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .P1 0.333 0.051 6.526 0.000 0.333 0.170
## .P2 1.111 0.130 8.560 0.000 1.111 0.253
## .P3 0.384 0.048 8.043 0.000 0.384 0.228
## .C1 1.118 0.143 7.812 0.000 1.118 0.212
## .C2 1.356 0.162 8.351 0.000 1.356 0.236
## .C3 0.654 0.093 7.010 0.000 0.654 0.182
## .E1 0.946 0.112 8.425 0.000 0.946 0.237
## .E2 1.155 0.205 5.638 0.000 1.155 0.138
## .E3 1.372 0.155 8.862 0.000 1.372 0.260
## .S1 1.805 0.242 7.448 0.000 1.805 0.189
## .S2 0.947 0.124 7.655 0.000 0.947 0.196
## .S3 0.308 0.038 8.103 0.000 0.308 0.215
## ProbSol 1.620 0.169 9.558 0.000 1.000 1.000
## CogRest 4.152 0.452 9.186 0.000 1.000 1.000
## ExpEmo 3.039 0.339 8.958 0.000 1.000 1.000
## SocSupp 7.770 0.823 9.435 0.000 1.000 1.000
semPaths_default(l1fac_m, sizeMan=7)
What about a model with a higher-order factor structure: Problem-Focused versus Emotion-Focused?
hcfa <- '
ProbSol =~ P1 + P2 + P3
CogRest =~ C1 + C2 + C3
ExpEmo =~ E1 + E2 + E3
SocSupp =~ S1 + S2 + S3
PFCope =~ ProbSol + CogRest
EFCope =~ ExpEmo + SocSupp
'
#hcfa_m <- sem(hcfa, sample.cov=covmat, sample.nobs = 275)
hcfa_m <- sem(hcfa, df)
summary(hcfa_m, fit.measures=TRUE, standardized=TRUE)
## lavaan (0.5-23.1097) converged normally after 63 iterations
##
## Number of observations 275
##
## Estimator ML
## Minimum Function Test Statistic 83.005
## Degrees of freedom 49
## P-value (Chi-square) 0.002
##
## Model test baseline model:
##
## Minimum Function Test Statistic 2761.514
## Degrees of freedom 66
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.987
## Tucker-Lewis Index (TLI) 0.983
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -5642.819
## Loglikelihood unrestricted model (H1) -5601.316
##
## Number of free parameters 29
## Akaike (AIC) 11343.637
## Bayesian (BIC) 11448.524
## Sample-size adjusted Bayesian (BIC) 11356.570
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.050
## 90 Percent Confidence Interval 0.031 0.068
## P-value RMSEA <= 0.05 0.469
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.017
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## ProbSol =~
## P1 1.000 1.273 0.911
## P2 1.424 0.071 20.174 0.000 1.812 0.864
## P3 0.896 0.043 20.777 0.000 1.140 0.879
## CogRest =~
## C1 1.000 2.038 0.888
## C2 1.028 0.051 19.966 0.000 2.094 0.874
## C3 0.842 0.040 21.115 0.000 1.715 0.904
## ExpEmo =~
## E1 1.000 1.743 0.873
## E2 1.542 0.072 21.393 0.000 2.687 0.928
## E3 1.133 0.059 19.047 0.000 1.975 0.860
## SocSupp =~
## S1 1.000 2.787 0.901
## S2 0.706 0.033 21.678 0.000 1.969 0.896
## S3 0.381 0.018 21.230 0.000 1.062 0.886
## PFCope =~
## ProbSol 1.000 0.812 0.812
## CogRest 1.609 0.422 3.813 0.000 0.816 0.816
## EFCope =~
## ExpEmo 1.000 0.826 0.826
## SocSupp 1.572 0.407 3.865 0.000 0.812 0.812
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## PFCope ~~
## EFCope 0.448 0.146 3.062 0.002 0.301 0.301
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .P1 0.333 0.051 6.525 0.000 0.333 0.170
## .P2 1.111 0.130 8.560 0.000 1.111 0.253
## .P3 0.384 0.048 8.043 0.000 0.384 0.228
## .C1 1.119 0.143 7.812 0.000 1.119 0.212
## .C2 1.355 0.162 8.350 0.000 1.355 0.236
## .C3 0.655 0.093 7.012 0.000 0.655 0.182
## .E1 0.946 0.112 8.423 0.000 0.946 0.237
## .E2 1.157 0.205 5.644 0.000 1.157 0.138
## .E3 1.372 0.155 8.859 0.000 1.372 0.260
## .S1 1.806 0.242 7.448 0.000 1.806 0.189
## .S2 0.947 0.124 7.655 0.000 0.947 0.196
## .S3 0.308 0.038 8.102 0.000 0.308 0.215
## ProbSol 0.552 0.279 1.982 0.047 0.341 0.341
## CogRest 1.388 0.721 1.925 0.054 0.334 0.334
## ExpEmo 0.968 0.530 1.826 0.068 0.319 0.319
## SocSupp 2.652 1.314 2.018 0.044 0.341 0.341
## PFCope 1.068 0.308 3.467 0.001 1.000 1.000
## EFCope 2.071 0.593 3.491 0.000 1.000 1.000
semPaths_default(hcfa_m, sizeMan=7)
Are these models nested?
anova(l1fac_m, hcfa_m)
| Model   | Df | AIC   | BIC   | Chisq | Chisq diff | Df diff | Pr(>Chisq) |
|---------|----|-------|-------|-------|------------|---------|------------|
| l1fac_m | 48 | 11346 | 11454 | 83    | NA         | NA      | NA         |
| hcfa_m  | 49 | 11344 | 11449 | 83    | 0.035      | 1       | 0.852      |
nonnest2::vuongtest(l1fac_m, hcfa_m)
##
## Model 1
## Class: lavaan
## Call: lavaan::lavaan(model = bmodel, data = df, model.type = "sem", ...
##
## Model 2
## Class: lavaan
## Call: lavaan::lavaan(model = hcfa, data = df, model.type = "sem", int.ov.free = TRUE, ...
##
## Variance test
## H0: Model 1 and Model 2 are indistinguishable
## H1: Model 1 and Model 2 are distinguishable
## w2 = 0.000, p = 0.854
##
## Non-nested likelihood ratio test
## H0: Model fits are equal for the focal population
## H1A: Model 1 fits better than Model 2
## z = 0.094, p = 0.462
## H1B: Model 2 fits better than Model 1
## z = 0.094, p = 0.5375
aa <- aictab(list(l1fac_m, hcfa_m)) #aictab() is from the AICcmodavg package
## Warning in aictab.AIClavaan(list(l1fac_m, hcfa_m)):
## Model names have been supplied automatically in the table
We can test whether models are nested, equivalent, or non-nested using Bentler and Satorra's nesting and equivalence testing (NET) procedure. As mentioned above, this involves fitting model 1 to the data, saving its model-implied covariance matrix, and then fitting model 2 using that matrix as the input. We then examine the difference in degrees of freedom. If \(df_1 - df_2 > 0\) and the \(\chi^2\) of model 2 is essentially zero (usually < .001), then we say model 1 is nested in model 2. Note that this is an asymmetric test: model 1 may not be nested in model 2, yet the converse (model 2 nested in model 1) could still hold.
Following Bentler steps:
df1 <- fitmeasures(l1fac_m)["df"] #degrees of freedom for saturated factor model (no higher-order structure)
mimplied_l1fac <- fitted(l1fac_m)$cov #model implied cov
#now use the model-implied covariance as the data to analyze for hcfa_m
hcfa_m_fromcov <- sem(hcfa, sample.cov=mimplied_l1fac, sample.nobs = 275)
df2 <- fitmeasures(hcfa_m_fromcov)["df"]
d <- df1 - df2
cat("df for NET: ", d, "\n")
## df for NET: -1
chi2 <- fitmeasures(hcfa_m_fromcov)["chisq"]
cat("Chi-square for NET test: ", chi2, "\n")
## Chi-square for NET test: 0.0349
We see that the degrees of freedom difference is negative. Thus, the saturated (first-order) model is not nested in the hierarchical model. This makes sense intuitively: the hierarchical model imposes structure on the factor covariances, whereas the saturated model estimates unique covariances among the factors.
We can reverse the NET test (in terms of what is \(M1\) versus \(M2\)). To test whether the hierarchical CFA is nested in the saturated model, we first fit the saturated CFA, then test the hierarchical model using the model-implied covariance matrix from the saturated model.
#is the H CFA nested in the saturated (first-order)
df1 <- fitmeasures(hcfa_m)["df"]
mimplied_hcfa <- fitted(hcfa_m)$cov
#now use the model-implied covariance from the hierarchical model as the data for the first-order model
bmodel_fromcov <- sem(bmodel, sample.cov=mimplied_hcfa, sample.nobs = 275)
df2 <- fitmeasures(bmodel_fromcov)["df"]
d <- df1 - df2
cat("df for NET: ", d, "\n")
## df for NET: 1
chi2 <- fitmeasures(bmodel_fromcov)["chisq"]
cat("Chi-square for NET test: ", chi2, "\n")
## Chi-square for NET test: 0
Here, we see the expected result for a nested model: the df difference is positive (1) and the \(\chi^2\) of the second model (using the model-implied covariance of the first) is 0. Thus, the hierarchical CFA model is nested in the saturated first-order factor model.
The semTools package provides a net() function that will compute nesting/equivalence tests for all pairs of models passed in (as separate arguments). This package also provides the moreFitIndices function, which provides the \(AIC_C\) as aic.smallN.
#the easy way
semTools::net(l1fac_m, hcfa_m)
##
## If cell [R, C] is TRUE, the model in row R is nested within column C.
##
## If cell [R, C] is TRUE and the models have the same degrees of freedom,
## they are equivalent models. See Bentler & Satorra (2010) for details.
##
## If cell [R, C] is NA, then the model in column C did not converge when
## fit to the implied means and covariance matrix from the model in row R.
##
## The hidden diagonal is TRUE because any model is equivalent to itself.
## The upper triangle is hidden because for models with the same degrees
## of freedom, cell [C, R] == cell [R, C]. For all models with different
## degrees of freedom, the upper diagonal is all FALSE because models with
## fewer degrees of freedom (i.e., more parameters) cannot be nested
## within models with more degrees of freedom (i.e., fewer parameters).
##
## l1fac_m hcfa_m
## l1fac_m (df = 48)
## hcfa_m (df = 49) TRUE
semTools::moreFitIndices(l1fac_m)
## gammaHat adjGammaHat baseline.rmsea aic.smallN bic.priorN
## 9.79e-01 9.66e-01 3.85e-01 1.13e+04 1.15e+04
## hqc sic
## 1.14e+04 1.14e+04