Structural equation modeling (SEM) is a multivariate regression technique that has grown immensely in popularity over the past two decades. This rise reflects at least three related phenomena: 1) the increasing sophistication of quantitative training in the social sciences; 2) the availability of large multivariate datasets; and 3) the computational power to estimate complex, highly parameterized models.
Structural equation modeling is not one thing. Rather, it is a family of techniques for testing multivariate hypotheses about the relationships among observed and (often) latent variables. In general, the goal of SEM is to understand structural patterns in the associations among variables. For example, can twenty items on a depression inventory be thought to measure an underlying (hypothetical) construct?
Critically, SEM allows researchers to test hypotheses about observed variables, as well as latent variables that are not directly measured. This opens the door to develop the construct validity of a latent variable (e.g., depression), or to test hypotheses about how constructs are related to each other (e.g., the relationship between anxiety and depression).
SEM has two important components: structure and measurement.
The structural model represents the relationships among constructs, whether observed or latent. This is closest to standard regression techniques, where we believe that one variable causes (or at least is associated with) another.
For example, age often predicts level of performance on working memory tests. Thus, \(Age \rightarrow WorkingMemory\). A developmental psychologist might be quick to note that it may not be age that is causal per se. Rather, the proximate cause might be increasing functional integration of brain networks, or some other outcome of aging. We might think of this in terms of age having an indirect effect on working memory via one or more intermediate (mediating) variables:
#make up data: 100 observations on 4 (uncorrelated) variables
df <- as.data.frame(matrix(rnorm(400), nrow=100, ncol=4,
dimnames = list(NULL, c("Age", "WorkingMemory", "Education", "NetworkIntegration"))
))
#lavaan model syntax
m <- '
WorkingMemory ~ Age
WorkingMemory ~ NetworkIntegration
WorkingMemory ~ Education
Education ~ Age
NetworkIntegration ~ Age
'
mm <- sem(m, data=df)
suppressWarnings(semPaths(mm, style='lisrel', what="diagram", layout = "spring",
nCharNodes=20, sizeMan=25, sizeMan2=10, label.scale=FALSE))
Did you notice we just fit our first SEM?! Pretty exciting… I mean, it was technically a path analysis because all variables were observed, but we’ll save that discussion for another day.
The measurement model specifies how observed variables relate to one another as indicators of underlying latent variables.
That is, observed variables are considered indicators (i.e., indirect measures) of a latent variable. And latent variables typically represent hypothetical constructs in SEM. For example, narcissism is a construct that describes the tendency to be grandiose, vengeful, self-enhancing, and potentially vulnerable to rejection. Research on narcissism (e.g., Pincus & Lukowitsky, 2010) suggests that it likely has multiple facets, such as grandiosity versus vulnerability. So, perhaps it is a hypothetical construct composed of separable latent subcomponents.
The role of a measurement model is to test whether the underlying latent structure of the indicators matches the researcher’s theory. For example, a psychometric item intended to capture grandiosity may not correlate with other grandiosity items. Thus, it would likely be a poor indicator of that latent construct. We’ll spend more time on the notation next week, but this is the basic idea:
grViz("
digraph narc {
# a 'graph' statement
graph [overlap = true, fontsize = 12]
# nodes for observed and latent
node [shape = circle, fontname = Helvetica]
'Pathological\nNarcissism'; 'Narcissistic\nGrandiosity'; 'Narcissistic\nVulnerability';
node [shape = box, fontname = Helvetica]
i1; i2; i3; i4; i5; i6
# edges
'Pathological\nNarcissism'->'Narcissistic\nGrandiosity' 'Pathological\nNarcissism'->'Narcissistic\nVulnerability'
'Narcissistic\nGrandiosity'->'i1' 'Narcissistic\nGrandiosity'->'i2' 'Narcissistic\nGrandiosity'->'i3'
'Narcissistic\nVulnerability'->'i4' 'Narcissistic\nVulnerability'->'i5' 'Narcissistic\nVulnerability'->'i6'
}")
The use of a ‘hat’ over a variable denotes an estimator of a quantity. So, \(\hat{Y}\) indicates the predicted values of \(Y\) from a statistical model.
Sample means are denoted \(\bar{x}\).
Sample variances are \(s^2_x\) and standard deviations are \(s_x\).
Matrices are denoted by boldface capital letters, \(\mathbf{X}\).
Vectors are denoted by boldface lowercase letters, \(\mathbf{x}\).
The transpose of a matrix flips it along its diagonal (interchanging rows and columns). It is denoted with the prime operator, \(\mathbf{X}'\), and computed with the t() function in R.
Matrices are composed of rows and columns (we won’t deal with arrays of 3+ dimensions). Rows are the first dimension, columns are the second. So, a matrix composed of \(n\) observations and \(p\) variables is denoted: \[\underset{n \times p}{\mathbf{X}}\]
In R, selecting row i in column j of matrix X is: X[i,j].
A row vector of \(m\) elements:
\[ \begin{equation} \mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix} \end{equation} \]
A column vector:
\[ \begin{equation} \mathbf{x} =\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \end{equation} \]
And transposition converts one into the other:
\[ \begin{equation} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}' = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix} \end{equation} \]
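To connect this notation to R, here is a tiny illustration with arbitrary values:
x <- c(1, 2, 3)                   #a numeric vector
X <- matrix(1:6, nrow=3, ncol=2)  #a 3 x 2 matrix: rows are observations, columns are variables
t(X)                              #the transpose X' is 2 x 3
X[2, 1]                           #the element in row i = 2, column j = 1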
I will probably screw these up along the way, but I wanted to mention them up front so that we’re all on the same page!
We’ll use housing price data from Boston’s 1970 census to review important concepts in correlation and regression. This is a nice dataset for regression because there are many interdependent variables: crime, pollutants, age of properties, etc.
#example dataset from mlbench package with home prices in Boston by census tract
data(BostonHousing2)
BostonSmall <- BostonHousing2 %>% dplyr::select(
cmedv, #median value of home in 1000s
crim, #per capita crime by town
nox, #nitric oxide concentration
lstat #proportion of lower status
)
n <- nrow(BostonSmall) #number of observations
k <- ncol(BostonSmall) #number of variables
#show variables
splom(BostonSmall)
We can use the cov() function in R to get a variance-covariance matrix:
from_r <- cov(BostonSmall)
from_r
## cmedv crim nox lstat
## cmedv 84.312 -30.77 -0.4568 -48.577
## crim -30.770 73.99 0.4196 27.986
## nox -0.457 0.42 0.0134 0.489
## lstat -48.577 27.99 0.4889 50.995
And we can look under the hood to verify that we understand the concept and formula:
\[ s_{XY} = \frac{\sum{(X-\bar{X})(Y-\bar{Y})}}{N-1} \]
mycov <- function(x, y) {
stopifnot(length(x)==length(y))
sum((x - mean(x))*(y - mean(y)))/(length(x) - 1)
}
cmat <- matrix(NA, nrow=k, ncol=k)
indices <- which(is.na(cmat), arr.ind=TRUE)
cmat[indices] <- apply(indices, 1, function(v) {
mycov(BostonSmall[, v[1] ], BostonSmall[, v[2] ])
})
cmat
## [,1] [,2] [,3] [,4]
## [1,] 84.312 -30.77 -0.4568 -48.577
## [2,] -30.770 73.99 0.4196 27.986
## [3,] -0.457 0.42 0.0134 0.489
## [4,] -48.577 27.99 0.4889 50.995
Or we can do the matrix math by hand:
\[\mathbf{S_{XX}} = 1/(N-1)\mathbf{X'X}\]
if \(\mathbf{X}\) is a matrix that is already in deviation form (i.e., each variable is mean-centered).
#for the mathy among you...
bmat <- as.matrix(BostonSmall) #convert to numeric matrix
n <- nrow(bmat) #number of observations
#1) compute the means for each variable and populate a matrix of the same size
M <- matrix(data=1, nrow=n, ncol=1) %*% colMeans(bmat)
#2) compute the deviation of each score from its variable mean
D <- bmat - M
#3) the covariance matrix is D'D/(n-1)
C <- t(D) %*% D / (n-1)
C
## cmedv crim nox lstat
## cmedv 84.312 -30.77 -0.4568 -48.577
## crim -30.770 73.99 0.4196 27.986
## nox -0.457 0.42 0.0134 0.489
## lstat -48.577 27.99 0.4889 50.995
The correlation of x and y is a covariance that has been standardized by the standard deviations of x and y. This yields a scale-insensitive measure of the linear association of x and y.
\[r_{XY}= \frac{s_{XY}}{s_{X} s_{Y}}\]
rmat <- array(NA, dim=dim(cmat)) #initialize an empty correlation matrix
variances <- diag(cmat) #variances of each variable are on the diagonal of the covariance matrix
rmat[indices] <- apply(indices, 1, function(v) { cmat[v[1],v[2]] / sqrt(variances[ v[1] ] * variances[ v[2] ]) })
rmat
## [,1] [,2] [,3] [,4]
## [1,] 1.000 -0.390 -0.429 -0.741
## [2,] -0.390 1.000 0.421 0.456
## [3,] -0.429 0.421 1.000 0.591
## [4,] -0.741 0.456 0.591 1.000
And it’s easy to generate a correlation matrix directly in R using the cor() function:
cor(BostonSmall, method="pearson")
## cmedv crim nox lstat
## cmedv 1.000 -0.390 -0.429 -0.741
## crim -0.390 1.000 0.421 0.456
## nox -0.429 0.421 1.000 0.591
## lstat -0.741 0.456 0.591 1.000
There is also a handy cov2cor() function for converting a covariance matrix to a correlation matrix:
cov2cor(cmat)
## [,1] [,2] [,3] [,4]
## [1,] 1.000 -0.390 -0.429 -0.741
## [2,] -0.390 1.000 0.421 0.456
## [3,] -0.429 0.421 1.000 0.591
## [4,] -0.741 0.456 0.591 1.000
Rearranging the equation above, we can convert correlations to covariances by: \[s_{XY}= r_{XY} s_X s_Y\]
And lavaan has a cor2cov() function that takes a covariance matrix and a vector of standard deviations:
cormat <- cor(BostonSmall)
sds <- apply(BostonSmall, 2, sd)
lavaan::cor2cov(cormat, sds)
## cmedv crim nox lstat
## cmedv 84.312 -30.77 -0.4568 -48.577
## crim -30.770 73.99 0.4196 27.986
## nox -0.457 0.42 0.0134 0.489
## lstat -48.577 27.99 0.4889 50.995
A broad goal of SEM is to test hypotheses about the structure of covariance matrices. For example, are two variables related because one causes (or at least predicts) the other? Are two variables correlated because they are both indicators of an underlying construct?
Here’s an example of the scientific power we’d like to leverage in SEM (courtesy of Aidan Wright)
A crucial point about SEM is that the ‘goodness’ of a model can be thought of in terms of how well it predicts (captures) the full covariance matrix of the data using fewer parameters. So in a covariance matrix, we have
\[p = \frac{k(k+1)}{2}\]
free parameters. That is, we have to estimate each unique element (variance or covariance) of the covariance matrix. In the case of the dataset above with 9 variables, we have
\[ p = \frac{9 \cdot 10}{2} = 45\]
parameters: 36 covariances, and 9 variances (diagonal). SEM seeks to capture the full covariance matrix \(\mathbf{S_{XX}}\) using a model-implied covariance matrix \(\hat{\Sigma}\). If this can be done with fewer than 45 parameters, it suggests that there is some structure underlying the covariance matrix that is more parsimonious than a purely unstructured account. This relates to the degrees of freedom in the model, which we will dive into when discussing identification later in the course.
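We can verify this count directly in R:
k <- 9
k * (k + 1) / 2   #45 unique elements: 9 variances + 36 covariances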
We can test how well a candidate model captures the 9 x 9 covariance matrix of these data using fewer than 45 parameters.
We’ve represented the full covariance matrix using 22 parameters. Not a bad start! (assuming the model adequately captures all 45 variances and covariances).
Single-predictor regression (sometimes called ‘bivariate regression’, though this is confusing) is simply a re-expression of the covariance between two variables. Importantly, regression places variables into the asymmetric roles of criterion and predictor.
That is, the goal of regression is to predict values of the criterion variable based on one or more predictors. Whereas correlation is inherently non-directional, regression moves us one step closer to positing a belief that one can estimate the values of the criterion if one knows values of the predictors. Although this does not imply strong causality, conceptual clarity about the roles of variables in a regression model is essential. And having a clear head about predictor and criterion variable becomes even more essential in SEM!
Here’s the basic equation: \[ \hat{Y} = B_{0} + B_{1}X + \varepsilon \]
The expected change in Y for a one-unit change in X is encoded by \(B_1\). Note that by adding \(B_0\), we are representing a belief about the mean of \(\hat{Y}\). We can re-express \(B_1\) in terms of correlation or covariance, as above.
First, let’s set the roles of criterion and predictor. Can we estimate the median home price (criterion) on the basis of crime (predictor)? Let’s fit this equation in R:
\[\hat{\textrm{Price}} = B_{0} + B_{1}\textrm{Crime} + \varepsilon\]
m <- lm(cmedv ~ crim, BostonSmall)
summary(m)
##
## Call:
## lm(formula = cmedv ~ crim, data = BostonSmall)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.95 -5.47 -2.00 2.53 29.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.0316 0.4082 58.9 <2e-16 ***
## crim -0.4159 0.0438 -9.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.47 on 504 degrees of freedom
## Multiple R-squared: 0.152, Adjusted R-squared: 0.15
## F-statistic: 90.2 on 1 and 504 DF, p-value: <2e-16
We see that \(B_1\) is -.42, indicating a negative relationship between price and crime.
We can obtain the same estimate by knowing the correlation between price and crime, as well as the standard deviations of each:
\[B_1 = r_{XY}\frac{s_Y}{s_X}\]
unname(cormat["cmedv","crim"]*(sds["cmedv"]/sds["crim"]))
## [1] -0.416
We can also visualize the association between these variables as a scatterplot, adding the predicted values \(\hat{\textrm{Price}}\) at every value of \(\textrm{Crime}\) (i.e., the fitted regression line).
BostonSmall <- BostonSmall %>% add_predictions(m)
ggplot(BostonSmall, aes(x=crim, y=cmedv)) +
geom_point() + geom_line(aes(y=pred)) + theme_bw()
This is clearly not a straightforward relationship! The distribution of crime has a long positive tail. We can visualize the residuals of this model, \(\textrm{Price} - \hat{\textrm{Price}}\) to get a sense of things:
ggplot(BostonSmall, aes(x=crim, y=cmedv)) +
geom_point() +
geom_line(aes(y=pred)) +
geom_segment(aes(xend=crim, yend=pred), color="blue", alpha=0.2) +
theme_bw()
Or the association between the predicted data and residuals:
BostonSmall <- BostonSmall %>% add_residuals(m)
ggplot(BostonSmall, aes(x=pred, y=resid)) +
geom_point() + stat_smooth(method="loess")
Or the distribution of residuals (which should be normal):
ggplot(BostonSmall, aes(x=resid)) + geom_histogram(bins=15)
The residuals do not look like white noise, and there is evidence of heteroscedasticity. The distribution of crime is very difficult to work with, and we won’t solve that problem perfectly here.
ggplot(BostonSmall, aes(x=crim)) + geom_histogram(bins=15)
But a log2 transformation of crime moves things in the right direction in terms of normalizing the variance while retaining interpretability (a one-unit change = 2x crime). Remember that regression doesn’t assume normality of the predictors, but rather normality of the residuals.
ggplot(BostonSmall, aes(x=log2(crim))) + geom_histogram(bins=15)
Let’s refit the model using the log2 transformation. This is moving in the right direction and will have to do for now.
BostonSmall$log_crim <- log2(BostonSmall$crim)
m2 <- lm(cmedv ~ log_crim, BostonSmall)
BostonSmall <- BostonSmall %>% add_predictions(m2)
BostonSmall <- BostonSmall %>% add_residuals(m2)
ggplot(BostonSmall, aes(x=log_crim, y=cmedv)) +
geom_point() +
geom_line(aes(y=pred)) +
geom_segment(aes(xend=log_crim, yend=pred), color="blue", alpha=0.2) +
theme_bw()
Let’s return to the relationships among correlation, covariance, and regression. As Kline notes (p. 27), the residuals of a regression should be orthogonal to the predictor. Indeed, the least squares criterion requires this: \(\underset{B_0,B_1}{\textrm{min}} \sum(Y - \hat{Y})^2\).
cor.test(residuals(m2), BostonSmall$log_crim)
##
## Pearson's product-moment correlation
##
## data: residuals(m2) and BostonSmall$log_crim
## t = 1e-15, df = 500, p-value = 1
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.0872 0.0872
## sample estimates:
## cor
## 4.34e-17
Unlike correlation or covariance, which are symmetric and undirected, swapping the criterion and predictor in single-predictor regression changes the slope estimate of their association. If we treat crime as the criterion and housing price as the predictor, the slope now represents change in housing prices as a function of change in crime (under log2, a one unit change indicates a 2x increase in crime rate). To wit:
\[ \textrm{log}_2(\hat{\textrm{Crime}}) = B_0 + B_1 \textrm{Price} \]
m3 <- lm(log_crim ~ cmedv, BostonSmall)
summary(m3)
##
## Call:
## lm(formula = log_crim ~ cmedv, data = BostonSmall)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.951 -2.186 -0.655 2.521 8.600
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3738 0.3273 7.25 1.6e-12 ***
## cmedv -0.1553 0.0135 -11.54 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.78 on 504 degrees of freedom
## Multiple R-squared: 0.209, Adjusted R-squared: 0.208
## F-statistic: 133 on 1 and 504 DF, p-value: <2e-16
We can, however, still recover the correlation based on the unstandardized regression coefficient and the standard deviations of the variables:
\[r_{\textrm{Price,log2(Crime)}} = B_1 \frac{s_{\textrm{Price}}}{s_{\textrm{log2(Crime)}}}\]
(coef(m3)["cmedv"] * sd(BostonSmall$cmedv)) / sd(BostonSmall$log_crim)
## cmedv
## -0.457
cor.test(~log_crim + cmedv, BostonSmall)
##
## Pearson's product-moment correlation
##
## data: log_crim and cmedv
## t = -10, df = 500, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.524 -0.386
## sample estimates:
## cor
## -0.457
This brings us to standardization in regression, which attempts to re-express the parameter estimates in units that are not specific to the scale of each variable. More specifically, standardized parameter estimates represent the association in standard deviation units. In the case of crime and housing prices, we could ask how many standard deviations lower are housing prices given a one SD increase in crime.
Standardization is often achieved by z-scoring all variables (or at least dividing each by its own SD) prior to calculating the regression:
BostonSmall_std <- BostonSmall %>% select(-pred, -resid) %>% mutate_all(scale)
m4 <- lm(cmedv ~ log_crim, BostonSmall_std)
summary(m4)
##
## Call:
## lm(formula = cmedv ~ log_crim, data = BostonSmall_std)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.885 -0.563 -0.270 0.290 3.627
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.58e-18 3.96e-02 0.0 1
## log_crim -4.57e-01 3.96e-02 -11.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.89 on 504 degrees of freedom
## Multiple R-squared: 0.209, Adjusted R-squared: 0.208
## F-statistic: 133 on 1 and 504 DF, p-value: <2e-16
Thus, in the context of single-predictor regression, the standardized regression coefficient is the correlation.
\[ r_{xy} = \beta_1 = B_1 \frac{s_x}{s_y} \]
coef(m2)["log_crim"] * sd(BostonSmall$log_crim) / sd(BostonSmall$cmedv)
## log_crim
## -0.457
Standardization can also be achieved after model estimation by dividing by the relevant standard deviations. I’ll skip the details for now, but check out the lm.beta package.
lm.beta(m2)
##
## Call:
## lm(formula = cmedv ~ log_crim, data = BostonSmall)
##
## Standardized Coefficients::
## (Intercept) log_crim
## 0.000 -0.457
As Kline notes, there are a few other handy properties here:
\[ |r_{XY}| = r_{Y\hat{Y}} \]
More generally, as in multiple regression, this should be expressed:
\[ R_{Y \cdot X_{1...p}} = r_{Y\hat{Y}} \]
cor.test(fitted(m2), BostonSmall$cmedv)
##
## Pearson's product-moment correlation
##
## data: fitted(m2) and BostonSmall$cmedv
## t = 10, df = 500, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.386 0.524
## sample estimates:
## cor
## 0.457
\[ s_{Y}^2 = s_{\hat{Y}}^2 + s_{Y - \hat{Y}}^2 \]
var(BostonSmall$cmedv)
## [1] 84.3
var(fitted(m2)) + var(resid(m2))
## [1] 84.3
In single-predictor regression, there are straightforward mappings back to covariance and correlation. As we consider the joint effects of multiple predictors, however, the math gets a bit harder. Conceptually, the goal of multiple (predictor) regression is to quantify the effects of several predictors on a criterion. In multiple regression, each parameter estimate quantifies the partial relationship of a given predictor \(X_1\) with the criterion \(Y\), controlling for all other predictors \(X_{2...p}\). That is, each estimate tells us how much change in \(Y\) is expected for a unit change in \(X_1\) at a given level of \(X_{2...p}\) (i.e., ‘holding constant’ those predictors).
\[ \hat{Y} = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + ... B_p X_p + \varepsilon \]
Let’s consider the case in which housing prices are predicted by both crime and nitric oxide concentration:
\[\hat{\textrm{Price}} = B_{0} + B_{1}\textrm{log}_2\textrm{(Crime)} + B_2 \textrm{Nox} + \varepsilon\]
m5 <- lm(cmedv ~ log_crim + nox, BostonSmall)
summary(m5)
##
## Call:
## lm(formula = cmedv ~ log_crim + nox, data = BostonSmall)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.97 -4.92 -2.27 2.66 32.96
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.470 3.004 9.81 < 2e-16 ***
## log_crim -0.925 0.188 -4.91 1.2e-06 ***
## nox -14.391 5.070 -2.84 0.0047 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.12 on 503 degrees of freedom
## Multiple R-squared: 0.222, Adjusted R-squared: 0.219
## F-statistic: 71.6 on 2 and 503 DF, p-value: <2e-16
We can see that there are negative associations of both crime and pollutants with home prices. As mentioned above, the multiple correlation of all predictors with the criterion is given by:
\[ R_{Y \cdot X_{1...p}} = r_{Y\hat{Y}} \]
cor.test(fitted(m5), BostonSmall$cmedv)
##
## Pearson's product-moment correlation
##
## data: fitted(m5) and BostonSmall$cmedv
## t = 10, df = 500, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.400 0.536
## sample estimates:
## cor
## 0.471
And the variance explained:
cor(fitted(m5), BostonSmall$cmedv)^2
## [1] 0.222
To extend standardized parameter estimates to the multiple regression context, we have to account for the shared variance among predictors.
\[ \beta_\textrm{Nox} = \frac{r_\textrm{Nox,Price} - r_\textrm{Crime,Price} r_\textrm{Nox,Crime}}{1 - r_\textrm{Nox,Crime}^2} \]
So, we adjust the association of Nox with Price by subtracting the product of the correlation of Crime with Price (the other predictor with the criterion) and the correlation of Nox with Crime (the predictor intercorrelation). The denominator then rescales this difference by the proportion of variance in Nox that is not shared with Crime. In the case of orthogonal (uncorrelated) predictors, note that the standardized beta simplifies to the bivariate correlation, as in single-predictor regression:
\[ \begin{aligned} r_\textrm{Nox,Crime} :&= 0 \\ \beta_\textrm{Nox} & = r_\textrm{Nox,Price} \end{aligned} \]
Unstandardized betas can be computed from the standardized betas as above:
\[ B_\textrm{Nox} = \beta_\textrm{Nox}\left(\frac{s_\textrm{Price}}{s_\textrm{Nox}}\right) \]
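To make this concrete, here is a quick check against the two-predictor model m5 from above (a sketch; the correlations use the log2-transformed crime variable, so ‘Crime’ in the formulas corresponds to log_crim):
r_nox_price   <- cor(BostonSmall$nox, BostonSmall$cmedv)
r_crime_price <- cor(BostonSmall$log_crim, BostonSmall$cmedv)
r_nox_crime   <- cor(BostonSmall$nox, BostonSmall$log_crim)
#standardized partial coefficient for nox; should match the nox entry of lm.beta(m5)
beta_nox <- (r_nox_price - r_crime_price * r_nox_crime) / (1 - r_nox_crime^2)
beta_nox
#back-transform to the unstandardized coefficient; should match coef(m5)["nox"]
beta_nox * sd(BostonSmall$cmedv) / sd(BostonSmall$nox)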
Also note that we can consider partial relationships in multiple regression in correlational (standardized) terms. But we have to decide how we wish to think about shared variance among predictors. There are two options:
Partial correlation: the association of \(X1\) and \(Y\), controlling for shared variance with \(X2\). Note that the goal here is to remove the influence of \(X2\) from both variables before considering their association.
\[ r_{Y,X1 \cdot X2} = \frac{r_{Y,X1} - r_{X1,X2}r_{Y,X2}}{\sqrt{(1-r_{X1,X2}^2)(1-r_{Y,X2}^2)}} \]
Part (semipartial) correlation: the association of \(X1\) with \(Y\), controlling only for the shared variance between \(X1\) and \(X2\). Importantly, the association of \(X2\) with \(Y\) is not removed.
\[ r_{Y(X1 \cdot X2)} = \frac{r_{Y,X1} - r_{Y,X2}r_{X1,X2}}{\sqrt{1-r_{X1,X2}^2}} \]
\(a\) is the part correlation of \(X1\) with \(Y\), adjusting for \(X1-X2\) association.
\(\frac{a}{a+d}\) is the partial correlation of \(X1\) with \(Y\), controlling for \(X2\).
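Here is a worked version of both formulas above, using price as \(Y\), log2 crime as \(X1\), and nox as \(X2\) (my own illustration with the Boston data):
r_y1 <- cor(BostonSmall$cmedv, BostonSmall$log_crim)  #Y with X1
r_y2 <- cor(BostonSmall$cmedv, BostonSmall$nox)       #Y with X2
r_12 <- cor(BostonSmall$log_crim, BostonSmall$nox)    #X1 with X2
#partial correlation: X1 with Y, removing X2 from both
(r_y1 - r_12 * r_y2) / sqrt((1 - r_12^2) * (1 - r_y2^2))
#part (semipartial) correlation: X1 with Y, removing X2 from X1 only
(r_y1 - r_y2 * r_12) / sqrt(1 - r_12^2)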
There is not a single best measure of association between a predictor and criterion in multiple regression. Rather, the challenge is that regression coefficients, part correlations, and partial correlations all represent the association of a predictor with a criterion conditional on the other predictors in the model.
The good news is that standardized regression coefficients are interpretable because they are on the same scale and have a known interpretation (relationship between \(Y\) and \(X_1\) at a given level of \(X_2 ... X_p\)). The bad news is that they do not sufficiently weight the direct effect of a predictor on the criterion, which is often of substantive interest (Johnson and LeBreton, 2004).
There are additional measures of the relative importance of predictors in a regression model that go beyond what we will cover in class. I would suggest reading this paper and using the relaimpo package to examine the relative importance of predictors in regression.
One intuitive, but flawed, measure of partial association is the product of the standardized coefficient with the correlation of the predictor and the criterion (Hoffman, 1960):
\[ r^2_{Y,X1\cdot (X_2...X_p)} = \beta_{X_1}r_{Y,X_1} \]
By using the marginal correlation (i.e., the bivariate relationship, averaging over all other predictors), the sum of these estimates across all predictors equals the model \(R^2\).
If you’re interested in the relative importance of a predictor in a multiple regression model, I would suggest using the Lindeman, Merenda and Gold (‘lmg’) method of computing relative importance over standardized coefficients, Hoffman’s method, part, or partial correlation. It’s described in the paper above. Here’s a quick example:
m_relimport <- lm(cmedv ~ log_crim + nox, BostonSmall)
stdbetas <- lm.beta(m_relimport)$standardized.coefficients #standardized coefficients
stdbetas["log_crim"] * cor(BostonSmall$cmedv, BostonSmall$log_crim)
## log_crim
## 0.144
stdbetas["nox"] * cor(BostonSmall$cmedv, BostonSmall$nox)
## nox
## 0.078
#here is the broader set of relative importance measures to consider (esp. 'lmg')
calc.relimp(m_relimport, type = c("lmg", "first", "last", "betasq", "pratt", "genizi", "car"))
## Warning in rev(variances[[p]]) - variances[[p + 1]]: Recycling array of length 1 in vector-array arithmetic is deprecated.
## Use c() or as.vector() instead.
## Response variable: cmedv
## Total response variance: 84.3
## Analysis based on 506 observations
##
## 2 Regressors:
## log_crim nox
## Proportion of variance explained by model: 22.2%
## Metrics are not normalized (rela=FALSE).
##
## Relative importance metrics:
##
## lmg last first betasq pratt genizi car
## log_crim 0.1232 0.0373 0.209 0.0986 0.144 0.1232 0.1310
## nox 0.0984 0.0125 0.184 0.0330 0.078 0.0984 0.0906
##
## Average coefficients for different model sizes:
##
## 1X 2Xs
## log_crim -1.35 -0.925
## nox -34.02 -14.391
As a reminder, the complexity of partial relationships in multiple regression largely comes from the conceptual (and statistical) difficulty of handling the relationship among predictors. If the predictors are uncorrelated with each other, then everything you know from single-predictor regression holds true – for example, the equivalence of bivariate correlation and standardized coefficients.
A useful way to think about this is that if the predictors are truly orthogonal to each other, then regressing \(\mathbf{y}\) on \(\mathbf{x}_1\) alone removes none of the variance accounted for by \(\mathbf{x}_2\). Indeed, one could take the residuals of the \(Y \sim X_1\) regression and use them as the criterion in a new model:
\[ \begin{align} r_{X_1,X_2} :&= 0 \\ \hat{Y} &= B_0 + B_1 X_1 + \varepsilon \\ R_{X_1} &= Y - \hat{Y} \\ \hat{R}_{X_1} &= B_0 + B_2 X_2 + \varepsilon \end{align} \]
The estimates of \(B_1\) and \(B_2\) from this two-step approach will be identical to those obtained when both predictors are entered jointly:
\[ \hat{Y} = B_0 + B_1 X_1 + B_2 X_2 + \varepsilon \]
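A quick simulation illustrates this (made-up data; with a finite sample the predictors are only approximately orthogonal, so the two estimates agree closely rather than exactly):
set.seed(100)
x1 <- rnorm(10000)
x2 <- rnorm(10000)                       #generated independently of x1
y  <- 2 + 0.5*x1 - 0.3*x2 + rnorm(10000)
coef(lm(y ~ x1))["x1"]          #B1 from the single-predictor model
coef(lm(y ~ x1 + x2))["x1"]     #B1 with x2 included: essentially unchanged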
Regression coefficients reflect unconditional linear relations only. Stated differently, the covariance of a predictor \(X\) and the criterion \(Y\) must be constant over \(X\), other predictors \(\mathbf{P}\), and unmeasured variables \(\mathbf{U}\). Thus, unmodeled interactions among predictors, measured or unmeasured, violate this assumption, and nonlinear associations between \(X\) and \(Y\) do as well.
Predictors are measured perfectly. Although we know that psychological variables suffer from measurement error (i.e., we are uncertain about the true value of a predictor \(X\)), we cannot manage this problem in conventional regression. Rather, predictors are assumed to have 0 measurement error. Violating this can lead to bias in parameter estimates, both for the unreliable predictor and other parameters.
Residuals are normally distributed. An important assumption in regression is that the errors (i.e., residuals) are normally distributed conditional on \(\mathbf{X}\) (i.e., the matrix of all predictors). In addition, knowledge about one residual should not provide information about another residual. This is the assumption of iid residuals: independent and identically distributed. The iid assumption can be violated, for example, by clustered data in which the cluster has certain unmodeled properties (e.g., longitudinal data where one person is higher on average than others).
Residuals are homoscedastic. The variance of the residuals should not depend on the value of the predictor. So, if more extreme values tend to be more poorly fit (larger variance of residuals), this is heteroscedastic.
There are no causal effects among the predictors. This assumption highlights that in univariate regression (i.e., one dependent variable), there is no way to model indirect causal effects among predictors. So if one believes that \(X\) causes \(Y\) via its influence on \(M\) (i.e., \(X \rightarrow M \rightarrow Y\)), this requires two regression equations (\(X \rightarrow M\) and \(M \rightarrow Y\)), which cannot be accommodated in standard regression. Indeed, such multivariate regression problems are a major motivation for SEM!
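To preview where this is headed, here is a rough sketch (not part of the Boston example) of how such an indirect-effect model could be written as two equations in lavaan syntax; X, M, and Y are placeholder variable names, and mediation models are a topic we'll return to.
med_model <- '
  M ~ a*X             #equation 1: the predictor X predicts the mediator M
  Y ~ b*M + cp*X      #equation 2: M (and X directly) predict the outcome Y
  indirect := a*b     #defined parameter: the indirect effect of X on Y via M
'
#this would be fit with something like: sem(med_model, data=your_data)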
This is a complex problem that is addressed in more detail by Cohen, Cohen, West, and Aiken (2003). I will mention a couple of cases:
Including a variable that is uncorrelated with the criterion, but correlated with a useful predictor, can inflate the parameter estimate of the useful predictor. For example:
\[ \begin{align} r_{YX} &= 0.5 \\ r_{YZ} &= 0 \\ r_{XZ} &= 0.4 \\ \hat{Y} &= B_0 + B_1 X + B_2 Z + \varepsilon \end{align} \]
In this circumstance, as we discussed above, the partial regression coefficient adjusts for the associations among predictors.
\[ \beta_\textrm{X} = \frac{r_\textrm{XY} - r_\textrm{YZ} r_\textrm{XZ}}{1 - r_\textrm{XZ}^2} \]
Thus, when there is no association between the criterion \(Y\) and the predictor \(Z\), we get
\[ \beta_\textrm{X} = \frac{r_\textrm{XY}}{1 - r_\textrm{XZ}^2} \] Relative to the bivariate correlation \(r_{XY}\), the denominator is less than 1 (because \(X\) and \(Z\) are correlated), leading \(\beta_X\) to be larger under classical suppression.
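Here is a small simulated demonstration of classical suppression (my own example, not from the Boston data). MASS::mvrnorm() with empirical=TRUE reproduces the requested correlation matrix exactly, and because the variables are standardized, the raw coefficients equal the standardized betas.
#population correlations: r_YX = 0.5, r_YZ = 0, r_XZ = 0.4 (variables ordered X, Y, Z)
R <- matrix(c(1.0, 0.5, 0.4,
              0.5, 1.0, 0.0,
              0.4, 0.0, 1.0), nrow=3, byrow=TRUE)
d <- as.data.frame(MASS::mvrnorm(n=500, mu=rep(0, 3), Sigma=R, empirical=TRUE))
names(d) <- c("X", "Y", "Z")
coef(lm(Y ~ X, d))["X"]       #bivariate estimate: 0.5
coef(lm(Y ~ X + Z, d))["X"]   #inflated to about 0.60 by the suppressor Z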
Definition: We observe positive bivariate associations among 2+ predictors and a criterion, but one of the predictors has a negative regression coefficient.
\[ \begin{align} r_{YX} &= 0.3 \\ r_{YZ} &= 0.5 \\ r_{XZ} &= 0.4 \\ \hat{Y} &= B_0 + B_1 X + B_2 Z + \varepsilon \end{align} \]
In this case, \(X\) is the predictor with the weaker association with \(Y\), and it will tend to have a negative parameter estimate in the multiple regression; it contributes little to predicting \(Y\) directly, and instead largely serves to remove (suppress) criterion-irrelevant variance in \(Z\). This form of suppression is often identified when the standardized beta for a predictor is opposite in sign to the bivariate correlation of that predictor with the criterion.
Definition: two predictors correlate negatively with each other, but positively with the criterion. This will lead to a high multiple \(R\) because the predictors account for more variance in the criterion when considered together than their bivariate correlations would suggest. Thus, this form of suppression is often identified when the standardized beta for a predictor exceeds its bivariate correlation with the criterion.
\[ \begin{align} r_{YX} &= 0.5 \\ r_{YZ} &= 0.5 \\ r_{XZ} &= -0.4 \\ \hat{Y} &= B_0 + B_1 X + B_2 Z + \varepsilon \end{align} \]
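Plugging these values into the standardized partial coefficient formula from above shows the effect directly:
r_yx <- 0.5; r_yz <- 0.5; r_xz <- -0.4
(r_yx - r_yz * r_xz) / (1 - r_xz^2)   #about 0.83, well above the bivariate correlation of 0.5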
We’ve now reviewed many fundamental topics in regression. And the next piece of good news is that SEM builds directly on regression methods. More specifically, the idea of ‘structural equations’ refers to the fact that we have more than one equation representing a model of covariance structure in which we (usually) have multiple criterion variables and multiple predictors.
Let’s return to our house pricing example above. We can estimate the same multiple regression models using the lavaan package that we’ll be using for more complex SEMs. And the syntax even has many similarities with lm().
Here’s the single-predictor regression from above, run as a path model in lavaan:
lavaan_m <- 'cmedv ~ log_crim'
mlav <- sem(lavaan_m, data=BostonSmall)
summary(mlav)
## lavaan (0.5-23.1097) converged normally after 11 iterations
##
## Number of observations 506
##
## Estimator ML
## Minimum Function Test Statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## cmedv ~
## log_crim -1.346 0.116 -11.567 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .cmedv 66.549 4.184 15.906 0.000
And for comparison, the output of lm():
summary(lm(cmedv ~ log_crim, BostonSmall))
##
## Call:
## lm(formula = cmedv ~ log_crim, data = BostonSmall)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.31 -5.17 -2.48 2.66 33.30
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.013 0.386 54.4 <2e-16 ***
## log_crim -1.346 0.117 -11.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.17 on 504 degrees of freedom
## Multiple R-squared: 0.209, Adjusted R-squared: 0.208
## F-statistic: 133 on 1 and 504 DF, p-value: <2e-16
The regression coefficient is identical (good!). One thing to note is that we don’t have an intercept in the lavaan output. This highlights an important difference: basic SEM typically focuses on the covariance structure of the data. We can include means as well, but usually only when they are relevant to our scientific questions. For example, do males and females differ on the mean level of a depression latent factor?
We can ask lavaan to include the mean (intercept) in the model in this case using meanstructure=TRUE:
mlav <- sem(lavaan_m, data=BostonSmall, meanstructure=TRUE)
It’s good to get in the habit of examining the parameter table in lavaan, which provides an important summary of which parameters are free in the model (i.e., have to be estimated) and which parameters were requested by the user (you!) in the model syntax.
parTable(mlav)
| id | lhs | op | rhs | user | block | group | free | ustart | exo | label | plabel | start | est | se |
|----|----------|----|----------|------|-------|-------|------|--------|-----|-------|--------|-------|-------|-------|
| 1 | cmedv | ~ | log_crim | 1 | 1 | 1 | 1 | NA | 0 | | .p1. | 0.00 | -1.35 | 0.116 |
| 2 | cmedv | ~~ | cmedv | 0 | 1 | 1 | 2 | NA | 0 | | .p2. | 42.07 | 66.55 | 4.184 |
| 3 | log_crim | ~~ | log_crim | 0 | 1 | 1 | 0 | NA | 1 | | .p3. | 9.71 | 9.71 | 0.000 |
| 4 | cmedv | ~1 | | 0 | 1 | 1 | 3 | NA | 0 | | .p4. | 22.53 | 21.01 | 0.386 |
| 5 | log_crim | ~1 | | 0 | 1 | 1 | 0 | NA | 1 | | .p5. | -1.13 | -1.13 | 0.000 |
Note that we can get standardized estimates in lavaan as well. This is a more complicated topic in SEM that we’ll return to:
standardizedSolution(mlav)
| lhs | op | rhs | est.std | se | z | pvalue |
|----------|----|----------|---------|-------|-------|--------|
| cmedv | ~ | log_crim | -0.457 | 0.033 | -13.7 | 0 |
| cmedv | ~~ | cmedv | 0.791 | 0.030 | 26.0 | 0 |
| log_crim | ~~ | log_crim | 1.000 | 0.000 | NA | NA |
| cmedv | ~1 | | 2.291 | 0.089 | 25.8 | 0 |
| log_crim | ~1 | | -0.361 | 0.000 | NA | NA |
Other than fitting this simple regression model, how could we use lavaan to conduct regression analyses? And can you think of an example analysis that is possible in lavaan (i.e., path analysis) that isn’t possible in standard regression?