MLR Path Analysis

In this tutorial, we use the lavaan package in R to fit a multiple linear regression (MLR) as a path model.

Load the packages

library(lavaan)

Import the data

For this example, our data are a variance-covariance matrix of three observed variables: ReadSC1, MathSC1, and Goals2. For more information about the data, please refer back to the course slides.

We use the following code to read the variance-covariance matrix into R. We first create a character string called mlr.lower that contains the lower triangular elements of the covariance matrix (the diagonal and everything below it), entered row by row. We then use the getCov() function in lavaan to convert this string into a full, symmetric \(3\times3\) covariance matrix with variable names. If you have a full covariance matrix to begin with, you can import it directly without this conversion.

mlr.lower <- '
1.623
-.308 1.951
.198 .290 1.627'

(mlr.cov <- getCov(mlr.lower, names = c("ReadSC1", "MathSC1", "Goals2")))

##         ReadSC1 MathSC1 Goals2
## ReadSC1   1.623  -0.308  0.198
## MathSC1  -0.308   1.951  0.290
## Goals2    0.198   0.290  1.627
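
To make the conversion step less of a black box, here is a minimal base R sketch of the same idea: filling a symmetric matrix from lower triangular values entered row by row. It is an illustration only, not how getCov() is implemented, and the object names (lower_vals, var_names, S) are ours.

```r
# Lower triangular elements, row by row (same values as mlr.lower above)
lower_vals <- c( 1.623,
                -0.308, 1.951,
                 0.198, 0.290, 1.627)
var_names <- c("ReadSC1", "MathSC1", "Goals2")  # hypothetical object name
p <- length(var_names)

# Start from an empty p x p matrix with row and column names
S <- matrix(0, nrow = p, ncol = p, dimnames = list(var_names, var_names))

# R fills matrices column by column, so the row-wise lower triangle
# has the same ordering as the column-wise upper triangle
S[upper.tri(S, diag = TRUE)] <- lower_vals

# Mirror the upper triangle into the lower triangle
S[lower.tri(S)] <- t(S)[lower.tri(S)]

print(S)

# The implied correlation matrix is often easier to read
print(round(cov2cor(S), 3))
```

The printed matrix should agree with the getCov() result shown above.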

Write the model syntax

For lavaan to fit a model to the data, you need to tell lavaan what the model is. To communicate with lavaan, you need to speak its language: the lavaan model syntax. The model syntax describes the model to be estimated and usually consists of multiple formulas. Different operators give formulas different meanings. In this example, we use the tilde ~ to tell lavaan that we wish to regress Goals2 on ReadSC1 and MathSC1. We then use the double tilde ~~ to tell lavaan that we wish to estimate the variances of the two independent variables and the residual variance of the outcome variable. We also allow the two independent variables to covary.

mlr.model <- '

# regression
Goals2 ~ ReadSC1 + MathSC1

# variance and covariances
Goals2 ~~ Goals2
ReadSC1 ~~ ReadSC1
MathSC1 ~~ MathSC1
ReadSC1 ~~ MathSC1
'
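
Written as an equation, the syntax above corresponds to the regression below. The symbols \(\beta_1\), \(\beta_2\), \(\varepsilon\), and \(p\) are notation introduced here for illustration. There is no intercept because we analyze only a covariance matrix, without means.

\[
\text{Goals2} = \beta_1\,\text{ReadSC1} + \beta_2\,\text{MathSC1} + \varepsilon
\]

Counting the free parameters against the unique elements of the covariance matrix of \(p = 3\) variables gives

\[
\underbrace{2}_{\text{slopes}} + \underbrace{3}_{\text{variances}} + \underbrace{1}_{\text{covariance}} = 6
\qquad\text{and}\qquad
\frac{p(p+1)}{2} = \frac{3 \cdot 4}{2} = 6 ,
\]

so the model has \(6 - 6 = 0\) degrees of freedom. A model like this is just-identified (saturated): it reproduces the sample covariance matrix exactly, so its fit indices are perfect by construction and say nothing about how well the model describes the data.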

Fit the model to the data

After you specify the model syntax, you can fit the model to the variance-covariance matrix using the sem() function. The first argument is the model syntax. The sample.cov argument supplies the variance-covariance matrix, and the sample.nobs argument tells lavaan that our sample size is 1000.

mlr.fit <- sem(mlr.model, sample.cov = mlr.cov, sample.nobs = 1000)
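
The sample size matters for more than the standard errors. As far as we understand lavaan's defaults, under normal-theory ML estimation the supplied covariance matrix is rescaled by \((N-1)/N\) before fitting (see the sample.cov.rescale option). This would explain why the estimated variances in the output are slightly smaller than the values we entered. The base R sketch below applies that rescaling by hand; the object names (S, N, S_ml) are ours.

```r
# Input covariance matrix, as entered earlier in this tutorial
vars <- c("ReadSC1", "MathSC1", "Goals2")
S <- matrix(c( 1.623, -0.308, 0.198,
              -0.308,  1.951, 0.290,
               0.198,  0.290, 1.627),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))
N <- 1000  # sample size passed to sample.nobs

# Assumption: ML uses the divisor N rather than N - 1,
# so the unbiased covariance matrix is multiplied by (N - 1) / N
S_ml <- S * (N - 1) / N

print(round(S_ml, 3))
```

With N = 1000 the correction factor is 0.999, so the difference only shows in the third decimal place. It becomes more noticeable in small samples.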

Summarize the results

Next, you can summarize the results using the summary() function. To get more information on the model fit measures, use the optional argument fit.measures = TRUE. To view the standardized parameter estimates, use the optional argument standardized = TRUE. The unstandardized solution is displayed in the column labeled “Estimate”, and the standardized solution is listed in the column labeled “Std.all”.

summary(mlr.fit, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-19 ended normally after 16 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         6
## 
##   Number of observations                          1000
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Model Test Baseline Model:
## 
##   Test statistic                                81.389
##   Degrees of freedom                                 3
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -5034.298
##   Loglikelihood unrestricted model (H1)      -5034.298
##                                                       
##   Akaike (AIC)                               10080.597
##   Bayesian (BIC)                             10110.043
##   Sample-size adjusted Bayesian (SABIC)      10090.987
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value H_0: RMSEA <= 0.050                       NA
##   P-value H_0: RMSEA >= 0.080                       NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Goals2 ~                                                              
##     ReadSC1           0.155    0.031    4.941    0.000    0.155    0.155
##     MathSC1           0.173    0.029    6.056    0.000    0.173    0.190
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   ReadSC1 ~~                                                            
##     MathSC1          -0.308    0.057   -5.393    0.000   -0.308   -0.173
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .Goals2            1.545    0.069   22.361    0.000    1.545    0.950
##     ReadSC1           1.621    0.073   22.361    0.000    1.621    1.000
##     MathSC1           1.949    0.087   22.361    0.000    1.949    1.000
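
Because this model is an ordinary multiple regression, the key estimates can be checked by hand from the covariance matrix. The sketch below uses base R only and the standard closed-form regression formulas, not lavaan's estimation routine. The object names are ours, and the \((N-1)/N\) rescaling of the residual variance is our assumption about lavaan's ML default.

```r
# Input covariance matrix, as entered earlier in this tutorial
vars <- c("ReadSC1", "MathSC1", "Goals2")
S <- matrix(c( 1.623, -0.308, 0.198,
              -0.308,  1.951, 0.290,
               0.198,  0.290, 1.627),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))
N <- 1000
x <- c("ReadSC1", "MathSC1")  # predictors
y <- "Goals2"                 # outcome

# Unstandardized slopes: b = Sxx^{-1} Sxy
b <- as.vector(solve(S[x, x], S[x, y]))
names(b) <- x

# Variance in the outcome explained by the predictors, and R-squared
explained <- sum(b * S[x, y])
r2 <- explained / S[y, y]

# Residual variance on the ML scale (assumed (N - 1) / N rescaling)
resid_var_ml <- (S[y, y] - explained) * (N - 1) / N

# Standardized slopes: b * sd(x) / sd(y)
b_std <- b * sqrt(diag(S)[x] / S[y, y])

# Correlation between the two predictors
r_xx <- S[x[1], x[2]] / sqrt(S[x[1], x[1]] * S[x[2], x[2]])

print(round(b, 3))             # compare with the Estimate column
print(round(b_std, 3))         # compare with the Std.all column
print(round(resid_var_ml, 3))  # compare with the .Goals2 variance
print(round(1 - r2, 3))        # compare with Std.all for .Goals2
print(round(r_xx, 3))          # compare with Std.all for the covariance
```

Reading the output this way also clarifies the Std.all value of 0.950 for .Goals2: it is the proportion of variance in Goals2 that is left unexplained, so the two predictors together account for roughly 5% of the variance.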