Measured Variable Path Analysis

In this tutorial, we are going to use lavaan for measured variable path analysis.

Load the pacakges

library(lavaan)
library(lavaanPlot)

Example 1

In this example, there are 4 measured variables: ReadSC1, MathSC1, Goals2, and SATmath.

Import the data

In this example, our data is a correlation matrix. For more information about the data, please refer back to the course slides.

We can read the lower half of the correlation matrix into R and make it symmetric using the getCov() function. But we need to further convert it to a variance-covariance matrix before lavaan can use it as the input data. With the correlation matrix and sample standard deviations, you can easily compute the variance-covariance matrix. We then assign variable names to the covariance matrix, which is required for lavaan to work properly.

cormat <- '
1.000
-.173 1.000
.122 .163 1.000
-.049 .556 .385 1.000'

# convert to a symmetric correaltion matrix
Cmat <- getCov(cormat)

# sample standard deviations
dat.sdev <- c(1.274, 1.397, 1.276, 37.087)
Dmat <- diag(dat.sdev)
covmat <- Dmat %*% Cmat %*% Dmat      # compute the covariance matrix

# assign row and column names to the covariance matrix
rownames(covmat) <- c("ReadSC1", "MathSC1", "Goals2", "SATmath")
colnames(covmat) <- rownames(covmat)
covmat

##            ReadSC1    MathSC1     Goals2     SATmath
## ReadSC1  1.6230760 -0.3079016  0.1983261   -2.315193
## MathSC1 -0.3079016  1.9516090  0.2905592   28.806660
## Goals2   0.1983261  0.2905592  1.6281760   18.219360
## SATmath -2.3151931 28.8066597 18.2193596 1375.445569

Write the model syntax

Now we tell lavaan what our model looks like. Like the MLR example, we use the tilde ~ operator to tell lavaan that we wish to regress Goals2 on ReadSC1 and MathSC1. We also want to regress SATmath on all the other three variables. We use the double tilde ~~ to tell lavaan that we wish to estimate the variances of the independent variables. We also allow the two exogenous variables to covary.

mvpa.model1 <- '

# regressions
Goals2 ~ ReadSC1 + MathSC1
SATmath ~ Goals2 + ReadSC1 + MathSC1

# variances and covariances
ReadSC1 ~~ ReadSC1 + MathSC1
MathSC1 ~~ MathSC1

'

Fit the model to the data

After you specify the model syntax, you can fit the model to the variance-covariance matrix using the sem() function. In this function, we first specify the model syntax. Then we tell lavaan that our data is a variance-covariance matrix using the sample.cov = argument. We also tell lavaan that our sample size is 1000.

mvpa.fit.1 <- sem(mvpa.model1, sample.cov = covmat, sample.nobs = 1000)

Summarize the results

Next you can summarize the results using the summary()function. To get more info on the model fit measures, you can use the optional argument fit.measures = TRUE. To take a look at the standardzied parameter estimates, you can use the optional argument standardized = TRUE. The standardized solutions are listed in the column labeled “Std.all”.

summary(mvpa.fit.1, fit.measures = T, standardized = T)

## lavaan 0.6-19 ended normally after 16 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        10
## 
##   Number of observations                          1000
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Model Test Baseline Model:
## 
##   Test statistic                               589.257
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -9812.610
##   Loglikelihood unrestricted model (H1)      -9812.610
##                                                       
##   Akaike (AIC)                               19645.219
##   Bayesian (BIC)                             19694.297
##   Sample-size adjusted Bayesian (SABIC)      19662.536
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value H_0: RMSEA <= 0.050                       NA
##   P-value H_0: RMSEA >= 0.080                       NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Goals2 ~                                            
##     ReadSC1           0.155    0.031    4.947    0.000
##     MathSC1           0.173    0.029    6.064    0.000
##   SATmath ~                                           
##     Goals2            8.781    0.731   12.004    0.000
##     ReadSC1           0.054    0.734    0.074    0.941
##     MathSC1          13.462    0.673   19.994    0.000
##    Std.lv  Std.all
##                   
##     0.155    0.155
##     0.173    0.190
##                   
##     8.781    0.302
##     0.054    0.002
##    13.462    0.507
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   ReadSC1 ~~                                          
##     MathSC1          -0.308    0.057   -5.391    0.000
##    Std.lv  Std.all
##                   
##    -0.308   -0.173
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     ReadSC1           1.621    0.073   22.361    0.000
##     MathSC1           1.950    0.087   22.361    0.000
##    .Goals2            1.546    0.069   22.361    0.000
##    .SATmath         826.971   36.983   22.361    0.000
##    Std.lv  Std.all
##     1.621    1.000
##     1.950    1.000
##     1.546    0.950
##   826.971    0.602

Example 2

In the second example, there are 3 measured variables of interest: ReadSC1, MathSC1, and SATmath.

Import the data

In this example, our data is still a correlation matrix. Similarly, we first read in the lower half of the correlation matrix and convert it to a full symmetrix covariance matrix.

# read in lower half of the correlation matrix
cormat <- '
1.000
-.173 1.000
-.049 .556 1.000'

Cmat <- getCov(cormat)

# convert it to variance-covariance matrix
dat.sdev <- c(1.274,1.397,37.087)
Dmat <- diag(dat.sdev)
covmat <- Dmat %*% Cmat %*% Dmat

# assign row and column names to the covariance matrix
rownames(covmat) <- c("ReadSC1", "MathSC1","SATmath")
colnames(covmat) <- rownames(covmat)
covmat

##            ReadSC1    MathSC1     SATmath
## ReadSC1  1.6230760 -0.3079016   -2.315193
## MathSC1 -0.3079016  1.9516090   28.806660
## SATmath -2.3151931 28.8066597 1375.445569

Write the model syntax

Now we tell lavaan what our model looks like using the model syntax.

mvpa.model2 <- '
MathSC1 ~ ReadSC1
SATmath ~ MathSC1
'

Fit the model to the data

mvpa.fit.2 <- sem(mvpa.model2, sample.cov=covmat, sample.nobs = 1000)

Summarize the results

summary(mvpa.fit.2, fit.measures = T, standardized = T)

## lavaan 0.6-19 ended normally after 1 iteration
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         4
## 
##   Number of observations                          1000
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 3.328
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.068
## 
## Model Test Baseline Model:
## 
##   Test statistic                               403.526
##   Degrees of freedom                                 3
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.994
##   Tucker-Lewis Index (TLI)                       0.983
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -6584.371
##   Loglikelihood unrestricted model (H1)      -6582.707
##                                                       
##   Akaike (AIC)                               13176.742
##   Bayesian (BIC)                             13196.373
##   Sample-size adjusted Bayesian (SABIC)      13183.669
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.048
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.110
##   P-value H_0: RMSEA <= 0.050                    0.404
##   P-value H_0: RMSEA >= 0.080                    0.240
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.019
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   MathSC1 ~                                           
##     ReadSC1          -0.190    0.034   -5.554    0.000
##   SATmath ~                                           
##     MathSC1          14.760    0.698   21.153    0.000
##    Std.lv  Std.all
##                   
##    -0.190   -0.173
##                   
##    14.760    0.556
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .MathSC1           1.891    0.085   22.361    0.000
##    .SATmath         949.296   42.454   22.361    0.000
##    Std.lv  Std.all
##     1.891    0.970
##   949.296    0.691