Introduction

In this vignette we will discuss how to use simPM to search for item-level PHPM designs via forward assembly. Similar with balanced item-level PHPM designs, the missingness is imposed at the item(or observed variable) level. But unlike the balanced designs, the number of missing observed measurements is not equal across all the missing data patterns. For example, some participants may be missing only one observed measurement in the remaining waves, while some participants may be assigned to miss four observed measurement in the remaining waves.

In simPM, the forward assembly methods will build up the PHPM design sequentially. It will start by searching for the optimal missing data pattern with only one missing observed measurement. Given the optimal pattern with only one missing indicator, it continues to build the design by adding another pattern with two missing observed measurements, after which it continue to add another missing data pattern with three missing observed measurements…At each step, it looks for the optimal missing data pattern that would yield the best statistical power, in combination with the previously selected missing data patterns.

Compared to balanced item-level PHPM designs, the designs returned by forward assembly will have much fewer number of unique missing data patterns and thus make it easier for implementation in practice.

Search for item-level PHPM designs via forward assembly

To implement forward assembly using simPM, we need to specify the methods = "forward" argument when using the simPM() function. Additionally, we also need to specify the maximum number of unique missing data patterns we would wish to have in the PHPM design using the max.mk argument.

Example

In this hypothetical example, Suppose a group of researchers is interested in examining the longitudinal reciprocal relations between peer relationships and teacher-child relationships. They have been funded for a longitudinal panel study following 1000 children for three years. Each year they would collect data on three measures: two measures of peer relationships (self-perceived popularity and peer-perceived popularity) and one measure of teacher-child relationships (teacher-child support). Upon the completion of data collection, they plan to fit an autoregressive cross-lagged model shown below. The parameter of focal interest to the researchers are the cross-lagged path coefficients predicting the peer-perceived popularity from the self-perceived popularity measured at the previous time point, as well as in the cross-lagged paths predicting teacher-child support from the peer-perceived popularity measured at the previous time point (marked in red).

Unfortunately, after the first wave of data collection, the funding agency announces a 30% reduction in the remaining funding. The researcher wishes to continue the project under the budget constraint, with the hope to not compromise the scientific rigor and statistical power. The reseacher thus decideds to use simPM to find a design that yields sufficient power but costs no more than the reduced budget.

An autoregressive and cross-lagged model

An autoregressive and cross-lagged model

After supplying the population model and the analysis model, we can use the simPM() function to search for an optimal item-level missing design with the methods = "forward" argument. For this example, suppose we wish to have no more than three unique missing data patterns in the PHPM design. For more details about the specification of other arguments, please refer to this vignette.

popModel <- [1050 chars quoted with ''']
analyzeModel <- '

#----------- path coefficients ------------#

SelfPop2~SelfPop1+PeerPop1
SelfPop3~SelfPop2+PeerPop2

PeerPop2~SelfPop1+PeerPop1+Support1
PeerPop3~SelfPop2+PeerPop2+Support2

Support2~Support1+PeerPop1
Support3~Support2+PeerPop2

#----------- residual covariance ----------#

PeerPop1~~SelfPop1+Support1
SelfPop1~~Support1

PeerPop2~~SelfPop2+Support2
SelfPop2~~Support2

PeerPop3~~SelfPop3+Support3
SelfPop3~~Support3


#---------------- means -------------------#

PeerPop1~1
SelfPop1~1
Support1~1

PeerPop2~1
SelfPop2~1
Support2~1

PeerPop3~1
SelfPop3~1
Support3~1


#--------------- variances ----------------#

PeerPop1~~PeerPop1
SelfPop1~~SelfPop1
Support1~~Support1

PeerPop2~~PeerPop2
SelfPop2~~SelfPop2
Support2~~Support2

PeerPop3~~PeerPop3
SelfPop3~~SelfPop3
Support3~~Support3

'
forward.ex3 <- simPM(
  popModel,
  analyzeModel,
  VarNAMES=c("PeerPop1","SelfPop1","Support1","PeerPop2","SelfPop2","Support2","PeerPop3","SelfPop3","Support3"),
  distal.var = NULL,
  n=10,
  nreps=1000,
  seed=12345,
  Time=3,
  k=3,
  max.mk=3,
  Time.complete=1,
  costmx=c(5,5,5,10,10,10),
  pc=0.1,
  pd=0,
  focal.param=c("PeerPop2~SelfPop1","Support2~PeerPop1","PeerPop3~SelfPop2","Support3~PeerPop2"),
  eval.budget=T,
  rm.budget=31500L,
  complete.var=NULL,
  engine="l",
  methods="forward"
)

In this example, we have:

  • \({6 \choose 1}=6\) possible patterns with \(1\) missing observed measurement.
  • \({6 \choose 2}=15\) possible patterns with \(2\) missing observed measurements.
  • \({6 \choose 3}=20\) possible patterns with \(3\) missing observed measurements.
  • \({6 \choose 4}=15\) possible patterns with \(4\) missing observed measurements.
  • \({6 \choose 5}=6\) possible patterns with \(5\) missing observed measurements.

The program will run Monte Carlo simulations in order to assemble the optimal PHPM design. With the max.mk = 3 argument, the target PHPM design will have five unique missing data patterns, each has \(1, 2, 3\) missing observed measurements, respectively.

According to the output, the optimal design costs $30,000, which is below the reduced available budget. From the output, we can see that 10% of the participants are assigned to provide complete data across all the future waves of data collection. The rest 90% of the participants are randomly assigned to one of the three unique missing data patterns (\(n=300\) in each pattern).

Over 1000 replications, this design yields an empirical power of 0.885 for testing the path coefficient \(b_{PS_1}\), 0.914 for testing the path coefficient \(b_{PS_2}\), 0.906 for testing the path coefficient \(b_{TS_1}\), and 0.937 for testing the path coefficient \(b_{TS_2}\). With this PHPM design, the statistical power is satisfactory for testing each of the focal parameters.

summary(forward.ex3)
#> [1] "=================results summary================"
#>    convergence.rate weakest.param.name weakest.para.power cost.design
#> 2                 1  PeerPop2~SelfPop1              0.755       31500
#> 3                 1  PeerPop2~SelfPop1              0.796       31500
#> 4                 1  PeerPop2~SelfPop1              0.798       31500
#> 5                 1  Support2~PeerPop1              0.736       31500
#> 6                 1  Support2~PeerPop1              0.729       31500
#> 7                 1  Support2~PeerPop1              0.714       31500
#> 8                 1  PeerPop2~SelfPop1              0.751       30000
#> 9                 1  PeerPop2~SelfPop1              0.749       30000
#> 10                1  PeerPop2~SelfPop1              0.795       30000
#> 11                1  Support2~PeerPop1              0.737       31500
#> 12                1  Support2~PeerPop1              0.734       31500
#> 13                1  Support2~PeerPop1              0.721       31500
#> 14                1  PeerPop3~SelfPop2              0.870       30000
#> 15                1  PeerPop3~SelfPop2              0.873       30000
#> 16                1  PeerPop2~SelfPop1              0.885       30000
#> 17                1  Support2~PeerPop1              0.737       30000
#> 18                1  Support2~PeerPop1              0.724       30000
#> 19                1  Support2~PeerPop1              0.720       30000
#> 20                1  PeerPop3~SelfPop2              0.874       28500
#>    miss.num miss.var1 miss.var2 miss.var3 sim.seq miss.loc1 miss.loc2
#> 2         3  PeerPop2  SelfPop2  PeerPop3       2         4         5
#> 3         3  PeerPop2  SelfPop2  SelfPop3       3         4         5
#> 4         3  PeerPop2  SelfPop2  Support3       4         4         5
#> 5         3  PeerPop2  Support2  PeerPop3       5         4         6
#> 6         3  PeerPop2  Support2  SelfPop3       6         4         6
#> 7         3  PeerPop2  Support2  Support3       7         4         6
#> 8         3  PeerPop2  PeerPop3  SelfPop3       8         4         7
#> 9         3  PeerPop2  PeerPop3  Support3       9         4         7
#> 10        3  PeerPop2  SelfPop3  Support3      10         4         8
#> 11        3  SelfPop2  Support2  PeerPop3      11         5         6
#> 12        3  SelfPop2  Support2  SelfPop3      12         5         6
#> 13        3  SelfPop2  Support2  Support3      13         5         6
#> 14        3  SelfPop2  PeerPop3  SelfPop3      14         5         7
#> 15        3  SelfPop2  PeerPop3  Support3      15         5         7
#> 16        3  SelfPop2  SelfPop3  Support3      16         5         8
#> 17        3  Support2  PeerPop3  SelfPop3      17         6         7
#> 18        3  Support2  PeerPop3  Support3      18         6         7
#> 19        3  Support2  SelfPop3  Support3      19         6         8
#> 20        3  PeerPop3  SelfPop3  Support3      20         7         8
#>    miss.loc3
#> 2          7
#> 3          8
#> 4          9
#> 5          7
#> 6          8
#> 7          9
#> 8          8
#> 9          9
#> 10         9
#> 11         7
#> 12         8
#> 13         9
#> 14         8
#> 15         9
#> 16         9
#> 17         8
#> 18         9
#> 19         9
#> 20         9
#> [1] "=================Optimal design================="
#>    convergence.rate weakest.param.name weakest.para.power cost.design
#> 16                1  PeerPop2~SelfPop1              0.885       30000
#>    miss.num miss.var1 miss.var2 miss.var3 sim.seq miss.loc1 miss.loc2
#> 16        3  SelfPop2  SelfPop3  Support3      16         5         8
#>    miss.loc3
#> 16         9
#> [1] "=================Optimal design for focal parameters================="
#>                   Estimate Average Estimate SD Average SE
#> PeerPop2~SelfPop1        0.0509315  0.01593366 0.01625888
#> Support2~PeerPop1        0.1225502  0.03691843 0.03681819
#> PeerPop3~SelfPop2        0.0568749  0.01686760 0.01728927
#> Support3~PeerPop2        0.1313719  0.03780042 0.03793717
#>                   Power (Not equal 0)    Std Est Std Est SD Std Ave SE
#> PeerPop2~SelfPop1               0.885 0.07715693 0.02407530 0.02460888
#> Support2~PeerPop1               0.906 0.10899450 0.03257593 0.03257226
#> PeerPop3~SelfPop2               0.914 0.08652626 0.02558155 0.02628572
#> Support3~PeerPop2               0.937 0.12610467 0.03596546 0.03609359
#>                   Average Param  Average Bias Coverage Average FMI1
#> PeerPop2~SelfPop1         0.051 -6.849557e-05    0.951  0.004906487
#> Support2~PeerPop1         0.124 -1.449846e-03    0.937  0.275855979
#> PeerPop3~SelfPop2         0.056  8.748955e-04    0.947  0.202017999
#> Support3~PeerPop2         0.132 -6.281368e-04    0.951  0.330559605
#>                       SD FMI1
#> PeerPop2~SelfPop1 0.002715199
#> Support2~PeerPop1 0.020332822
#> PeerPop3~SelfPop2 0.015647460
#> Support3~PeerPop2 0.020608104
#> [1] "=================Optimal patterns==============="
#>            PeerPop1 SelfPop1 Support1 PeerPop2 SelfPop2 Support2 PeerPop3
#>                   0        0        0        0        1        0        0
#>                   0        0        0        0        0        1        0
#>                   0        0        0        0        0        0        0
#> completers        0        0        0        0        0        0        0
#>            SelfPop3 Support3
#>                   1        1
#>                   1        0
#>                   1        0
#> completers        0        0
#> [1] "=================Optimal probs=================="
#> [1] 0.3 0.3 0.3 0.1
#> [1] "=================Optimal ns===================="
#> [1] 300 300 300 100
plotPM(forward.ex3, row.names=F)