vignettes/Forward-Assembly-PHPM.Rmd
Forward-Assembly-PHPM.Rmd
In this vignette we will discuss how to use simPM
to search for item-level PHPM designs via forward assembly. Similar with balanced item-level PHPM designs, the missingness is imposed at the item(or observed variable) level. But unlike the balanced designs, the number of missing observed measurements is not equal across all the missing data patterns. For example, some participants may be missing only one observed measurement in the remaining waves, while some participants may be assigned to miss four observed measurement in the remaining waves.
In simPM
, the forward assembly methods will build up the PHPM design sequentially. It will start by searching for the optimal missing data pattern with only one missing observed measurement. Given the optimal pattern with only one missing indicator, it continues to build the design by adding another pattern with two missing observed measurements, after which it continue to add another missing data pattern with three missing observed measurements…At each step, it looks for the optimal missing data pattern that would yield the best statistical power, in combination with the previously selected missing data patterns.
Compared to balanced item-level PHPM designs, the designs returned by forward assembly will have much fewer number of unique missing data patterns and thus make it easier for implementation in practice.
To implement forward assembly using simPM
, we need to specify the methods = "forward"
argument when using the simPM()
function. Additionally, we also need to specify the maximum number of unique missing data patterns we would wish to have in the PHPM design using the max.mk
argument.
In this hypothetical example, Suppose a group of researchers is interested in examining the longitudinal reciprocal relations between peer relationships and teacher-child relationships. They have been funded for a longitudinal panel study following 1000 children for three years. Each year they would collect data on three measures: two measures of peer relationships (self-perceived popularity and peer-perceived popularity) and one measure of teacher-child relationships (teacher-child support). Upon the completion of data collection, they plan to fit an autoregressive cross-lagged model shown below. The parameter of focal interest to the researchers are the cross-lagged path coefficients predicting the peer-perceived popularity from the self-perceived popularity measured at the previous time point, as well as in the cross-lagged paths predicting teacher-child support from the peer-perceived popularity measured at the previous time point (marked in red).
Unfortunately, after the first wave of data collection, the funding agency announces a 30% reduction in the remaining funding. The researcher wishes to continue the project under the budget constraint, with the hope to not compromise the scientific rigor and statistical power. The reseacher thus decideds to use simPM
to find a design that yields sufficient power but costs no more than the reduced budget.
After supplying the population model and the analysis model, we can use the simPM()
function to search for an optimal item-level missing design with the methods = "forward"
argument. For this example, suppose we wish to have no more than three unique missing data patterns in the PHPM design. For more details about the specification of other arguments, please refer to this vignette.
popModel <- [1050 chars quoted with ''']
analyzeModel <- ' #----------- path coefficients ------------# SelfPop2~SelfPop1+PeerPop1 SelfPop3~SelfPop2+PeerPop2 PeerPop2~SelfPop1+PeerPop1+Support1 PeerPop3~SelfPop2+PeerPop2+Support2 Support2~Support1+PeerPop1 Support3~Support2+PeerPop2 #----------- residual covariance ----------# PeerPop1~~SelfPop1+Support1 SelfPop1~~Support1 PeerPop2~~SelfPop2+Support2 SelfPop2~~Support2 PeerPop3~~SelfPop3+Support3 SelfPop3~~Support3 #---------------- means -------------------# PeerPop1~1 SelfPop1~1 Support1~1 PeerPop2~1 SelfPop2~1 Support2~1 PeerPop3~1 SelfPop3~1 Support3~1 #--------------- variances ----------------# PeerPop1~~PeerPop1 SelfPop1~~SelfPop1 Support1~~Support1 PeerPop2~~PeerPop2 SelfPop2~~SelfPop2 Support2~~Support2 PeerPop3~~PeerPop3 SelfPop3~~SelfPop3 Support3~~Support3 '
forward.ex3 <- simPM( popModel, analyzeModel, VarNAMES=c("PeerPop1","SelfPop1","Support1","PeerPop2","SelfPop2","Support2","PeerPop3","SelfPop3","Support3"), distal.var = NULL, n=10, nreps=1000, seed=12345, Time=3, k=3, max.mk=3, Time.complete=1, costmx=c(5,5,5,10,10,10), pc=0.1, pd=0, focal.param=c("PeerPop2~SelfPop1","Support2~PeerPop1","PeerPop3~SelfPop2","Support3~PeerPop2"), eval.budget=T, rm.budget=31500L, complete.var=NULL, engine="l", methods="forward" )
In this example, we have:
The program will run Monte Carlo simulations in order to assemble the optimal PHPM design. With the max.mk = 3
argument, the target PHPM design will have five unique missing data patterns, each has \(1, 2, 3\) missing observed measurements, respectively.
According to the output, the optimal design costs $30,000, which is below the reduced available budget. From the output, we can see that 10% of the participants are assigned to provide complete data across all the future waves of data collection. The rest 90% of the participants are randomly assigned to one of the three unique missing data patterns (\(n=300\) in each pattern).
Over 1000 replications, this design yields an empirical power of 0.885 for testing the path coefficient \(b_{PS_1}\), 0.914 for testing the path coefficient \(b_{PS_2}\), 0.906 for testing the path coefficient \(b_{TS_1}\), and 0.937 for testing the path coefficient \(b_{TS_2}\). With this PHPM design, the statistical power is satisfactory for testing each of the focal parameters.
summary(forward.ex3) #> [1] "=================results summary================" #> convergence.rate weakest.param.name weakest.para.power cost.design #> 2 1 PeerPop2~SelfPop1 0.755 31500 #> 3 1 PeerPop2~SelfPop1 0.796 31500 #> 4 1 PeerPop2~SelfPop1 0.798 31500 #> 5 1 Support2~PeerPop1 0.736 31500 #> 6 1 Support2~PeerPop1 0.729 31500 #> 7 1 Support2~PeerPop1 0.714 31500 #> 8 1 PeerPop2~SelfPop1 0.751 30000 #> 9 1 PeerPop2~SelfPop1 0.749 30000 #> 10 1 PeerPop2~SelfPop1 0.795 30000 #> 11 1 Support2~PeerPop1 0.737 31500 #> 12 1 Support2~PeerPop1 0.734 31500 #> 13 1 Support2~PeerPop1 0.721 31500 #> 14 1 PeerPop3~SelfPop2 0.870 30000 #> 15 1 PeerPop3~SelfPop2 0.873 30000 #> 16 1 PeerPop2~SelfPop1 0.885 30000 #> 17 1 Support2~PeerPop1 0.737 30000 #> 18 1 Support2~PeerPop1 0.724 30000 #> 19 1 Support2~PeerPop1 0.720 30000 #> 20 1 PeerPop3~SelfPop2 0.874 28500 #> miss.num miss.var1 miss.var2 miss.var3 sim.seq miss.loc1 miss.loc2 #> 2 3 PeerPop2 SelfPop2 PeerPop3 2 4 5 #> 3 3 PeerPop2 SelfPop2 SelfPop3 3 4 5 #> 4 3 PeerPop2 SelfPop2 Support3 4 4 5 #> 5 3 PeerPop2 Support2 PeerPop3 5 4 6 #> 6 3 PeerPop2 Support2 SelfPop3 6 4 6 #> 7 3 PeerPop2 Support2 Support3 7 4 6 #> 8 3 PeerPop2 PeerPop3 SelfPop3 8 4 7 #> 9 3 PeerPop2 PeerPop3 Support3 9 4 7 #> 10 3 PeerPop2 SelfPop3 Support3 10 4 8 #> 11 3 SelfPop2 Support2 PeerPop3 11 5 6 #> 12 3 SelfPop2 Support2 SelfPop3 12 5 6 #> 13 3 SelfPop2 Support2 Support3 13 5 6 #> 14 3 SelfPop2 PeerPop3 SelfPop3 14 5 7 #> 15 3 SelfPop2 PeerPop3 Support3 15 5 7 #> 16 3 SelfPop2 SelfPop3 Support3 16 5 8 #> 17 3 Support2 PeerPop3 SelfPop3 17 6 7 #> 18 3 Support2 PeerPop3 Support3 18 6 7 #> 19 3 Support2 SelfPop3 Support3 19 6 8 #> 20 3 PeerPop3 SelfPop3 Support3 20 7 8 #> miss.loc3 #> 2 7 #> 3 8 #> 4 9 #> 5 7 #> 6 8 #> 7 9 #> 8 8 #> 9 9 #> 10 9 #> 11 7 #> 12 8 #> 13 9 #> 14 8 #> 15 9 #> 16 9 #> 17 8 #> 18 9 #> 19 9 #> 20 9 #> [1] "=================Optimal design=================" #> convergence.rate weakest.param.name weakest.para.power cost.design #> 16 1 PeerPop2~SelfPop1 0.885 30000 #> miss.num miss.var1 miss.var2 miss.var3 sim.seq miss.loc1 miss.loc2 #> 16 3 SelfPop2 SelfPop3 Support3 16 5 8 #> miss.loc3 #> 16 9 #> [1] "=================Optimal design for focal parameters=================" #> Estimate Average Estimate SD Average SE #> PeerPop2~SelfPop1 0.0509315 0.01593366 0.01625888 #> Support2~PeerPop1 0.1225502 0.03691843 0.03681819 #> PeerPop3~SelfPop2 0.0568749 0.01686760 0.01728927 #> Support3~PeerPop2 0.1313719 0.03780042 0.03793717 #> Power (Not equal 0) Std Est Std Est SD Std Ave SE #> PeerPop2~SelfPop1 0.885 0.07715693 0.02407530 0.02460888 #> Support2~PeerPop1 0.906 0.10899450 0.03257593 0.03257226 #> PeerPop3~SelfPop2 0.914 0.08652626 0.02558155 0.02628572 #> Support3~PeerPop2 0.937 0.12610467 0.03596546 0.03609359 #> Average Param Average Bias Coverage Average FMI1 #> PeerPop2~SelfPop1 0.051 -6.849557e-05 0.951 0.004906487 #> Support2~PeerPop1 0.124 -1.449846e-03 0.937 0.275855979 #> PeerPop3~SelfPop2 0.056 8.748955e-04 0.947 0.202017999 #> Support3~PeerPop2 0.132 -6.281368e-04 0.951 0.330559605 #> SD FMI1 #> PeerPop2~SelfPop1 0.002715199 #> Support2~PeerPop1 0.020332822 #> PeerPop3~SelfPop2 0.015647460 #> Support3~PeerPop2 0.020608104 #> [1] "=================Optimal patterns===============" #> PeerPop1 SelfPop1 Support1 PeerPop2 SelfPop2 Support2 PeerPop3 #> 0 0 0 0 1 0 0 #> 0 0 0 0 0 1 0 #> 0 0 0 0 0 0 0 #> completers 0 0 0 0 0 0 0 #> SelfPop3 Support3 #> 1 1 #> 1 0 #> 1 0 #> completers 0 0 #> [1] "=================Optimal probs==================" #> [1] 0.3 0.3 0.3 0.1 #> [1] "=================Optimal ns====================" #> [1] 300 300 300 100
plotPM(forward.ex3, row.names=F)