This vignette shows how you can further investigate your models using Monte Carlo simulations. In addition to reporting power, simulations allow you to investigate parameter bias, model misspecification, Type I errors, and convergence issues. For instance, even if the analytical power calculations work well, it can be useful to check the small-sample properties of the model, e.g. estimates of cluster-level random effects with only a few clusters.

The `simulate` function accepts the same study setup object that we used in the two- and three-level power vignettes, and it supports all models that are available in **powerlmm**.

```
# cores <- parallel::detectCores(logical = FALSE) # use all physical CPU cores
cores <- 16
nsim <- 1000
RUN_SIM <- TRUE # the chunks below only run the (slow) simulations when TRUE
```

We will start with an example of how to evaluate the partially nested three-level model. When doing simulations it can be a good idea to specify the model using raw values, to ensure all parameters are reasonable. Here I use estimates from a real clinical trial.

```
library(powerlmm)
p <- study_parameters(n1 = 11,
                      n2 = 10,
                      n3 = 6,
                      fixed_intercept = 37,
                      fixed_slope = -0.64,
                      sigma_subject_intercept = 2.8,
                      sigma_subject_slope = 0.4,
                      sigma_cluster_intercept = 0,
                      cor_subject = -0.2,
                      icc_slope = 0.05,
                      sigma_error = 2.6,
                      dropout = dropout_weibull(proportion = 0.3,
                                                rate = 1/2),
                      partially_nested = TRUE,
                      effect_size = cohend(-0.8,
                                           standardizer = "pretest_SD"))
p
```

```
##
## Study setup (three-level, partially nested)
##
##               n1 = 11
##               n2 = 10 x 6 (treatment)
##                    60 (control)
##               n3 = 6 (treatment)
##                    0 (control)
##                    6 (total)
##          total_n = 60 (treatment)
##                    60 (control)
##                    120 (total)
##          dropout = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 (time)
##                    0, 11, 15, 18, 20, 22, 24, 26, 27, 29, 30 (%, control)
##                    0, 11, 15, 18, 20, 22, 24, 26, 27, 29, 30 (%, treatment)
## icc_pre_subjects = 0.54
## icc_pre_clusters = 0
##        icc_slope = 0.05
##        var_ratio = 0.02
##      effect_size = -0.8 (Cohen's d [SD: pretest_SD, control])
```

The `get_monte_carlo_se` function accepts a power object and returns the 95% interval for a given number of simulations. This is useful to gauge the precision of the empirical power estimates.

```
x <- get_power(p)
get_monte_carlo_se(x, nsim = c(1000, 2000, 5000))
```

```
##   power     se lwr_95 upr_95
## 1  0.65 0.0151   0.62   0.68
## 2  0.65 0.0107   0.63   0.67
## 3  0.65 0.0067   0.64   0.66
```

We would need about 5000 simulations if we want our empirical power to be within ±1% of the analytical power.
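The Monte Carlo standard error of an empirical power estimate is just the binomial standard error of a proportion, so we can reproduce the table above by hand (a quick sketch using base R only, not part of **powerlmm**):

```
# MCSE of a power estimate p_hat over nsim simulations: sqrt(p * (1 - p) / nsim)
p_hat <- 0.65
nsim_grid <- c(1000, 2000, 5000)
se <- sqrt(p_hat * (1 - p_hat) / nsim_grid)
round(se, 4) # 0.0151 0.0107 0.0067, matching the table above

# Approximate 95% intervals for the empirical power
cbind(lwr_95 = p_hat - 1.96 * se,
      upr_95 = p_hat + 1.96 * se)
```

This also makes clear why halving the MCSE requires quadrupling `nsim`.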

Now, let’s run the simulation. First we need to write the **lme4** formula, since the simulated data sets are analyzed using `lme4::lmer`. Then we pass the `study_parameters` object to the `simulate` function, and when the simulation is finished we use `summary` to view the results. You can run the simulation in parallel by setting `cores > 1`. The parallel computations are done using the **parallel** package: on a Windows machine, *or* inside a GUI, PSOCK clusters are used; on Unix/macOS, when the simulation is run non-interactively in a terminal, forking is used.

```
f <- sim_formula("y ~ treatment * time +
                  (1 + time | subject) +
                  (0 + treatment:time | cluster)")

if (RUN_SIM) {
  res <- simulate(object = p,
                  nsim = nsim,
                  formula = f,
                  satterthwaite = TRUE,
                  cores = cores,
                  save = FALSE)
}
summary(res)
```

```
## Model: default
##
## Random effects
##
##          parameter  M_est   theta est_rel_bias prop_zero is_NA
##  subject_intercept  7.900  7.8000       0.0067      0.00     0
##      subject_slope  0.160  0.1600      -0.0160      0.00     0
##      cluster_slope  0.012  0.0084       0.4500      0.42     0
##              error  6.800  6.8000      -0.0014      0.00     0
##        cor_subject -0.200 -0.2000      -0.0210      0.00     0
##
## Fixed effects
##
##       parameter  M_est  theta  M_se SD_est Power Power_bw Power_satt
##     (Intercept) 37.000  37.00 0.420  0.430 1.000        .          .
##       treatment  0.031   0.00 0.590  0.580 0.046        .          .
##            time -0.640  -0.64 0.068  0.072 1.000        .          .
##  treatment:time -0.300  -0.31 0.110  0.100 0.810     0.62       0.77
## ---
## Number of simulations: 1000 | alpha: 0.05
## Time points (n1): 11
## Subjects per cluster (n2 x n3): 10 x 6 (treatment)
##                                 60 (control)
## Total number of subjects: 120
```

`## [Model: default] 0.4% of the models threw convergence warnings`

It is also possible to automatically create the `formula` by leaving it blank. The `lmer` formula is then created by including the parameters that are not specified as `NA` or `NULL` in the model; see `?create_lmer_formula`. N.B., parameters specified as 0 are included in the model.
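For the `study_parameters` object above, this would look something like the sketch below (the generated formula can be inspected with `create_lmer_formula` before committing to a long simulation run; the call is commented out since it requires a fitted setup and takes time):

```
# The lmer formula is built from the non-NA/non-NULL parameters in 'p'
create_lmer_formula(p)

# Equivalent to calling simulate() without the 'formula' argument:
# res <- simulate(object = p, nsim = nsim, cores = cores)
```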

Let’s compare the correct partially nested model to a two-level linear mixed-effects model. The `sim_formula_compare` function lets us fit multiple models to each simulated data set by passing the model formulas as named arguments. By setting `effect_size` to 0, the simulation results tell us the empirical Type I errors for the correct and the misspecified model.

```
p <- update(p, effect_size = 0)
f <- sim_formula_compare("correct" = "y ~ treatment * time +
                                      (1 + time | subject) +
                                      (0 + treatment:time | cluster)",
                         "wrong" = "y ~ treatment * time +
                                    (1 + time | subject)")

if (RUN_SIM) {
  res2 <- simulate(object = p,
                   nsim = nsim,
                   formula = f,
                   satterthwaite = TRUE,
                   cores = cores,
                   save = FALSE)
}
summary(res2)
```

```
## Model: correct
##
## Random effects
##
##          parameter  M_est   theta est_rel_bias prop_zero is_NA
##  subject_intercept  7.800  7.8000      -0.0064      0.00     0
##      subject_slope  0.160  0.1600      -0.0200      0.00     0
##      cluster_slope  0.013  0.0084       0.5000      0.43     0
##              error  6.700  6.8000      -0.0015      0.00     0
##        cor_subject -0.190 -0.2000      -0.0420      0.00     0
##
## Fixed effects
##
##       parameter   M_est  theta  M_se SD_est Power Power_bw Power_satt
##     (Intercept) 37.0000  37.00 0.420  0.420 1.000        .          .
##       treatment -0.0120   0.00 0.590  0.580 0.055        .          .
##            time -0.6400  -0.64 0.068  0.068 1.000        .          .
##  treatment:time  0.0049   0.00 0.110  0.100 0.046    0.004      0.042
## ---
## Model: wrong
##
## Random effects
##
##          parameter M_est theta est_rel_bias prop_zero is_NA
##  subject_intercept  7.80  7.80      -0.0064         0     0
##      subject_slope  0.16  0.16       0.0130         0     0
##              error  6.70  6.80      -0.0015         0     0
##        cor_subject -0.19 -0.20      -0.0580         0     0
##
## Fixed effects
##
##       parameter   M_est  theta  M_se SD_est Power Power_bw Power_satt
##     (Intercept) 37.0000  37.00 0.420  0.420 1.000        .          .
##       treatment -0.0120   0.00 0.590  0.580 0.055        .          .
##            time -0.6400  -0.64 0.069  0.068 1.000        .          .
##  treatment:time  0.0048   0.00 0.097  0.100 0.060     0.01      0.058
## ---
## Number of simulations: 1000 | alpha: 0.05
## Time points (n1): 11
## Subjects per cluster (n2 x n3): 10 x 6 (treatment)
##                                 60 (control)
## Total number of subjects: 120
```

`## [Model: correct] 0.3% of the models threw convergence warnings`

All the parameters are summarized and presented for both models. Since we specified `satterthwaite = TRUE`, empirical power is based both on the Wald Z test and on the Wald *t* test using Satterthwaite’s *df* approximation (from the **lmerTest** package).

We can also simulate multiple designs and compare them. Let’s compare 6 vs. 12 clusters in the treatment arm, an `icc_slope` of 0.05 and 0.1, and a Cohen’s d of 0.5 and 0.8.

```
p <- study_parameters(n1 = 11,
                      n2 = 10,
                      n3 = c(6, 12),
                      fixed_intercept = 37,
                      fixed_slope = -0.64,
                      sigma_subject_intercept = 2.8,
                      sigma_subject_slope = 0.4,
                      sigma_cluster_intercept = 0,
                      cor_subject = -0.2,
                      icc_slope = c(0.05, 0.1),
                      sigma_error = 2.6,
                      dropout = dropout_weibull(proportion = 0.3,
                                                rate = 1/2),
                      partially_nested = TRUE,
                      effect_size = cohend(c(0.5, 0.8),
                                           standardizer = "pretest_SD"))

f <- sim_formula("y ~ treatment * time + (1 + time | subject) +
                  (0 + treatment:time | cluster)")

if (RUN_SIM) {
  res3 <- simulate(object = p,
                   nsim = nsim,
                   formula = f,
                   satterthwaite = TRUE,
                   cores = cores,
                   batch_progress = FALSE)
}
```

The setup and simulation are the same, except that we define vectors of values for some parameters. In this scenario we have a total of 2 × 2 × 2 = 8 different combinations of parameter values, and for each of the 8 setups `nsim` simulations will be performed. With multiple designs `summary` works a bit differently: instead of summarizing all parameters, we now have to choose which parameters to summarize.

```
# Summarize the 'time:treatment' results
x <- summary(res3, para = "treatment:time")

# Customize what to print
print(x, add_cols = c("n3_tx", "icc_slope", "effect_size"),
      estimates = FALSE)
```

```
## Model: 'All' | Type: 'fixed' | Parameter(s): 'treatment:time'
## ---
##  effect_size icc_slope n3_tx Power Power_bw Power_satt
##          0.5      0.05     6  0.45     0.23       0.40
##          0.5      0.05    12  0.74     0.64       0.72
##          0.5      0.10     6  0.41     0.23       0.36
##          0.5      0.10    12  0.69     0.60       0.66
##          0.8      0.05     6  0.82     0.62       0.78
##          0.8      0.05    12  0.98     0.97       0.98
##          0.8      0.10     6  0.77     0.54       0.71
##          0.8      0.10    12  0.96     0.94       0.95
## ---
## nsim: 1000 | 29 columns not shown
```

```
# Summarize the cluster-level random slope
x <- summary(res3, para = "cluster_slope")
print(x, add_cols = c("n3_tx", "icc_slope", "effect_size"))
```

```
## Model: 'All' | Type: 'random' | Parameter(s): 'cluster_slope'
## ---
##  effect_size icc_slope n3_tx M_est  theta
##          0.5      0.05     6 0.012 0.0084
##          0.5      0.05    12 0.010 0.0084
##          0.5      0.10     6 0.020 0.0178
##          0.5      0.10    12 0.018 0.0178
##          0.8      0.05     6 0.012 0.0084
##          0.8      0.05    12 0.010 0.0084
##          0.8      0.10     6 0.021 0.0178
##          0.8      0.10    12 0.018 0.0178
## ---
## nsim: 1000 | 28 columns not shown
```

Sometimes it is useful to compare whether the full longitudinal model is preferred over just analyzing posttest data. In this example we compare the 2-level LMM to models that analyze the results cross-sectionally at posttest. This can be accomplished using the `data_transform` argument; note that we must specify `test = "treatment"` for the OLS posttest models.

```
p <- study_parameters(n1 = 11,
                      n2 = 40,
                      icc_pre_subject = 0.5,
                      cor_subject = -0.5,
                      var_ratio = c(0, 0.01, 0.02, 0.03),
                      dropout = c(0, dropout_weibull(0.3, 1)),
                      effect_size = cohend(c(0, 0.8)))

# Formulas --------------------------------------------------------------------
# OLS "t-test"
f_PT <- sim_formula("y ~ treatment",
                    test = "treatment",
                    data_transform = transform_to_posttest)

# ANCOVA
f_PT_pre <- sim_formula("y ~ treatment + pretest",
                        test = "treatment",
                        data_transform = transform_to_posttest)

# Analyze as 2-level longitudinal
f_LMM <- sim_formula("y ~ time * treatment +
                      (1 + time | subject)")

# Constrain treatment differences at pretest
f_LMM_c <- sim_formula("y ~ time + time:treatment +
                        (1 + time | subject)")

# Combine formulas
f <- sim_formula_compare("posttest" = f_PT,
                         "ANCOVA" = f_PT_pre,
                         "LMM" = f_LMM,
                         "LMM_c" = f_LMM_c)

# Run sim ---------------------------------------------------------------------
if (RUN_SIM) {
  res4 <- simulate(p,
                   formula = f,
                   nsim = nsim,
                   cores = cores,
                   satterthwaite = TRUE,
                   batch_progress = FALSE)
}
```

We are mostly interested in the treatment effect, so we use the `para` argument. Since the treatment effect is represented by different terms in the cross-sectional and longitudinal models, we specify the name of the treatment effect for each model.

```
tests <- list("posttest" = "treatment",
              "ANCOVA" = "treatment",
              "LMM" = "time:treatment",
              "LMM_c" = "time:treatment")
x <- summary(res4, para = tests)
print(head(x, 5),
      add_cols = c("var_ratio",
                   "effect_size"))
```

```
## Model: 'All' | Type: 'fixed' | Parameter(s): 'treatment', 'time:treatment'
## ---
##   model effect_size var_ratio  M_est theta M_se SD_est Power Power_bw Power_satt
##  ANCOVA         0.0      0.00  0.102     0  2.7    2.7 0.056    0.049      0.049
##  ANCOVA         0.0      0.01 -0.022     0  3.1    3.0 0.047    0.042      0.042
##  ANCOVA         0.0      0.02 -0.077     0  3.6    3.6 0.060    0.056      0.056
##  ANCOVA         0.0      0.03  0.148     0  4.1    4.0 0.045    0.042      0.042
##  ANCOVA         0.8      0.00 11.196     0  2.7    2.7 0.980    0.978      0.978
## ---
## nsim: 1000 | 23 columns not shown
```

We only show the first 5 results; for these data, a figure shows the results better.

```
library(ggplot2)
x$model <- factor(x$model, levels = c("posttest",
                                      "ANCOVA",
                                      "LMM",
                                      "LMM_c"))
d_limits <- data.frame(effect_size = c(0),
                       Power_satt = c(0.025, 0.075),
                       var_ratio = 0,
                       model = 0,
                       dropout = 0)
d_hline <- data.frame(effect_size = c(0, 0.8),
                      x = c(0.05, 0.8))

ggplot(x, aes(model,
              Power_satt,
              group = interaction(var_ratio, dropout),
              color = factor(var_ratio),
              linetype = factor(dropout))) +
    geom_hline(data = d_hline, aes(yintercept = x)) +
    geom_point() +
    geom_blank(data = d_limits) +
    geom_line() +
    labs(y = "Power (Satterthwaite)",
         linetype = "Dropout",
         color = "Variance ratio",
         title = "Power and Type I errors",
         subtitle = 'Comparing "cross-sectional" and longitudinal models\nunder various amounts of heterogeneity and dropout') +
    facet_grid(dropout ~ effect_size,
               scales = "free",
               labeller = "label_both") +
    coord_flip()
```

More information about this simulation can be found in this blog post http://rpsychologist.com/do-you-need-multilevel-powerlmm-0-4-0.

A common strategy when analyzing (longitudinal) data is to build the model in a data-driven fashion. We can investigate how well such a strategy works using `sim_formula_compare`. We’ll define four model formulas, starting with a 2-level random-intercept model and working up to a 3-level random intercept and slope model. The true model is a 3-level model with only a random slope at the third level. Let’s first set up the model.

```
p <- study_parameters(n1 = 11,
                      n2 = 40,
                      n3 = 3,
                      icc_pre_subject = 0.5,
                      cor_subject = -0.5,
                      icc_slope = 0.05,
                      var_ratio = 0.03)

f0 <- sim_formula("y ~ time * treatment + (1 | subject)")
f1 <- sim_formula("y ~ time * treatment + (1 + time | subject)")
f2 <- sim_formula("y ~ time * treatment + (1 + time | subject) + (0 + time | cluster)")
f3 <- sim_formula("y ~ time * treatment + (1 + time | subject) + (1 + time | cluster)")

f <- sim_formula_compare("subject-intercept" = f0,
                         "subject-slope" = f1,
                         "cluster-slope" = f2,
                         "cluster-intercept" = f3)
```

Then we run the simulation; the four model formulas in `f` will be fit to each data set.

```
if (RUN_SIM) {
  res5 <- simulate(p, formula = f,
                   nsim = nsim,
                   satterthwaite = TRUE,
                   cores = cores,
                   CI = FALSE)
}
```

During each simulation the REML log-likelihood is saved for each model, so we can perform the model selection using the summary method as a post-processing step. Since REML is used, it is assumed that the fixed effects are the same and that we compare a “full” model to a “reduced” model. Let’s try a forward selection strategy, where we start with the first model and compare it to the next: if the comparison is significant we continue testing models, else we stop. The summary function performs this model comparison for each of the `nsim` simulations and returns the results from the “winning” model in each simulation replication.
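Each comparison is an ordinary likelihood-ratio test between the nested fits. The sketch below illustrates the decision rule with hypothetical log-likelihoods (the values and the `df = 2` are illustrative assumptions, e.g. a larger model adding one variance and one covariance parameter):

```
# Hypothetical REML log-likelihoods for a reduced and a full model
ll_reduced <- -1650.2
ll_full    <- -1646.9

lrt <- 2 * (ll_full - ll_reduced)                  # test statistic: 6.6
p_value <- pchisq(lrt, df = 2, lower.tail = FALSE) # ~0.037

# Forward selection: keep the larger model and keep testing if p < LRT_alpha
p_value < 0.05
```

Note that the naive chi-square reference distribution is conservative when the tested variance component lies on the boundary of the parameter space.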

```
x <- summary(res5, model_selection = "FW", LRT_alpha = 0.05)
x
```

```
## Model: model_selection
##
## Random effects
##
##          parameter   M_est  theta est_rel_bias prop_zero is_NA
##  subject_intercept 100.000 100.00     -0.00098      0.00     0
##      subject_slope   2.900   2.80      0.00690      0.00     0
##              error 100.000 100.00      0.00140      0.00     0
##        cor_subject  -0.500  -0.50     -0.00780      0.00     0
##      cluster_slope   0.280   0.15      0.85000      0.00     0
##  cluster_intercept   8.000   0.00      8.00000      0.00     0
##        cor_cluster   0.019   0.00      0.01900      0.56     0
##
## Fixed effects
##
##       parameter    M_est theta M_se SD_est Power Power_bw Power_satt
##     (Intercept)  0.02800     0 1.10   1.10 0.054        .          .
##            time -0.00077     0 0.25   0.28 0.140        .          .
##       treatment -0.04300     0 1.50   1.50 0.047        .          .
##  time:treatment  0.00150     0 0.35   0.40 0.120    0.049       0.12
## ---
## Number of simulations: 1000 | alpha: 0.05
## Time points (n1): 11
## Subjects per cluster (n2 x n3): 40 x 3 (treatment)
##                                 40 x 3 (control)
## Total number of subjects: 240
## ---
## Results based on LRT model comparisons, using direction: FW (alpha = 0.05)
## Model selected (proportion)
##  cluster-intercept      cluster-slope      subject-slope
##              0.009              0.422              0.569
```

The point of the model selection algorithm is to mimic a type of data-driven model selection that is quite common. We see that this strategy does not lead to nominal Type I errors in this scenario. However, it is fairly common to increase the LRT’s alpha level to try to improve this strategy. Let’s try a range of alpha levels to see the impact.

```
alphas <- seq(0.01, 0.5, length.out = 50)
x <- vapply(alphas, function(a) {
  type1 <- summary(res5, model_selection = "FW", LRT_alpha = a)
  type1$summary$model_selection$FE$Power_satt[4]
}, numeric(1))

d <- data.frame(LRT_alpha = alphas, type1 = x)
ggplot(d, aes(LRT_alpha, type1)) +
    geom_line() +
    geom_hline(yintercept = 0.05, linetype = "dotted") +
    labs(y = "Type I errors (time:treatment)",
         title = "Impact of the alpha level when using LRT for model selection")
```

We can also see the results from each of the four models. Here we will look at the `time:treatment` effect.

```
x1 <- summary(res5, para = "time:treatment")
x1
```

```
## Model: summary
##
## Fixed effects: 'time:treatment'
##
##              model  M_est theta M_se SD_est Power Power_bw Power_satt
##  subject-intercept 0.0015     0 0.14    0.4 0.500    0.320      0.490
##      subject-slope 0.0015     0 0.25    0.4 0.210    0.073      0.200
##      cluster-slope 0.0015     0 0.39    0.4 0.083    0.023      0.051
##  cluster-intercept 0.0015     0 0.40    0.4 0.078    0.025      0.041
## ---
## Number of simulations: 1000 | alpha: 0.05
## Time points (n1): 11
## Subjects per cluster (n2 x n3): 40 x 3 (treatment)
##                                 40 x 3 (control)
## Total number of subjects: 240
```

We see that the 2-level random intercept model leads to substantially inflated Type I errors (0.49). The 2-level model that also adds a random slope is somewhat better, but still not good (Type I errors = 0.20). The correct 3-level model, which accounts for the third level using a random slope, has close to nominal Type I errors (0.05). The full 3-level model, which adds an unnecessary random intercept, is somewhat conservative (Type I errors = 0.04).

When choosing a strategy, Type I error is not the only factor we care about; power is also important. So let’s see how power is affected.

```
# See if power is impacted
p1 <- update(p, effect_size = cohend(0.8))
if (RUN_SIM) {
  res6 <- simulate(p1,
                   formula = f,
                   nsim = nsim,
                   satterthwaite = TRUE,
                   cores = cores,
                   CI = FALSE)
}

# We can also summarize a fixed effect for convenience
x <- summary(res6, model_selection = "FW", para = "time:treatment")
print(x, verbose = FALSE)
```

```
## Model: summary
##
## Fixed effects: 'time:treatment'
##
##            model M_est theta M_se SD_est Power Power_bw Power_satt
##  model_selection   1.1   1.1 0.37   0.41  0.79     0.58       0.66
## ---
```

```
x1 <- summary(res6, para = "time:treatment")
print(x1, verbose = FALSE)
```

```
## Model: summary
##
## Fixed effects: 'time:treatment'
##
##              model M_est theta M_se SD_est Power Power_bw Power_satt
##  subject-intercept   1.1   1.1 0.14   0.41  0.99     0.98       0.99
##      subject-slope   1.1   1.1 0.25   0.41  0.94     0.84       0.94
##      cluster-slope   1.1   1.1 0.39   0.41  0.77     0.54       0.62
##  cluster-intercept   1.1   1.1 0.40   0.41  0.77     0.53       0.56
## ---
```

We have already investigated the consequences of ignoring the cluster level. We can also investigate the consequences of ignoring higher-level clustering in a posttest-only analysis, by using the `data_transform` argument in `sim_formula`.

```
# Simulation formulas
# Analyze as a posttest-only 1-level OLS model
f0 <- sim_formula("y ~ treatment",
                  test = "treatment",
                  data_transform = transform_to_posttest)

# Analyze as a posttest-only 2-level model
f1 <- sim_formula("y ~ treatment + (1 | cluster)",
                  test = "treatment",
                  data_transform = transform_to_posttest)

f <- sim_formula_compare("post_ignore" = f0,
                         "post_2lvl" = f1)

if (RUN_SIM) {
  res7 <- simulate(p,
                   formula = f,
                   nsim = nsim,
                   cores = cores,
                   satterthwaite = TRUE,
                   batch_progress = FALSE)
}

# Type I error
print(summary(res7, para = "treatment"), verbose = FALSE)
```

```
## Model: summary
##
## Fixed effects: 'treatment'
##
##        model M_est theta M_se SD_est Power Power_bw Power_satt
##  post_ignore 0.038     0  2.3    4.1  0.29    0.110      0.280
##    post_2lvl 0.038     0  3.8    4.1  0.12    0.038      0.073
## ---
```

We see that the Type I error rate is substantially inflated when clustering is ignored.

```
# Model selection
summary(res7,
        model_selection = "FW",
        para = "treatment",
        LRT_alpha = 0.1)
```

```
## Model: summary
##
## Fixed effects: 'treatment'
##
##            model M_est theta M_se SD_est Power Power_bw Power_satt
##  model_selection 0.038     0  3.5    4.1  0.17    0.061       0.16
## ---
## Number of simulations: 1000 | alpha: 0.05
## Time points (n1): 11
## Subjects per cluster (n2 x n3): 40 x 3 (treatment)
##                                 40 x 3 (control)
## Total number of subjects: 240
## ---
## Results based on LRT model comparisons, using direction: FW (alpha = 0.1)
## Model selected (proportion)
##  post_2lvl post_ignore
##      0.468       0.532
## ---
## At least one of the models applied a data transformation during simulation,
## summaries that depend on the true parameter values will no longer be correct,
## see 'help(summary.plcp_sim)'
```

Just as with the longitudinal model, the LRT model selection strategy too often retains the model that ignores clustering, leading to inflated Type I errors.