The `simglm`

package offers users the ability to simulate from a variety of generalized linear models, both single level and multilevel generalized models. Instead of using the `sim_reg`

function to call these, there is now a `sim_glm`

function to use.

Similar to the `sim_reg`

function, one benefit of this package for simulation is that the intermediate steps are returned as well. This is useful for additional processing of the data, for example to add in your own missing data function.

Here is an example for simulating a single level logistic model:

```
fixed <- ~ 1 + act + diff
fixed_param <- c(2, 0.5, 0.3)
cov_param <- list(dist_fun = c('rnorm', 'rnorm'),
var_type = c("single", "single"),
opts = list(list(mean = 0, sd = 4),
list(mean = 0, sd = 3)))
n <- 150
temp_single <- sim_glm(fixed = fixed, fixed_param = fixed_param,
cov_param = cov_param,
n = n, data_str = "single", outcome_type = 'logistic')
print(temp_single, n = 5)
```

```
## # A tibble: 150 x 7
## X.Intercept. act diff Fbeta logistic sim_data ID
## * <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 1 -3.505898 3.4620018 1.285651 0.7834102 1 1
## 2 1 7.486467 3.8626606 6.902032 0.9989953 1 2
## 3 1 -2.649272 1.9332594 1.255342 0.7782232 0 3
## 4 1 1.543911 0.4931161 2.919890 0.9488210 1 4
## 5 1 1.356734 -2.4919101 1.930794 0.8733373 1 5
## # ... with 145 more rows
```

As you can see from the code above, the syntax is virtually identical to the syntax for the `sim_reg`

function. The largest difference is the omission of the `error_var`

and `rand_gen`

commands. The returned data frame includes the response variable in the logistic function (Fbeta), the probability found by taking \(\frac{exp(Fbeta)}{1 + exp(Fbeta)}\) (logistic), and the returned 0/1 variable by using the `rbinom`

function using the probabilities defined above (sim_data).

Adding in additional levels is straightforward and again very similar to the `sim_reg`

function. Here is a two level example with students nested in classrooms, the **act** variable is treated as a classroom variable:

```
# Longitudinal linear mixed model example
fixed <- ~1 + diff + act
random <- ~1
fixed_param <- c(2, 0.5, 0.3)
random_param <- list(random_var = 7, rand_gen = "rnorm", ther_sim = TRUE)
cov_param <- list(dist_fun = c('rnorm', 'rnorm'),
var_type = c("level1", "level2"),
opts = list(list(mean = 0, sd = 2),
list(mean = 0, sd = 1.4)))
n <- 150
p <- 30
data_str <- "cross"
temp_cross <- sim_glm(fixed, random, random3 = NULL, fixed_param,
random_param, random_param3 = NULL,
cov_param, k = NULL, n, p,
data_str = data_str, outcome_type = 'logistic')
print(temp_cross, n = 5)
```

```
## # A tibble: 4,500 x 11
## X.Intercept. diff act b0 Fbeta randEff
## * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 -1.4036140 -2.651821 -0.6863745 0.50264674 -0.6863745
## 2 1 -2.2547237 -2.651821 -0.6863745 0.07709191 -0.6863745
## 3 1 0.7395251 -2.651821 -0.6863745 1.57421630 -0.6863745
## 4 1 3.8340366 -2.651821 -0.6863745 3.12147206 -0.6863745
## 5 1 -1.7756291 -2.651821 -0.6863745 0.31663919 -0.6863745
## # ... with 4,495 more rows, and 5 more variables: logistic <dbl>,
## # prob <dbl>, sim_data <dbl>, withinID <int>, clustID <int>
```

Below is sample code for a three level example. Primary differences are the additional terms associated with the third level.

```
fixed <- ~1 + diff + act + actClust
random <- ~1
random3 <- ~ 1
fixed_param <- c(4, 0.8, 0.15, 1.1)
random_param <- list(random_var = 7, rand_gen = "rnorm")
random_param3 <- list(random_var = 4, rand_gen = "rnorm")
cov_param <- list(dist_fun = c('rnorm', 'rnorm', 'rnorm'),
var_type = c("level1", "level2", "level3"),
opts = list(list(mean = 0, sd = 1.5),
list(mean = 0, sd = 4),
list(mean = 0, sd = 2)))
k <- 10
n <- 150
p <- 30
data_str <- "cross"
temp_three <- sim_glm(fixed, random, random3, fixed_param, random_param,
random_param3, cov_param, k, n, p, data_str = data_str,
outcome_type = 'logistic')
print(temp_three, n = 5)
```

```
## # A tibble: 45,000 x 15
## X.Intercept. diff act actClust b0_2 b0_3
## * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 -3.4065510 -0.1764601 -0.4651754 -2.377952 -1.229453
## 2 1 -0.1871581 -0.1764601 -0.4651754 -2.377952 -1.229453
## 3 1 -1.1366189 -0.1764601 -0.4651754 -2.377952 -1.229453
## 4 1 -1.8038968 -0.1764601 -0.4651754 -2.377952 -1.229453
## 5 1 -1.0277931 -0.1764601 -0.4651754 -2.377952 -1.229453
## # ... with 4.5e+04 more rows, and 9 more variables: Fbeta <dbl>,
## # randEff <dbl>, randEff3 <dbl>, logistic <dbl>, prob <dbl>,
## # sim_data <dbl>, withinID <int>, clustID <int>, clust3ID <int>
```

The package also has the ability to simulate count outcomes in addition to the 0/1 dichotomous outcomes. The syntax is the same as above, except the `outcome_type`

argument is modified from `logistic`

to `poisson`

. As some may have realized from above, the dichotomous outcomes are assumed to follow a logistic metric where as the count outcomes follow a poisson model. Belwo is an adapted example from above.

```
fixed <- ~ 1 + act + diff
fixed_param <- c(-0.5, 0.5, 0.3)
cov_param <- list(dist_fun = c('rnorm', 'rnorm'),
var_type = c("single", "single"),
opts = list(list(mean = 0, sd = 4),
list(mean = 0, sd = 3)))
n <- 150
temp_single <- sim_glm(fixed = fixed, fixed_param = fixed_param,
cov_param = cov_param,
n = n, data_str = "single", outcome_type = 'poisson')
print(temp_single, n = 5)
```

```
## # A tibble: 150 x 7
## X.Intercept. act diff Fbeta poisson sim_data ID
## * <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 1 -0.8931976 -4.5597417 -2.3145213 0.09881348 0 1
## 2 1 -9.6980629 2.6928798 -4.5411675 0.01066095 0 2
## 3 1 5.5823732 -1.9424863 1.7084407 5.52034705 7 3
## 4 1 0.6918532 -1.3531627 -0.5600222 0.57119638 1 4
## 5 1 -5.5337912 -0.3045037 -3.3582467 0.03479621 0 5
## # ... with 145 more rows
```