The following examples demonstrate the use of the corrsys and corrsys2 functions within the SimRepeat package. These functions generate correlated systems of M equations representing a system of repeated measures at M time points. The equations may contain 1) ordinal (\(r \geq 2\) categories), continuous (normal, non-normal, and mixture distributions), and/or count (regular and zero-inflated, Poisson and Negative Binomial) independent variables \(X\); 2) continuous error terms \(E\); 3) a discrete time variable \(Time\); and 4) random effects \(U\). The random effects may be a random intercept, a random slope for time, or a random slope for any of the \(X\) variables. The important assumptions are:

  1. There are at least 2 equations with at least 1 independent variable total.
  2. The independent variables, random effect terms, and error terms are uncorrelated.
  3. Each equation has an error term.
  4. All error terms have continuous non-mixture distributions or all have continuous mixture distributions.
  5. All random effects are continuous.
  6. Growth is linear with respect to time.

The outcomes \(Y\) are generated using a hierarchical linear models (HLM) approach. See The Hierarchical Linear Models Approach for a System of Correlated Equations with Multiple Variable Types vignette for a description of the HLM model. The independent variables, error terms, and random effects are generated from multivariate normal variables with intermediate correlations calculated using either SimCorrMix::intercorr and correlation method 1 for corrsys or SimCorrMix::intercorr2 and correlation method 2 for corrsys2. See the SimCorrMix package for a description of the correlation methods and the techniques used to generate each variable type.

The corrsys and corrsys2 functions contain no parameter checks in order to decrease simulation time. That should be done first using checkpar. Summaries of the system can be obtained using summary_sys. More information regarding function inputs can be found by consulting the function documentation. Some code has been adapted from the SimMultiCorrData (Fialkowski 2017) and SimCorrMix (Fialkowski 2018) packages.

Example 1 demonstrates the corrsys function to generate a system of 3 equations for 5 independent variables with no random effects. Example 2 uses the corrsys2 function to demonstrate how to handle missing variables by removing the independent variables from \(Y_2\) in Example 1. Example 3 generates a system of system of 4 equations with random effects.

Notes on fitting linear mixed models in R:

There are two main options for fitting mixed effects models in R. The nlme package (Pinheiro et al. 2017) permits linear and nonlinear mixed models. Linear models are fit using lme (see Laird and Ware (1982)); nonlinear models are fit using nlme (see Lindstrom and Bates (1990)). Both allow nested random effects and the within-group errors can be correlated and/or have unequal variances. The correlation argument permits several different correlation structures, including AR(1) and CS (see that package’s documentation). Setting weights = varIdent(form = \(\sim\) 1 | time) achieves unequal variances across time.

The lme4 package (Bates, Mächler, et al. 2015) permits linear (lmer), generalized linear (glmer), and nonlinear (nlmer) mixed models. Since there are no specific function arguments, implementing different error term correlation structures is more difficult than in nlme. Certain error structures can be achieved through manipulation of the data and models (see Bates, Mächler, et al. (2015)). However, lme4 implements crossed random effects more efficiently and contains facilities for likelihood profiling and parametric bootstrapping. The lmerTest package (Kuznetsova, Brockhoff, and Christensen 2017) provides \(p\)-values for beta coefficients.

Example 1: System of 3 equations for 5 independent variables

Description of Variables

  1. Ordinal variable: \(X_{ord(1)}\) has 3 categories (i.e., drug treatment) and is the same in each equation
  2. Continuous variables:
  1. \(X_{cont(1)}\) is a time-varying covariate (subject-level term) with a Rice distribution with increasing variance and an AR(1, p = 0.5) correlation structure
  1. \(X_{cont(11)}\) has a Rice(1, 0.5) distribution which requires a sixth cumulant correction of 0.08
  2. \(X_{cont(21)}\) has a Rice(2, 2) distribution which requires a sixth cumulant correction of 0.12
  3. \(X_{cont(31)}\) has a Rice(4, 8) distribution which requires a sixth cumulant correction of 0.36
  1. \(X_{mix(1)}\) is a mixture of Normal(-5, 2) and Normal(3, 1) with mixing parameters 0.3 and 0.7; it is a time-varying covariate (subject-level term) and the components have an AR(1, p = 0.4) correlation structure across Y
  1. Poisson variable: \(X_{pois(1)}\) is a zero-inflated Poisson variable with \(\lambda = 15\), the probability of a structural zero set at \(0.10\), and is the same in each equation
  2. Negative Binomial variable: \(X_{nb(1)}\) is a regular NB time-varying covariate (subject-level term) with an AR(1, p = 0.3) correlation structure and increasing mean and variance
  1. \(X_{nb(11)}\) has a size of 10 and mean of 3
  2. \(X_{nb(21)}\) has a size of 10 and mean of 4
  3. \(X_{nb(31)}\) has a size of 10 and mean of 5
  1. Error terms have Skewnormal distributions which become more skewed over time with an AR(1, p = 0.4) correlation structure:
  1. \(E_1\) has a Skewnormal(0, 1, 1) distribution which requires a sixth cumulant correction of 0.06
  2. \(E_2\) has a Skewnormal(0, 1, 5) distribution
  3. \(E_3\) has a Skewnormal(0, 1, 25) distribution which requires a sixth cumulant correction of 0.15

Description of Variables

There is an interaction between \(X_{ord(1)}\) and \(X_{pois(1)}\) for each \(Y\). Since they are both group-level covariates, the interaction is also a group-level covariate that will interact with the subject-level covariates \(X_{cont(1)}\), \(X_{mix(1)}\), and \(X_{nb(1)}\). However, only \(X_{ord(1)}\) and \(X_{pois(1)}\) interact with time in this example (normally their interaction would also interact with time). The ordering in the equations below reflects the ordering in the simulation process.

\[\begin{equation} \begin{split} Y_1 &= \beta_0 + \beta_1 * X_{ord(1)} + \beta_2 * X_{cont(11)} + \beta_3 * X_{mix(11)} + \beta_4 * X_{pois(1)} + \beta_5 * X_{nb(11)} + \beta_{int} * X_{ord(1)} * X_{pois(1)} \\ &+ \beta_{subj1} * X_{ord(1)} * X_{cont(11)} + \beta_{subj2} * X_{pois(1)} * X_{cont(11)} + \beta_{subj3} * X_{ord(1)} * X_{pois(1)} * X_{cont(11)} \\ &+ \beta_{subj4} * X_{ord(1)} * X_{mix(11)} + \beta_{subj5} * X_{pois(1)} * X_{mix(11)} + \beta_{subj6} * X_{ord(1)} * X_{pois(1)} * X_{mix(11)} \\ &+ \beta_{subj7} * X_{ord(1)} * X_{nb(11)} + \beta_{subj8} * X_{pois(1)} * X_{nb(11)} + \beta_{subj9} * X_{ord(1)} * X_{pois(1)} * X_{nb(11)} \\ &+ \beta_{tint1} * X_{ord(1)} * Time_1 + \beta_{tint2} * X_{pois(1)} * Time_1 + \beta_{t} * Time_1 + E_1 \end{split} \tag{1} \end{equation}\] \[\begin{equation} \begin{split} Y_2 &= \beta_0 + \beta_1 * X_{ord(1)} + \beta_2 * X_{cont(21)} + \beta_3 * X_{mix(21)} + \beta_4 * X_{pois(1)} + \beta_5 * X_{nb(21)} + \beta_{int} * X_{ord(1)} * X_{pois(1)} \\ &+ \beta_{subj1} * X_{ord(1)} * X_{cont(21)} + \beta_{subj2} * X_{pois(1)} * X_{cont(21)} + \beta_{subj3} * X_{ord(1)} * X_{pois(1)} * X_{cont(21)} \\ &+ \beta_{subj4} * X_{ord(1)} * X_{mix(21)} + \beta_{subj5} * X_{pois(1)} * X_{mix(21)} + \beta_{subj6} * X_{ord(1)} * X_{pois(1)} * X_{mix(21)} \\ &+ \beta_{subj7} * X_{ord(1)} * X_{nb(21)} + \beta_{subj8} * X_{pois(1)} * X_{nb(21)} + \beta_{subj9} * X_{ord(1)} * X_{pois(1)} * X_{nb(21)} \\ &+ \beta_{tint1} * X_{ord(1)} * Time_2 + \beta_{tint2} * X_{pois(1)} * Time_2 + \beta_{t} * Time_2 + E_2 \end{split} \tag{2} \end{equation}\] \[\begin{equation} \begin{split} Y_3 &= \beta_0 + \beta_1 * X_{ord(1)} + \beta_2 * X_{cont(31)} + \beta_3 * X_{mix(31)} + \beta_4 * X_{pois(1)} + \beta_5 * X_{nb(31)} + \beta_{int} * X_{ord(1)} * X_{pois(1)} \\ &+ \beta_{subj1} * X_{ord(1)} * X_{cont(31)} + \beta_{subj2} * X_{pois(1)} * X_{cont(31)} + \beta_{subj3} * X_{ord(1)} * X_{pois(1)} * X_{cont(31)} \\ &+ \beta_{subj4} * X_{ord(1)} * X_{mix(31)} + \beta_{subj5} * X_{pois(1)} * X_{mix(31)} + \beta_{subj6} * X_{ord(1)} * X_{pois(1)} * X_{mix(31)} \\ &+ \beta_{subj7} * X_{ord(1)} * X_{nb(31)} + \beta_{subj8} * X_{pois(1)} * X_{nb(31)} + \beta_{subj9} * X_{ord(1)} * X_{pois(1)} * X_{nb(31)} \\ &+ \beta_{tint1} * X_{ord(1)} * Time_3 + \beta_{tint2} * X_{pois(1)} * Time_3 + \beta_{t} * Time_3 + E_3 \end{split} \tag{3} \end{equation}\]
library("SimRepeat")
library("printr")
library("nlme")
library("reshape2")
options(scipen = 999)

Step 1: Set up parameter inputs

This is the most time-consuming part of the simulation process. It is important to read the function documentation carefully to understand the formats for each parameter input. Incorrect formatting will lead to errors. Most of these can be prevented by using the checkpar function in Step 2.

seed <- 137
n <- 10000
M <- 3

# Ordinal variable
marginal <- lapply(seq_len(M), function(x) list(c(1/3, 2/3)))
support <- lapply(seq_len(M), function(x) list(c(0, 1, 2)))

# Non-mixture continuous variables
method <- "Polynomial"
Stcum1 <- calc_theory("Rice", c(1, 0.5))
Stcum2 <- calc_theory("Rice", c(2, 2))
Stcum3 <- calc_theory("Rice", c(4, 8))

# Error terms
error_type <- "non_mix"
Error1 <- calc_theory("Skewnormal", c(0, 1, 1))
Error2 <- calc_theory("Skewnormal", c(0, 1, 5))
Error3 <- calc_theory("Skewnormal", c(0, 1, 25))
corr.e <- matrix(c(1, 0.4, 0.4^2, 0.4, 1, 0.4, 0.4^2, 0.4, 1), M, M, 
  byrow = TRUE)

skews <- list(c(Stcum1[3], Error1[3]), c(Stcum2[3], Error2[3]), 
  c(Stcum3[3], Error3[3]))
skurts <- list(c(Stcum1[4], Error1[4]), c(Stcum2[4], Error2[4]), 
  c(Stcum3[4], Error3[4]))
fifths <- list(c(Stcum1[5], Error1[5]), c(Stcum2[5], Error2[5]), 
  c(Stcum3[5], Error3[5]))
sixths <- list(c(Stcum1[6], Error1[6]), c(Stcum2[6], Error2[6]), 
  c(Stcum3[6], Error3[6]))
Six <- list(list(0.08, 0.06), list(0.12, NULL), list(0.36, 0.15))

# Mixture continuous variable
mix_pis <- lapply(seq_len(M), function(x) list(c(0.3, 0.7)))
mix_mus <- lapply(seq_len(M), function(x) list(c(-5, 3)))
mix_sigmas <- lapply(seq_len(M), function(x) list(c(2, 1)))
mix_skews <- mix_skurts <- mix_fifths <- mix_sixths <- 
  lapply(seq_len(M), function(x) list(c(0, 0)))
mix_Six <- list()
Nstcum <- calc_mixmoments(mix_pis[[1]][[1]], mix_mus[[1]][[1]], 
  mix_sigmas[[1]][[1]], mix_skews[[1]][[1]], mix_skurts[[1]][[1]], 
  mix_fifths[[1]][[1]], mix_sixths[[1]][[1]])

means <- list(c(Stcum1[1], Nstcum[1], 0),
  c(Stcum2[1], Nstcum[1], 0),
  c(Stcum3[1], Nstcum[1], 0))
vars <- list(c(Stcum1[2]^2, Nstcum[2]^2, Error1[2]^2),
  c(Stcum2[2]^2, Nstcum[2]^2, Error2[2]^2),
  c(Stcum3[2]^2, Nstcum[2]^2, Error3[2]^2))

# Poisson variable
lam <- list(15, 15, 15)
p_zip <- 0.10

# Negative Binomial variables
size <- list(10, 10, 10)
mu <- list(3, 4, 5)
prob <- list()
p_zinb <- 0

# X_ord(1) and X_pois(1) are the same across Y
same.var <- c(1, 5)

# set up X correlation matrix
corr.x <- list()
corr.x[[1]] <- list(matrix(0.4, 6, 6), matrix(0.35, 6, 6), matrix(0.25, 6, 6))
diag(corr.x[[1]][[1]]) <- 1
# set correlations between components of X_mix(11) to 0
corr.x[[1]][[1]][3:4, 3:4] <- diag(2)
# set correlations between time-varying covariates of Y1 and Y2
corr.x[[1]][[2]][2, 2] <- 0.5
corr.x[[1]][[2]][3:4, 3:4] <- matrix(0.4, 2, 2)
corr.x[[1]][[2]][6, 6] <- 0.3
# set correlations between time-varying covariates of Y1 and Y3
corr.x[[1]][[3]][2, 2] <- 0.5^2
corr.x[[1]][[3]][3:4, 3:4] <- matrix(0.4^2, 2, 2)
corr.x[[1]][[3]][6, 6] <- 0.3^2
# set correlations for the same variables equal across outcomes
corr.x[[1]][[2]][, same.var] <- corr.x[[1]][[3]][, same.var] <-
  corr.x[[1]][[1]][, same.var]

corr.x[[2]] <- list(t(corr.x[[1]][[2]]), matrix(0.35, 6, 6), 
  matrix(0.25, 6, 6))
diag(corr.x[[2]][[2]]) <- 1
# set correlations between components of X_mix(21) to 0
corr.x[[2]][[2]][3:4, 3:4] <- diag(2)
# set correlations between time-varying covariates of Y2 and Y3
corr.x[[2]][[3]][2, 2] <- 0.5
corr.x[[2]][[3]][3:4, 3:4] <- matrix(0.4, 2, 2)
corr.x[[2]][[3]][6, 6] <- 0.3
# set correlations for the same variables equal across outcomes
corr.x[[2]][[2]][same.var, ] <- corr.x[[1]][[2]][same.var, ]
corr.x[[2]][[2]][, same.var] <- corr.x[[2]][[3]][, same.var] <- 
  t(corr.x[[1]][[2]][same.var, ])
corr.x[[2]][[3]][same.var, ] <- corr.x[[1]][[3]][same.var, ]

corr.x[[3]] <- list(t(corr.x[[1]][[3]]), t(corr.x[[2]][[3]]), 
  matrix(0.3, 6, 6))
diag(corr.x[[3]][[3]]) <- 1
# set correlations between components of X_mix(31) to 0
corr.x[[3]][[3]][3:4, 3:4] <- diag(2)
# set correlations for the same variables equal across outcomes
corr.x[[3]][[3]][same.var, ] <- corr.x[[1]][[3]][same.var, ]
corr.x[[3]][[3]][, same.var] <- t(corr.x[[3]][[3]][same.var, ])

Time <- 1:M
betas.0 <- 0
betas.t <- 1
# use a list of length 1 so that betas are the same across Y
betas <- list(seq(0.5, 1.5, 0.25))
# interaction between ordinal and Poisson variable, becomes 
# another group-level variable
int.var <- matrix(c(1, 1, 4, 2, 1, 4, 3, 1, 4), 3, 3, byrow = TRUE)
betas.int <- list(0.5)
# continuous non-mixture, continuous mixture, and NB variables are 
# subject-level variables
subj.var <- matrix(c(1, 2, 1, 3, 1, 5, 2, 2, 2, 3, 2, 5, 3, 2, 3, 3, 3, 5), 
  nrow = 9, ncol = 2, byrow = TRUE)
# there are 3 subject-level variables and 3 group-level variables forming 
# 9 group-subject interactions
betas.subj <- list(seq(0.5, 0.5 + (9 - 1) * 0.1, 0.1))
# only ordinal and Poisson variable interact with time (excluding the 
# ordinal-Poisson interaction variable)
tint.var <- matrix(c(1, 1, 1, 4, 2, 1, 2, 4, 3, 1, 3, 4), 6, 2, byrow = TRUE)
betas.tint <- list(c(0.25, 0.5))

Step 2: Check parameter inputs

checkpar(M, method, error_type, means, vars, skews, skurts, fifths, sixths, 
  Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, 
  mix_sixths, mix_Six, marginal, support, lam, p_zip, pois_eps = list(), 
  size, prob, mu, p_zinb, nb_eps = list(), corr.x, corr.yx = list(), corr.e, 
  same.var, subj.var, int.var, tint.var, betas.0, betas, betas.subj, betas.int, 
  betas.t, betas.tint, quiet = TRUE)
## [1] TRUE

Step 3: Generate system

Note that use.nearPD = FALSE and adjgrad = FALSE so that negative eigen-values will be replaced with eigmin (default \(0\)) instead of using the nearest positive-definite matrix (found with Bates and Maechler (2017)’s Matrix::nearPD function by Higham (2002)’s algorithm) or the adjusted gradient updating method via adj_grad (Yin and Zhang 2013; Zhang and Yin Year not provided; Maree 2012).

Sys1 <- corrsys(n, M, Time, method, error_type, means, vars,
  skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews,
  mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip,
  size, prob, mu, p_zinb, corr.x, corr.e, same.var, subj.var, int.var,
  tint.var, betas.0, betas, betas.subj, betas.int, betas.t, betas.tint,
  seed = seed, use.nearPD = FALSE, quiet = TRUE)
## Total Simulation time: 0.434 minutes
knitr::kable(Sys1$constants[[1]], booktabs = TRUE, 
  caption = "PMT constants for Y_1")
Table 1: PMT constants for Y_1
c0 c1 c2 c3 c4 c5
-0.1150419 1.0356110 0.1172383 -0.0189393 -0.0007322 0.0005196
0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000
-0.0234264 0.9949525 0.0242231 0.0014391 -0.0002656 0.0000140
Sys1$valid.pdf
## [[1]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE"
## 
## [[2]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE"
## 
## [[3]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE"

Step 4: Describe results

Sum1 <- summary_sys(Sys1$Y, Sys1$E, E_mix = NULL, Sys1$X, Sys1$X_all, M, 
  method, means, vars, skews, skurts, fifths, sixths, mix_pis, mix_mus, 
  mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, marginal, 
  support, lam, p_zip, size, prob, mu, p_zinb, corr.x, corr.e)
names(Sum1)
##  [1] "cont_sum_y"   "rho.y"        "cont_sum_e"   "target_sum_e"
##  [5] "rho.e"        "rho.ye"       "ord_sum_x"    "cont_sum_x"  
##  [9] "target_sum_x" "sum_xall"     "mix_sum_x"    "target_mix_x"
## [13] "pois_sum_x"   "nb_sum_x"     "rho.x"        "rho.xall"    
## [17] "rho.yx"       "rho.yxall"    "maxerr"
knitr::kable(Sum1$cont_sum_y, digits = 3, booktabs = TRUE, 
  caption = "Simulated Distributions of Outcomes")
Table 2: Simulated Distributions of Outcomes
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
Y1 1 10000 233.654 261.038 156.228 -310.398 1828.218 1.470 2.717 5.195 7.541
Y2 2 10000 318.684 316.982 226.806 -356.880 2589.484 1.498 2.968 6.973 17.490
Y3 3 10000 484.949 419.921 368.659 -183.236 2834.555 1.210 1.466 0.733 -5.194
knitr::kable(Sum1$target_sum_e, digits = 3, booktabs = TRUE, 
  caption = "Target Distributions of Error Terms")
Table 3: Target Distributions of Error Terms
Outcome Mean SD Skew Skurtosis Fifth Sixth
E1 1 0 0.826 0.137 0.062 -0.002 -0.060
E2 2 0 0.623 0.851 0.705 -0.043 -2.326
E3 3 0 0.604 0.989 0.862 -0.055 -3.140
knitr::kable(Sum1$cont_sum_e, digits = 3, booktabs = TRUE, 
  caption = "Simulated Distributions of Error Terms")
Table 4: Simulated Distributions of Error Terms
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
E1 1 10000 0 0.825 -0.023 -3.214 2.909 0.100 0.012 -0.217 -0.238
E2 2 10000 0 0.628 -0.106 -1.720 3.311 0.893 0.747 -0.223 -3.786
E3 3 10000 0 0.605 -0.124 -1.003 4.680 1.011 1.095 1.942 11.806
knitr::kable(Sum1$target_sum_x, digits = 3, booktabs = TRUE, 
  caption = "Target Distributions of Continuous Non-Mixture and Components of 
  Mixture Variables")
Table 5: Target Distributions of Continuous Non-Mixture and Components of Mixture Variables
Outcome X Mean SD Skew Skurtosis Fifth Sixth
cont1_1 1 1 1.330 0.693 0.618 0.210 -0.365 -0.879
cont1_2 1 2 -5.000 2.000 0.000 0.000 0.000 0.000
cont1_3 1 3 3.000 1.000 0.000 0.000 0.000 0.000
cont2_1 2 1 3.097 1.552 0.517 0.015 -0.487 -0.415
cont2_2 2 2 -5.000 2.000 0.000 0.000 0.000 0.000
cont2_3 2 3 3.000 1.000 0.000 0.000 0.000 0.000
cont3_1 3 1 9.090 3.658 0.210 -0.185 -0.024 0.416
cont3_2 3 2 -5.000 2.000 0.000 0.000 0.000 0.000
cont3_3 3 3 3.000 1.000 0.000 0.000 0.000 0.000
knitr::kable(Sum1$cont_sum_x, digits = 3, booktabs = TRUE, 
  caption = "Simulated Distributions of Continuous Non-Mixture and Components 
  of Mixture Variables")
Table 6: Simulated Distributions of Continuous Non-Mixture and Components of Mixture Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
cont1_1 1 1 10000 1.330 0.695 1.238 0.032 4.872 0.656 0.339 -0.124 -0.854
cont1_2 1 2 10000 -5.000 2.002 -4.978 -12.599 3.311 -0.029 0.024 0.116 0.014
cont1_3 1 3 10000 3.000 1.001 2.999 -0.641 7.127 0.014 -0.025 0.082 -0.129
cont2_1 2 1 10000 3.097 1.550 2.951 0.076 9.922 0.518 0.090 -0.227 -0.136
cont2_2 2 2 10000 -5.000 2.003 -5.019 -12.791 2.685 0.017 -0.024 -0.086 0.089
cont2_3 2 3 10000 3.000 1.002 3.001 -0.939 7.372 0.024 0.031 -0.151 0.093
cont3_1 3 1 10000 9.089 3.659 8.950 -0.585 23.360 0.225 -0.159 -0.087 0.426
cont3_2 3 2 10000 -5.000 2.002 -5.012 -12.746 2.827 0.031 0.029 -0.014 0.183
cont3_3 3 3 10000 3.000 1.001 2.998 -0.593 6.988 0.023 -0.036 -0.060 -0.006
knitr::kable(Sum1$target_mix_x, digits = 3, booktabs = TRUE, 
  caption = "Target Distributions of Continuous Mixture Variables")
Table 7: Target Distributions of Continuous Mixture Variables
Outcome X Mean SD Skew Skurtosis Fifth Sixth
mix1_1 1 1 0.6 3.917 -0.967 -0.515 5.351 -7.024
mix2_1 2 1 0.6 3.917 -0.967 -0.515 5.351 -7.024
mix3_1 3 1 0.6 3.917 -0.967 -0.515 5.351 -7.024
knitr::kable(Sum1$mix_sum_x, digits = 3, booktabs = TRUE, 
  caption = "Simulated Distributions of Continuous Mixture Variables")
Table 8: Simulated Distributions of Continuous Mixture Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
mix1_1 1 1 10000 0.6 3.916 2.432 -12.143 6.780 -0.976 -0.496 5.380 -7.343
mix2_1 2 1 10000 0.6 3.916 2.428 -11.481 6.603 -0.978 -0.501 5.433 -7.456
mix3_1 3 1 10000 0.6 3.916 2.451 -12.219 6.968 -0.938 -0.587 5.286 -6.002
Nplot <- plot_simpdf_theory(sim_y = Sys1$X_all[[1]][, 3], ylower = -10, 
  yupper = 10, 
  title = "PDF of X_mix(21): N(-5, 2) and N(3, 1) Mixture",
  fx = function(x) mix_pis[[1]][[1]][1] * dnorm(x, mix_mus[[1]][[1]][1], 
    mix_sigmas[[1]][[1]][1]) + mix_pis[[1]][[1]][2] * 
    dnorm(x, mix_mus[[1]][[1]][2], mix_sigmas[[1]][[1]][2]), 
  lower = -Inf, upper = Inf)
Nplot

Summary of Ordinal Variable: (for \(Y_1\))

knitr::kable(Sum1$ord_sum_x[[1]][1:2, ], digits = 3, row.names = FALSE,
             booktabs = TRUE, caption = "Simulated Distribution of X_ord(1)")
Table 9: Simulated Distribution of X_ord(1)
Outcome Support Target Simulated
1 0 0.333 0.334
1 1 0.667 0.667

Summary of Poisson Variable:

knitr::kable(Sum1$pois_sum_x, digits = 3, row.names = FALSE,
             booktabs = TRUE, caption = "Simulated Distribution of X_pois(1)")
Table 10: Simulated Distribution of X_pois(1)
Outcome X N P0 Exp_P0 Mean Exp_Mean Var Exp_Var Median Min Max Skew Skurtosis
1 1 10000 0.101 0.1 13.495 13.5 33.785 40 14 0 35 -0.869 0.697
2 1 10000 0.101 0.1 13.495 13.5 33.785 40 14 0 35 -0.869 0.697
3 1 10000 0.101 0.1 13.495 13.5 33.785 40 14 0 35 -0.869 0.697
Pplot <- plot_simpdf_theory(sim_y = Sys1$X_all[[1]][, 4], 
  title = "PMF of X_pois(1): Zero-Inflated Poisson Distribution", 
  Dist = "Poisson", params = c(lam[[1]][1], p_zip), cont_var = FALSE)
Pplot

Summary of Negative Binomial Variables \(X_{nb(11)}, X_{nb(21)},\) and \(X_{nb(31)}\):

knitr::kable(Sum1$nb_sum_x, digits = 3, row.names = FALSE,
             booktabs = TRUE, caption = "Simulated Distributions")
Table 11: Simulated Distributions
Outcome X N P0 Exp_P0 Prob Mean Exp_Mean Var Exp_Var Median Min Max Skew Skurtosis
1 1 10000 0.069 0.073 0.769 3.001 3 3.903 3.9 3 0 14 0.808 0.770
2 1 10000 0.034 0.035 0.714 4.003 4 5.654 5.6 4 0 19 0.839 1.182
3 1 10000 0.017 0.017 0.667 5.004 5 7.452 7.5 5 0 20 0.727 0.698
NBplot <- plot_simtheory(sim_y = Sys1$X_all[[1]][, 5], binwidth = 0.5, 
  title = "Simulated Values for X_nb(11)", Dist = "Negative_Binomial", 
  params = c(size[[1]][1], mu[[1]][1], p_zinb), cont_var = FALSE)
NBplot

Maximum Correlation Errors for X Variables by Outcome:

maxerr <- do.call(rbind, Sum1$maxerr)
rownames(maxerr) <- colnames(maxerr) <- paste("Y", 1:M, sep = "")
knitr::kable(as.data.frame(maxerr), digits = 5, booktabs = TRUE, 
  caption = "Maximum Correlation Errors for X Variables")
Table 12: Maximum Correlation Errors for X Variables
Y1 Y2 Y3
Y1 0.00709 0.00804 0.00657
Y2 0.00804 0.00804 0.00804
Y3 0.00657 0.00804 0.00657

Linear model

A linear model will be fit to the data using glm in order to see if the slope coefficients can be recovered (R Core Team 2017). First, the data is reshaped into long format using reshape2::melt (Wickham 2007). Note that since \(X_{ord(1)}\) and \(X_{pois(1)}\) are the same for each outcome, they will be used as factors (id.vars) and are only needed once.

data1 <- as.data.frame(cbind(factor(1:n), Sys1$Y, Sys1$X_all[[1]][, 1:5],
  Sys1$X_all[[2]][, c(2, 3, 5)], Sys1$X_all[[3]][, c(2, 3, 5)]))
colnames(data1)[1] <- "Subject"
data1.a <- melt(data1[, c("Subject", "ord1_1", "pois1_1", "Y1", "Y2", "Y3")], 
  id.vars = c("Subject", "ord1_1", "pois1_1"),
  measure.vars = c("Y1", "Y2", "Y3"), variable.name = "Time", value.name = "Y")
data1.b <- melt(data1[, c("Subject", "cont1_1", "cont2_1", "cont3_1")],
  id.vars = c("Subject"), variable.name = "Time", value.name = "cont1")
data1.c <- melt(data1[, c("Subject", "mix1_1", "mix2_1", "mix3_1")],
  id.vars = c("Subject"), variable.name = "Time", value.name = "mix1")
data1.d <- melt(data1[, c("Subject", "nb1_1", "nb2_1", "nb3_1")],
  id.vars = c("Subject"), variable.name = "Time", value.name = "nb1")
data1.a$Time <- data1.b$Time <- data1.c$Time <- data1.d$Time <- 
  c(rep(1, n), rep(2, n), rep(3, n))
data1 <- merge(merge(merge(data1.a, data1.b, by = c("Subject", "Time")), 
  data1.c, by = c("Subject", "Time")), data1.d, by = c("Subject", "Time"))

Errors \(E_1, E_2,\) and \(E_3\) modeled as having Normal distributions:

fm1 <- glm(Y ~ ord1_1 + cont1 + mix1 + pois1_1 + nb1 + ord1_1:pois1_1 + 
  ord1_1:cont1 + pois1_1:cont1 + ord1_1:pois1_1:cont1 + 
  ord1_1:mix1 + pois1_1:mix1 + ord1_1:pois1_1:mix1 + 
  ord1_1:nb1 + pois1_1:nb1 + ord1_1:pois1_1:nb1 + 
  Time + ord1_1:Time + pois1_1:Time, data = data1)
summary(fm1)
## 
## Call:
## glm(formula = Y ~ ord1_1 + cont1 + mix1 + pois1_1 + nb1 + ord1_1:pois1_1 + 
##     ord1_1:cont1 + pois1_1:cont1 + ord1_1:pois1_1:cont1 + ord1_1:mix1 + 
##     pois1_1:mix1 + ord1_1:pois1_1:mix1 + ord1_1:nb1 + pois1_1:nb1 + 
##     ord1_1:pois1_1:nb1 + Time + ord1_1:Time + pois1_1:Time, data = data1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.2143  -0.4958  -0.0895   0.4241   4.6031  
## 
## Coefficients:
##                       Estimate Std. Error  t value            Pr(>|t|)    
## (Intercept)          0.0727547  0.0343919    2.115              0.0344 *  
## ord1_1               0.4579595  0.0286148   16.004 <0.0000000000000002 ***
## cont1                0.7589358  0.0053528  141.784 <0.0000000000000002 ***
## mix1                 1.0023232  0.0031825  314.947 <0.0000000000000002 ***
## pois1_1              1.2429948  0.0026613  467.061 <0.0000000000000002 ***
## nb1                  1.4845177  0.0069904  212.366 <0.0000000000000002 ***
## Time                 0.9751990  0.0197703   49.326 <0.0000000000000002 ***
## ord1_1:pois1_1       0.5039399  0.0016936  297.548 <0.0000000000000002 ***
## ord1_1:cont1         0.4993920  0.0041619  119.993 <0.0000000000000002 ***
## cont1:pois1_1        0.5993862  0.0003974 1508.159 <0.0000000000000002 ***
## ord1_1:mix1          0.8007743  0.0033872  236.415 <0.0000000000000002 ***
## mix1:pois1_1         0.9000054  0.0002551 3527.743 <0.0000000000000002 ***
## ord1_1:nb1           1.1045040  0.0060593  182.284 <0.0000000000000002 ***
## pois1_1:nb1          1.2015590  0.0005210 2306.090 <0.0000000000000002 ***
## ord1_1:Time          0.2575984  0.0107161   24.038 <0.0000000000000002 ***
## pois1_1:Time         0.5015038  0.0014760  339.782 <0.0000000000000002 ***
## ord1_1:cont1:pois1_1 0.6999753  0.0002444 2864.155 <0.0000000000000002 ***
## ord1_1:mix1:pois1_1  0.9999631  0.0002245 4454.503 <0.0000000000000002 ***
## ord1_1:pois1_1:nb1   1.2993227  0.0003873 3355.088 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.4800393)
## 
##     Null deviance: 3776265562  on 29999  degrees of freedom
## Residual deviance:      14392  on 29981  degrees of freedom
## AIC: 63141
## 
## Number of Fisher Scoring iterations: 2

Each effect in the model was found to be statistically significant at the \(\alpha = 0.001\) level. Now, compare betas used in simulation to those returned by glm:

fm1.coef <- fm1$coefficients[c("(Intercept)", "ord1_1", "cont1", "mix1", 
  "pois1_1", "nb1", "ord1_1:pois1_1", "Time", "ord1_1:cont1", "cont1:pois1_1", 
  "ord1_1:cont1:pois1_1", "ord1_1:mix1", "mix1:pois1_1", 
  "ord1_1:mix1:pois1_1", "ord1_1:nb1", "pois1_1:nb1", 
  "ord1_1:pois1_1:nb1", "ord1_1:Time", "pois1_1:Time")]
coef <- rbind(c(betas.0, betas[[1]], betas.int[[1]], betas.t, 
  betas.subj[[1]], betas.tint[[1]]), fm1.coef)
colnames(coef) <- names(fm1.coef)
rownames(coef) <- c("Simulated", "Estimated")
knitr::kable(as.data.frame(coef[, 1:6]), digits = 3, booktabs = TRUE, 
  caption = "Beta Coefficients for Repeated Measures Model 1")
Table 13: Beta Coefficients for Repeated Measures Model 1
(Intercept) ord1_1 cont1 mix1 pois1_1 nb1
Simulated 0.000 0.500 0.750 1.000 1.250 1.500
Estimated 0.073 0.458 0.759 1.002 1.243 1.485
knitr::kable(as.data.frame(coef[, 7:12]), digits = 3, booktabs = TRUE)
ord1_1:pois1_1 Time ord1_1:cont1 cont1:pois1_1 ord1_1:cont1:pois1_1 ord1_1:mix1
Simulated 0.500 1.000 0.500 0.600 0.7 0.800
Estimated 0.504 0.975 0.499 0.599 0.7 0.801
knitr::kable(as.data.frame(coef[, 13:19]), digits = 3, booktabs = TRUE)
mix1:pois1_1 ord1_1:mix1:pois1_1 ord1_1:nb1 pois1_1:nb1 ord1_1:pois1_1:nb1 ord1_1:Time pois1_1:Time
Simulated 0.9 1 1.100 1.200 1.300 0.250 0.500
Estimated 0.9 1 1.105 1.202 1.299 0.258 0.502

All of the slope coefficients are estimated well.

Example 2: System from Example 1 with no independent variables for \(Y_2\)

This example uses the corrsys2 function which employs correlation method 2. It requires the additional parameters pois_eps and nb_eps, which default to \(0.0001\) for each variable.

Step 1: Set up parameter inputs

seed <- 137
n <- 10000
M <- 3

# Ordinal variable
marginal <- list(list(c(1/3, 2/3)), NULL, list(c(1/3, 2/3)))
support <- list(list(c(0, 1, 2)), NULL, list(c(0, 1, 2)))

# Non-mixture continuous variables
skews <- list(c(Stcum1[3], Error1[3]), Error2[3], 
  c(Stcum3[3], Error3[3]))
skurts <- list(c(Stcum1[4], Error1[4]), Error2[4],
  c(Stcum3[4], Error3[4]))
fifths <- list(c(Stcum1[5], Error1[5]), Error2[5],
  c(Stcum3[5], Error3[5]))
sixths <- list(c(Stcum1[6], Error1[6]), Error2[6], 
  c(Stcum3[6], Error3[6]))
Six <- list(list(0.08, 0.06), NULL, list(0.36, 0.15))

# Mixture continuous variable
mix_pis <- list(list(c(0.3, 0.7)), NULL, list(c(0.3, 0.7)))
mix_mus <- list(list(c(-5, 3)), NULL, list(c(-5, 3)))
mix_sigmas <- list(list(c(2, 1)), NULL, list(c(2, 1)))
mix_skews <- mix_skurts <- mix_fifths <- mix_sixths <- 
  list(list(c(0, 0)), NULL, list(c(0, 0)))
mix_Six <- list()

means <- list(c(Stcum1[1], Nstcum[1], Error1[1]), Error2[1], 
  c(Stcum3[1], Nstcum[1], Error3[1]))
vars <- list(c(Stcum1[2]^2, Nstcum[2]^2, Error1[2]^2), Error2[2]^2,
  c(Stcum3[2]^2, Nstcum[2]^2, Error3[2]^2))

# Poisson variable
lam <- list(15, NULL, 15)
p_zip <- 0.10

# Negative Binomial variables
size <- list(10, NULL, 10)
mu <- list(3, NULL, 5)
prob <- list()
p_zinb <- 0

# X_ord(1) and X_pois(1) are the same for Y_1 and Y_3
same.var <- matrix(c(1, 1, 3, 1, 1, 5, 3, 5), 2, 4, byrow = TRUE)

# set up X correlation matrix
corr.x <- list()
corr.x[[1]] <- list(matrix(0.4, 6, 6), NULL, matrix(0.25, 6, 6))
diag(corr.x[[1]][[1]]) <- 1
# set correlations between components of X_mix(11) to 0
corr.x[[1]][[1]][3:4, 3:4] <- diag(2)
# set correlations between time-varying covariates of Y1 and Y3
corr.x[[1]][[3]][2, 2] <- 0.5^2
corr.x[[1]][[3]][3:4, 3:4] <- matrix(0.4^2, 2, 2)
corr.x[[1]][[3]][6, 6] <- 0.3^2
# set correlations for the same variables equal across outcomes
corr.x[[1]][[3]][, c(1, 5)] <- corr.x[[1]][[1]][, c(1, 5)]

corr.x[[3]] <- list(t(corr.x[[1]][[3]]), NULL, matrix(0.3, 6, 6))
diag(corr.x[[3]][[3]]) <- 1
# set correlations between components of X_mix(31) to 0
corr.x[[3]][[3]][3:4, 3:4] <- diag(2)
# set correlations for the same variables equal across outcomes
corr.x[[3]][[3]][c(1, 5), ] <- corr.x[[1]][[3]][c(1, 5), ]
corr.x[[3]][[3]][, c(1, 5)] <- t(corr.x[[3]][[3]][c(1, 5), ])

Time <- 1:M
betas.0 <- 0
betas.t <- 1
betas <- list(seq(0.5, 1.5, 0.25), NULL, seq(0.5, 1.5, 0.25))
# interaction between ordinal and Poisson variable, becomes 
# another group-level variable
int.var <- matrix(c(1, 1, 4, 3, 1, 4), 2, 3, byrow = TRUE)
betas.int <- list(0.5, NULL, 0.5)
# continuous non-mixture, continuous mixture, and NB variables are 
# subject-level variables
subj.var <- matrix(c(1, 2, 1, 3, 1, 5, 3, 2, 3, 3, 3, 5), 
  nrow = 6, ncol = 2, byrow = TRUE)
# there are 3 subject-level variables and 3 group-level variables forming 
# 9 group-subject interactions
betas.subj <- list(seq(0.5, 0.5 + (9 - 1) * 0.1, 0.1), NULL, 
  seq(0.5, 0.5 + (9 - 1) * 0.1, 0.1))
# only ordinal and Poisson variable interact with time (excluding the 
# ordinal-Poisson interaction variable)
tint.var <- matrix(c(1, 1, 1, 4, 3, 1, 3, 4), 4, 2, byrow = TRUE)
betas.tint <- list(c(0.25, 0.5), NULL, c(0.25, 0.5))

Step 2: Check parameter inputs

checkpar(M, method, error_type, means, vars, skews, skurts, fifths, sixths, 
  Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, 
  mix_sixths, mix_Six, marginal, support, lam, p_zip, pois_eps = list(), 
  size, prob, mu, p_zinb, nb_eps = list(), corr.x, corr.yx = list(), corr.e, 
  same.var, subj.var, int.var, tint.var, betas.0, betas, betas.subj, betas.int, 
  betas.t, betas.tint, quiet = TRUE)
## [1] TRUE

Step 3: Generate system

Sys2 <- corrsys2(n, M, Time, method, error_type, means, vars,
  skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews,
  mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip,
  pois_eps = list(), size, prob, mu, p_zinb, nb_eps = list(), corr.x, corr.e, 
  same.var, subj.var, int.var, tint.var, betas.0, betas, betas.subj, betas.int, 
  betas.t, betas.tint, seed = seed, use.nearPD = FALSE, quiet = TRUE)
## Total Simulation time: 0.235 minutes

Step 4: Describe results

Sum2 <- summary_sys(Sys2$Y, Sys2$E, E_mix = NULL, Sys2$X, Sys2$X_all, M, 
  method, means, vars, skews, skurts, fifths, sixths, mix_pis, mix_mus, 
  mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, marginal, 
  support, lam, p_zip, size, prob, mu, p_zinb, corr.x, corr.e)
names(Sum2)
##  [1] "cont_sum_y"   "rho.y"        "cont_sum_e"   "target_sum_e"
##  [5] "rho.e"        "rho.ye"       "ord_sum_x"    "cont_sum_x"  
##  [9] "target_sum_x" "sum_xall"     "mix_sum_x"    "target_mix_x"
## [13] "pois_sum_x"   "nb_sum_x"     "rho.x"        "rho.xall"    
## [17] "rho.yx"       "rho.yxall"    "maxerr"
knitr::kable(Sum2$cont_sum_y, digits = 3, booktabs = TRUE, 
  caption = "Simulated Distributions of Outcomes")
Table 14: Simulated Distributions of Outcomes
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
Y1 1 10000 235.280 262.080 156.027 -339.268 1924.387 1.474 2.780 5.545 8.919
Y2 2 10000 2.782 0.627 2.675 1.298 6.262 0.905 0.905 0.674 -0.150
Y3 3 10000 485.996 423.620 374.110 -213.943 3284.809 1.299 1.918 2.365 -0.580
knitr::kable(Sum2$target_sum_e, digits = 3, booktabs = TRUE, 
  caption = "Target Distributions of Error Terms")
Table 15: Target Distributions of Error Terms
Outcome Mean SD Skew Skurtosis Fifth Sixth
E1 1 0.564 0.826 0.137 0.062 -0.002 -0.060
E2 2 0.782 0.623 0.851 0.705 -0.043 -2.326
E3 3 0.797 0.604 0.989 0.862 -0.055 -3.140
knitr::kable(Sum2$cont_sum_e, digits = 3, booktabs = TRUE, 
  caption = "Simulated Distributions of Error Terms")
Table 16: Simulated Distributions of Error Terms
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
E1 1 10000 0.564 0.825 0.546 -2.753 3.830 0.120 0.027 -0.237 -0.358
E2 2 10000 0.782 0.627 0.675 -0.702 4.262 0.905 0.905 0.674 -0.150
E3 3 10000 0.797 0.603 0.678 -0.165 5.674 1.020 1.251 2.965 17.647
knitr::kable(Sum2$target_sum_x, digits = 3, booktabs = TRUE, 
  caption = "Target Distributions of Continuous Non-Mixture and Components of 
  Mixture Variables")
Table 17: Target Distributions of Continuous Non-Mixture and Components of Mixture Variables
Outcome X Mean SD Skew Skurtosis Fifth Sixth
cont1_1 1 1 1.33 0.693 0.618 0.210 -0.365 -0.879
cont1_2 1 2 -5.00 2.000 0.000 0.000 0.000 0.000
cont1_3 1 3 3.00 1.000 0.000 0.000 0.000 0.000
cont3_1 3 1 9.09 3.658 0.210 -0.185 -0.024 0.416
cont3_2 3 2 -5.00 2.000 0.000 0.000 0.000 0.000
cont3_3 3 3 3.00 1.000 0.000 0.000 0.000 0.000
knitr::kable(Sum2$cont_sum_x, digits = 3, booktabs = TRUE, 
  caption = "Simulated Distributions of Continuous Non-Mixture and Components 
  of Mixture Variables")
Table 18: Simulated Distributions of Continuous Non-Mixture and Components of Mixture Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
cont1_1 1 1 10000 1.330 0.692 1.252 0.027 4.233 0.616 0.164 -0.714 -2.218
cont1_2 1 2 10000 -5.000 2.000 -4.984 -12.661 3.968 0.008 -0.033 0.077 0.233
cont1_3 1 3 10000 3.000 1.000 3.001 -1.087 6.647 -0.017 0.041 -0.125 -0.094
cont3_1 3 1 10000 9.089 3.660 9.012 -0.245 24.184 0.205 -0.195 -0.008 0.471
cont3_2 3 2 10000 -5.000 2.000 -5.011 -12.762 2.251 0.008 -0.013 -0.051 -0.158
cont3_3 3 3 10000 3.000 1.000 3.008 -1.150 6.927 0.001 0.053 -0.008 -0.163
knitr::kable(Sum2$target_mix_x, digits = 3, booktabs = TRUE, 
  caption = "Target Distributions of Continuous Mixture Variables")
Table 19: Target Distributions of Continuous Mixture Variables
Outcome X Mean SD Skew Skurtosis Fifth Sixth
mix1_1 1 1 0.6 3.917 -0.967 -0.515 5.351 -7.024
mix3_1 3 1 0.6 3.917 -0.967 -0.515 5.351 -7.024
knitr::kable(Sum2$mix_sum_x, digits = 3, booktabs = TRUE, 
  caption = "Simulated Distributions of Continuous Mixture Variables")
Table 20: Simulated Distributions of Continuous Mixture Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
mix1_1 1 1 10000 0.6 3.916 2.441 -11.613 6.645 -0.969 -0.527 5.423 -7.101
mix3_1 3 1 10000 0.6 3.916 2.425 -12.392 6.903 -0.981 -0.482 5.391 -7.545

Summary of Ordinal Variable: (for \(Y_1\))

knitr::kable(Sum2$ord_sum_x[[1]][1:2, ], digits = 3, row.names = FALSE,
  booktabs = TRUE, caption = "Simulated Distribution of X_ord(1)")
Table 21: Simulated Distribution of X_ord(1)
Outcome Support Target Simulated
1 0 0.333 0.336
1 1 0.667 0.669

Summary of Poisson Variable:

knitr::kable(Sum2$pois_sum_x, digits = 3, row.names = FALSE,
  booktabs = TRUE, caption = "Simulated Distribution of X_pois(1)")
Table 22: Simulated Distribution of X_pois(1)
Outcome X N P0 Exp_P0 Mean Exp_Mean Var Exp_Var Median Min Max Skew Skurtosis
1 1 10000 0.096 0.1 13.525 13.5 33.205 40 14 0 30 -0.839 0.748
3 1 10000 0.096 0.1 13.525 13.5 33.205 40 14 0 30 -0.839 0.748

Summary of Negative Binomial Variables \(X_{nb(11)}, X_{nb(21)},\) and \(X_{nb(31)}\):

knitr::kable(Sum2$nb_sum_x, digits = 3, row.names = FALSE,
  booktabs = TRUE, caption = "Simulated Distributions")
Table 23: Simulated Distributions
Outcome X N P0 Exp_P0 Prob Mean Exp_Mean Var Exp_Var Median Min Max Skew Skurtosis
1 1 10000 0.072 0.073 0.769 3.009 3 3.966 3.9 3 0 14 0.848 0.984
3 1 10000 0.018 0.017 0.667 5.003 5 7.601 7.5 5 0 21 0.797 0.984

Maximum Correlation Errors for X Variables by Outcome:

maxerr <- rbind(Sum2$maxerr[[1]][-2], Sum2$maxerr[[3]][-2])
rownames(maxerr) <- colnames(maxerr) <- c("Y1", "Y3")
knitr::kable(as.data.frame(maxerr), digits = 5, booktabs = TRUE, 
  caption = "Maximum Correlation Errors for X Variables")
Table 24: Maximum Correlation Errors for X Variables
Y1 Y3
Y1 0.00636 0.00614
Y3 0.00614 0.00614

Example 3: System of 4 equations with random intercept and random slope for time

\[\begin{equation} \begin{split} Y_1 &= \beta_0 + \beta_1 * X_{O1} + \beta_2 * X_{C11} +\beta_{\rm{subj1}} * X_{O1} * X_{C11} + \beta_{\rm{tint1}} * X_{O1} * Time_1 + \beta_{\rm{t}} * Time_1 + U_0 + U_1 * Time_1 + E_1 \\ Y_2 &= \beta_0 + \beta_1 * X_{O1} + \beta_2 * X_{C21} +\beta_{\rm{subj1}} * X_{O1} * X_{C21} + \beta_{\rm{tint1}} * X_{O1} * Time_2 + \beta_{\rm{t}} * Time_2 + U_0 + U_1 * Time_2 + E_2 \\ Y_3 &= \beta_0 + \beta_1 * X_{O1} + \beta_2 * X_{C31} +\beta_{\rm{subj1}} * X_{O1} * X_{C31} + \beta_{\rm{tint1}} * X_{O1} * Time_3 + \beta_{\rm{t}} * Time_3 + U_0 + U_1 * Time_3 + E_3 \\ Y_4 &= \beta_0 + \beta_1 * X_{O1} + \beta_2 * X_{C41} +\beta_{\rm{subj1}} * X_{O1} * X_{C41} + \beta_{\rm{tint1}} * X_{O1} * Time_4 + \beta_{\rm{t}} * Time_4 + U_0 + U_1 * Time_4 + E_4 \end{split} \tag{2} \end{equation}\]

Description of Variables

  1. Ordinal variable \(X_{O1}\), where \(\Pr[X_{O1} = 0] = 0.2\), \(\Pr[X_{O1} = 1] = 0.35\), and \(\Pr[X_{O1} = 2] = 0.45\), is a group-level variable and is static across equations.
  2. Continuous non-mixture variable \(X_{C1}\) is a subject-level variable with a Logistic(0, 1) distribution, which requires a sixth cumulant correction of 1.75.
  3. \(X\) terms are correlated at 0.1 within an equation and have an AR(1) structure across equations. The correlations for the static variable are held constant across equations.
  4. Random intercept \(U_0\) and time slope \(U_1\) with Normal(0, 1) distributions. Correlation between random effects is 0.3.
  5. The error terms have \(t\)(10) distributions (mean 0, variance 1) and an AR(1, 0.4) correlation structure.

In this example, the random intercept and time slope have continuous non-mixture distributions for all \(Y\). However, the functions corrsys and corrsys2 permit a combination of none, non-mixture, and mixture distributions across the \(Y\) (i.e., if rand.int = c("non_mix", "mix", "none") then the random intercept for \(Y_1\) has a non-mixture, and the random intercept for \(Y_2\) has a mixture distribution; there is no random intercept for \(Y_3\)). In addition, the distributions themselves can vary across outcomes. This is also true for random effects assigned to independent variables as specified in rand.var.

Step 1: Set up parameter inputs

seed <- 1
n <- 10000
M <- 4

# Binary variable
marginal <- lapply(seq_len(M), function(x) list(c(0.2, 0.55)))
support <- lapply(seq_len(M), function(x) list(0:2))

same.var <- 1
subj.var <- matrix(c(1, 2, 2, 2, 3, 2, 4, 2), 4, 2, byrow = TRUE)

# create list of X correlation matrices
corr.x <- list()

rho1 <- 0.1
rho2 <- 0.5
rho3 <- rho2^2
rho4 <- rho2^3
# Y_1
corr.x[[1]] <- list(matrix(rho1, 2, 2), matrix(rho2, 2, 2), matrix(rho3, 2, 2),
  matrix(rho4, 2, 2))
diag(corr.x[[1]][[1]]) <- 1
# set correlations for the same variables equal across outcomes
corr.x[[1]][[2]][, same.var] <- corr.x[[1]][[3]][, same.var] <-
  corr.x[[1]][[4]][, same.var] <- corr.x[[1]][[1]][, same.var]

# Y_2
corr.x[[2]] <- list(t(corr.x[[1]][[2]]), matrix(rho1, 2, 2),
  matrix(rho2, 2, 2), matrix(rho3, 2, 2))
diag(corr.x[[2]][[2]]) <- 1
# set correlations for the same variables equal across outcomes
corr.x[[2]][[2]][same.var, ] <- corr.x[[1]][[2]][same.var, ]
corr.x[[2]][[2]][, same.var] <- corr.x[[2]][[3]][, same.var] <-
  corr.x[[2]][[4]][, same.var] <- t(corr.x[[1]][[2]][same.var, ])
corr.x[[2]][[3]][same.var, ] <- corr.x[[1]][[3]][same.var, ]
corr.x[[2]][[4]][same.var, ] <- corr.x[[1]][[4]][same.var, ]

# Y_3
corr.x[[3]] <- list(t(corr.x[[1]][[3]]), t(corr.x[[2]][[3]]),
  matrix(rho1, 2, 2), matrix(rho2, 2, 2))
diag(corr.x[[3]][[3]]) <- 1
# set correlations for the same variables equal across outcomes
corr.x[[3]][[3]][same.var, ] <- corr.x[[1]][[3]][same.var, ]
corr.x[[3]][[3]][, same.var] <- t(corr.x[[3]][[3]][same.var, ])
corr.x[[3]][[4]][same.var, ] <- corr.x[[1]][[4]][same.var, ]
corr.x[[3]][[4]][, same.var] <- t(corr.x[[1]][[3]][same.var, ])

# Y_4
corr.x[[4]] <- list(t(corr.x[[1]][[4]]), t(corr.x[[2]][[4]]),
  t(corr.x[[3]][[4]]), matrix(rho1, 2, 2))
diag(corr.x[[4]][[4]]) <- 1
# set correlations for the same variables equal across outcomes
corr.x[[4]][[4]][same.var, ] <- corr.x[[1]][[4]][same.var, ]
corr.x[[4]][[4]][, same.var] <- t(corr.x[[4]][[4]][same.var, ])

# create error term correlation matrix
corr.e <- matrix(c(1, 0.4, 0.4^2, 0.4^3,
                   0.4, 1, 0.4, 0.4^2,
                   0.4^2, 0.4, 1, 0.4,
                   0.4^3, 0.4^2, 0.4, 1), M, M, byrow = TRUE)

Log <- calc_theory("Logistic", c(0, 1))
t10 <- calc_theory("t", 10)

# Continuous variables: 1st non-mixture, 2nd error terms
means <- lapply(seq_len(M), function(x) c(Log[1], 0))
vars <- lapply(seq_len(M), function(x) c(Log[2]^2, 1))
skews <- lapply(seq_len(M), function(x) c(Log[3], t10[3]))
skurts <- lapply(seq_len(M), function(x) c(Log[4], t10[4]))
fifths <- lapply(seq_len(M), function(x) c(Log[5], t10[5]))
sixths <- lapply(seq_len(M), function(x) c(Log[6], t10[6]))
Six <- lapply(seq_len(M), function(x) list(1.75, NULL))

## RANDOM EFFECTS
rand.int <- "non_mix" # random intercept
rand.tsl <- "non_mix" # random time slope
rand.var <- NULL # no additional random effects

rmeans <- rskews <- rskurts <- rfifths <- rsixths <- c(0, 0)
rvars <- c(1, 1)
rSix <- list(NULL, NULL)

# append parameters for random effect distributions to parameters for
# continuous fixed effects and error terms
means <- append(means, list(rmeans))
vars <- append(vars, list(rvars))
skews <- append(skews, list(rskews))
skurts <- append(skurts, list(rskurts))
fifths <- append(fifths, list(rfifths))
sixths <- append(sixths, list(rsixths))
Six <- append(Six, list(rSix))

# use a list of length 1 so that betas are the same across Y
betas <- list(c(1, 1))
betas.subj <- list(0.5)
betas.tint <- list(0.75)

# set up correlation matrix for random effects
corr.u <- matrix(c(1, 0.3, 0.3, 1), 2, 2)

Step 2: Check parameter inputs

checkpar(M, "Polynomial", "non_mix", means, vars, skews, skurts, fifths,
  sixths, Six, marginal = marginal, support = support, corr.x = corr.x,
  corr.e = corr.e, same.var = same.var, subj.var = subj.var, betas = betas,
  betas.subj = betas.subj, betas.tint = betas.tint, rand.int = rand.int,
  rand.tsl = rand.tsl, corr.u = corr.u, quiet = TRUE)
## [1] TRUE

Step 3: Generate system

Sys3 <- corrsys(n, M, Time = NULL, "Polynomial", "non_mix", means, vars,
  skews, skurts, fifths, sixths, Six, marginal = marginal, support = support,
  corr.x = corr.x, corr.e = corr.e, same.var = same.var, subj.var = subj.var,
  betas = betas, betas.subj = betas.subj, betas.tint = betas.tint,
  rand.int = rand.int, rand.tsl = rand.tsl, corr.u = corr.u, seed = seed,
  use.nearPD = FALSE, quiet = TRUE)
## Total Simulation time: 0.008 minutes

Step 4: Describe results

Sum3 <- summary_sys(Sys3$Y, Sys3$E, E_mix = NULL, Sys3$X,
  Sys3$X_all, M, "Polynomial", means, vars, skews, skurts, fifths,
  sixths, marginal = marginal, support = support, corr.x = corr.x,
  corr.e = corr.e, U = Sys3$U, U_all = Sys3$U_all, rand.int = rand.int,
  rand.tsl = rand.tsl, corr.u = corr.u, rmeans2 = Sys3$rmeans2,
  rvars2 = Sys3$rvars2)
names(Sum3)
##  [1] "cont_sum_y"   "rho.y"        "cont_sum_e"   "target_sum_e"
##  [5] "rho.e"        "rho.ye"       "ord_sum_x"    "cont_sum_x"  
##  [9] "target_sum_x" "sum_xall"     "rho.x"        "rho.xall"    
## [13] "rho.yx"       "rho.yxall"    "maxerr"       "target_sum_u"
## [17] "cont_sum_u"   "sum_uall"     "rho.u"        "maxerr_u"
knitr::kable(Sum3$cont_sum_y, digits = 3, booktabs = TRUE, 
  caption = "Simulated Distributions of Outcomes")
Table 25: Simulated Distributions of Outcomes
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
Y1 1 10000 3.269 3.926 3.047 -14.527 25.868 0.374 0.829 1.476 5.402
Y2 2 10000 5.475 4.969 5.266 -10.926 29.478 0.254 -0.026 -0.052 0.751
Y3 3 10000 7.238 5.585 7.056 -13.822 31.864 0.151 -0.029 0.003 0.585
Y4 4 10000 9.099 6.403 8.950 -12.072 38.790 0.132 0.027 0.177 0.609
knitr::kable(Sum3$target_sum_u, digits = 3, booktabs = TRUE, 
  caption = "Target Distributions of Random Effects")
Table 26: Target Distributions of Random Effects
Outcome U Mean SD Skew Skurtosis Fifth Sixth
cont1_1 1 1 0 1 0 0 0 0
cont1_2 1 2 0 1 0 0 0 0
knitr::kable(Sum3$sum_uall, digits = 3, booktabs = TRUE, 
  caption = "Simulated Distributions of Random Effects")
Table 27: Simulated Distributions of Random Effects
Outcome U N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
U_int 1 1 10000 0 1 -0.004 -3.834 3.564 -0.016 0.040 -0.096 -0.343
U_T1 1 2 10000 0 1 0.014 -3.415 3.933 -0.003 -0.044 0.239 -0.171

Maximum Correlation Error for Random Effects:

Sum3$maxerr_u
## [1] 0

Linear mixed model

A linear mixed model will be fit to the data using lme from package nlme in order to see if the random effects are estimated according to the simulation parameters (Pinheiro et al. 2017). The data is again reshaped into long format using reshape2::melt.

data3 <- as.data.frame(cbind(factor(1:n), Sys3$Y,
  Sys3$X_all[[1]][, c(1:2, 5)], Sys3$X_all[[2]][, c(2, 5)],
  Sys3$X_all[[3]][, c(2, 5)], Sys3$X_all[[4]][, c(2, 5)]))
colnames(data3)[1] <- "Subject"
data3.a <- melt(data3[, c("Subject", "ord1_1", "Y1", "Y2", "Y3", "Y4")],
  id.vars = c("Subject", "ord1_1"),
  measure.vars = c("Y1", "Y2", "Y3", "Y4"), variable.name = "Time",
  value.name = "Y")
data3.b <- melt(data3[, c("Subject", "cont1_1", "cont2_1", "cont3_1",
                          "cont4_1")],
  id.vars = c("Subject"), variable.name = "Time", value.name = "cont1")
data3.a$Time <- data3.b$Time <- c(rep(1, n), rep(2, n), rep(3, n), rep(4, n))
data3 <- merge(data3.a, data3.b, by = c("Subject", "Time"))

Errors modeled as having Gaussian distributions with an AR(1) correlation structure:

fm3 <- lme(Y ~ ord1_1 * Time + ord1_1 * cont1,
  random = ~ Time | Subject, correlation = corAR1(), data = data3)
sum_fm3 <- summary(fm3)

Each effect in the model was again found to be statistically significant at the \(\alpha = 0.001\) level.

Now, compare betas used in simulation to those returned by lme:

fm3.coef <- as.data.frame(sum_fm3$tTable[c("(Intercept)",
  "ord1_1", "cont1", "Time", "ord1_1:cont1", "ord1_1:Time"), ])
coef <- cbind(c(betas.0, betas[[1]], betas.t, betas.subj[[1]], 
  betas.tint[[1]]), fm3.coef)
colnames(coef)[1] <- "Simulated"
knitr::kable(as.data.frame(coef), digits = 3, booktabs = TRUE, 
  caption = "Beta Coefficients for Repeated Measures Model 2")
Table 28: Beta Coefficients for Repeated Measures Model 2
Simulated Value Std.Error DF t-value p-value
(Intercept) 0.00 -0.003 0.032 29996 -0.108 0.914
ord1_1 1.00 1.001 0.022 9998 46.435 0.000
cont1 1.00 0.997 0.007 29996 151.820 0.000
Time 1.00 1.012 0.021 29996 48.222 0.000
ord1_1:cont1 0.50 0.502 0.004 29996 111.984 0.000
ord1_1:Time 0.75 0.741 0.014 29996 51.844 0.000

Estimated standard deviation and AR(1) parameter for error terms:

sum_fm3$sigma
## [1] 1.00218
coef(fm3$modelStruct$corStruct, unconstrained = FALSE)
##       Phi 
## 0.4010826

Summary of estimated random effects:

varcor <- VarCorr(fm3)
fm3.ranef <- data.frame(Cor = as.numeric(varcor[2, 3]),
  SD_int = as.numeric(varcor[1, 2]), SD_Tsl = as.numeric(varcor[2, 2]))
knitr::kable(fm3.ranef, digits = 3, booktabs = TRUE)
Cor SD_int SD_Tsl
0.309 0.991 0.999

References

Bates, Douglas, and Martin Maechler. 2017. Matrix: Sparse and Dense Matrix Classes and Methods. https://CRAN.R-project.org/package=Matrix.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using Lme4.” Journal of Statistical Software, Articles 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.

Fialkowski, A C. 2017. SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. https://CRAN.R-project.org/package=SimMultiCorrData.

———. 2018. SimCorrMix: Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions. https://CRAN.R-project.org/package=SimCorrMix.

Higham, N. 2002. “Computing the Nearest Correlation Matrix - a Problem from Finance.” IMA Journal of Numerical Analysis 22 (3): 329–43. https://doi.org/10.1093/imanum/22.3.329.

Kuznetsova, Alexandra, Per B. Brockhoff, and Rune H. B. Christensen. 2017. “lmerTest Package: Tests in Linear Mixed Effects Models.” Journal of Statistical Software 82 (13): 1–26. https://doi.org/10.18637/jss.v082.i13.

Laird, NM, and JH Ware. 1982. “Random-Effects Models for Longitudinal Data.” Biometrics 38: 963–74.

Lindstrom, MJ, and DM Bates. 1990. “Nonlinear Mixed Effects Models for Repeated Measures Data.” Biometrics 46: 673–87.

Maree, S. 2012. “Correcting Non Positive Definite Correlation Matrices.” BSc Thesis Applied Mathematics, Delft University of Technology. http://resolver.tudelft.nl/uuid:2175c274-ab03-4fd5-85a9-228fe421cdbf.

Pinheiro, Jose, Douglas Bates, Saikat DebRoy, Deepayan Sarkar, and R Core Team. 2017. nlme: Linear and Nonlinear Mixed Effects Models. https://CRAN.R-project.org/package=nlme.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Wickham, H. 2007. “Reshaping Data with the reshape Package.” Journal of Statistical Software 21 (12): 1–20. http://www.jstatsoft.org/v21/i12/.

Yin, JF, and Y Zhang. 2013. “Alternative Gradient Algorithms for Computing the Nearest Correlation Matrix.” Applied Mathematics and Computation 219 (14): 7591–9. https://doi.org/10.1016/j.amc.2013.01.045.

Zhang, Y, and JF Yin. Year not provided. “Modified Alternative Gradients Algorithm for Computing the Nearest Correlation Matrix.” Internal paper. Shanghai: Tongji University.