The `corrsys` and `corrsys2` functions generate correlated systems of `M` equations representing a **system of repeated measures** at `M` time points. The equations may contain 1) ordinal (\(r \geq 2\) categories), continuous (normal, non-normal, and mixture distributions), and/or count (regular and zero-inflated, Poisson and Negative Binomial) independent variables \(X\); 2) continuous error terms \(E\); 3) a discrete time variable \(Time\); and 4) random effects \(U\). The random effects may be a random intercept, a random slope for time, or a random slope for any of the \(X\) variables. The **important assumptions** are:

- There are at least 2 equations with at least 1 independent variable total.

- The independent variables, random effect terms, and error terms are uncorrelated.

- Each equation has an error term.

- All error terms have continuous non-mixture distributions or all have continuous mixture distributions.

- All random effects are continuous.

- Growth is linear with respect to time.

Continuous variables are simulated using either Fleishman (1978)'s third-order (`method = "Fleishman"`) or Headrick (2002)'s fifth-order (`method = "Polynomial"`) power method transformation (PMT). The \(X\) terms can be the same across equations (i.e., a static covariate such as sex or height) or may be time-varying covariates. The equations may contain different numbers of \(X\) terms (i.e., a covariate may be missing from a given equation).
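The third-order PMT can be illustrated outside the package: Fleishman's method finds polynomial coefficients so that \(Y = a + bZ + cZ^2 + dZ^3\), with \(Z\) standard normal and \(a = -c\), has mean 0, variance 1, and target skew and kurtosis. The sketch below solves his moment equations numerically in Python; the target moments are arbitrary, and this is an illustration of the technique, not the package's implementation:

```python
import numpy as np
from scipy.optimize import fsolve

def fleishman_system(p, skew, kurt):
    """Fleishman (1978) moment equations for Y = a + b*Z + c*Z^2 + d*Z^3,
    with a = -c so that E[Y] = 0 and Var[Y] = 1."""
    b, c, d = p
    var = b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1
    g1 = 2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew
    g2 = 24*(b*d + c**2*(1 + b**2 + 28*b*d)
             + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - kurt
    return [var, g1, g2]

skew, kurt = 0.8, 0.6                        # arbitrary target moments
b, c, d = fsolve(fleishman_system, [1.0, 0.0, 0.0], args=(skew, kurt))
a = -c

rng = np.random.default_rng(1)
z = rng.standard_normal(100_000)
y = a + b*z + c*z**2 + d*z**3                # transformed (non-normal) variable
print(round(float(y.mean()), 2), round(float(y.std()), 2))
```

The fifth-order method works the same way but adds \(Z^4\) and \(Z^5\) terms, matching the fifth and sixth standardized cumulants as well.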

The outcomes \(Y\) are generated using a hierarchical linear models approach, and the user can specify *subject-level* \(X\) terms. Each subject-level \(X\) term is crossed with all *group-level* \(X\) terms. The equations may also contain interactions between group-level or subject-level \(X\) variables. An interaction between two group-level variables is considered another group-level variable, which is crossed with all subject-level variables; an interaction between two subject-level variables is considered another subject-level variable, which is crossed with all group-level variables. Since `Time` is a subject-level variable, each group-level term is crossed with `Time` unless otherwise specified.
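The crossing rules can be sketched with plain arrays (illustrative Python with hypothetical column names, not the package's internals): every group-level column, including a group-group interaction, is multiplied with every subject-level column, and `Time` counts as subject-level:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                          # subjects
# Group-level (static) covariates: one value per subject
group = {"x_ord1": rng.integers(1, 3, n),
         "x_pois1": rng.poisson(2, n)}
# Subject-level covariates: time-varying values (shown for one time point)
subj = {"x_cont1": rng.standard_normal(n),
        "time": np.array([1.0, 2.0, 3.0, 1.0])}

# A group-group interaction is itself group-level, so it gets crossed too.
group["x_ord1:x_pois1"] = group["x_ord1"] * group["x_pois1"]

# Cross every group-level term with every subject-level term (Time included).
crossed = {f"{g}:{s}": gv * sv
           for g, gv in group.items() for s, sv in subj.items()}
print(sorted(crossed))                         # 3 group terms x 2 subject terms
```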

The independent variables, interactions, `Time` effect, random effects, and error terms are summed together to produce the outcomes \(Y\). The `beta` coefficients may be the same or differ across equations. The user specifies the betas for the independent variables in `betas`, for the interactions between two group-level or two subject-level covariates in `betas.int`, for the group-subject level interactions in `betas.subj`, and for the `Time` interactions in `betas.tint`. Setting a coefficient to 0 will eliminate that term.

The user specifies the correlations 1) between \(E\) terms; 2) between \(X\) variables within each outcome \(Y_p\), \(p = 1, ..., M\), and across outcomes; and 3) between \(U\) variables within each outcome \(Y_p\), \(p = 1, ..., M\), and across outcomes. The order of the independent variables in `corr.x` must be \(1^{st}\) ordinal (same order as in `marginal`), \(2^{nd}\) continuous non-mixture (same order as in `skews`), \(3^{rd}\) components of continuous mixture (same order as in `mix_pis`), \(4^{th}\) regular Poisson, \(5^{th}\) zero-inflated Poisson (same order as in `lam`), \(6^{th}\) regular NB, and \(7^{th}\) zero-inflated NB (same order as in `size`). The order of the random effects in `corr.u` must be \(1^{st}\) random intercept, \(2^{nd}\) random time slope, \(3^{rd}\) continuous non-mixture random effects, and \(4^{th}\) components of continuous mixture random effects.
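Both orderings are purely positional, so it can help to mechanize them. The helper below (illustrative Python, not part of the package) emits labels in the order `corr.x` expects, given the number of variables of each type:

```python
def corr_x_order(n_ord, n_cont, n_mix_comp, n_pois, n_zip, n_nb, n_zinb):
    """Illustrative labels in the positional order corr.x expects:
    ordinal, continuous non-mixture, mixture components, regular Poisson,
    zero-inflated Poisson, regular NB, zero-inflated NB."""
    groups = [("ord", n_ord), ("cont", n_cont), ("mix", n_mix_comp),
              ("pois", n_pois), ("zip", n_zip), ("nb", n_nb), ("zinb", n_zinb)]
    return [f"{kind}{i}" for kind, count in groups for i in range(1, count + 1)]

# One ordinal, one non-mixture, a two-component mixture, one Poisson, one NB:
print(corr_x_order(1, 1, 2, 1, 0, 1, 0))
# ['ord1', 'cont1', 'mix1', 'mix2', 'pois1', 'nb1']
```

Note that a continuous mixture variable contributes one position per *component*, not one per variable.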

The variables are generated from multivariate normal variables with intermediate correlations calculated using either `SimCorrMix::intercorr` and **correlation method 1** for `corrsys` or `SimCorrMix::intercorr2` and **correlation method 2** for `corrsys2`. See the **SimCorrMix** package for a description of the correlation methods and the techniques used to generate each variable type. The order of the variables returned is \(1^{st}\) covariates \(X\) (as specified in `corr.x`), \(2^{nd}\) group-group or subject-subject interactions (ordered as in `int.var`), \(3^{rd}\) subject-group interactions (\(1^{st}\) by subject-level variable as specified in `subj.var`, \(2^{nd}\) by covariate as specified in `corr.x`), and \(4^{th}\) `Time` interactions (either as specified in `corr.x` with group-level covariates or in `tint.var`).
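The general technique — draw correlated standard normals, then transform each margin to its target distribution — can be sketched for one continuous and one Poisson covariate. The intermediate correlation below is an arbitrary placeholder, not a value computed by `intercorr`:

```python
import numpy as np
from scipy.stats import norm, poisson

rng = np.random.default_rng(7)
rho = 0.5                                   # illustrative intermediate correlation
cov = [[1.0, rho], [rho, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=50_000)

x_cont = z[:, 0]                            # keeps its standard normal margin
# Map through the normal CDF, then the Poisson quantile function
x_pois = poisson.ppf(norm.cdf(z[:, 1]), mu=3.0)

print(round(float(x_pois.mean()), 1),
      round(float(np.corrcoef(x_cont, x_pois)[0, 1]), 2))
```

The final correlation between `x_cont` and `x_pois` is slightly attenuated relative to `rho`; the role of the correlation methods is to choose an intermediate value so that the *final* correlations hit their targets.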

The `corrsys` and `corrsys2` functions contain no parameter checks, in order to decrease simulation time; the parameters should first be checked with `checkpar`. Summaries of the system can be obtained using `summary_sys`. More information regarding function inputs can be found in the function documentation. Some code has been adapted from the **SimMultiCorrData** (Fialkowski 2017) and **SimCorrMix** (Fialkowski 2018) packages.

The repeated measures model can be described using an HLM approach, which structures the data in at least two levels. **Level-1** is the repeated measure (time or condition) and other subject-level variables. **Level-1** is nested within **Level-2**, which describes the average of the outcome (the intercept) and growth (the slope for time) as a function of group-level variables. The first level captures the within-subject variation, while the second level describes the between-subjects variability. Using an HLM provides a way to determine whether: a) subjects differ at a specific time point with respect to the dependent variable, b) growth rates differ across conditions, or c) growth rates differ across subjects. An HLM may contain random effects that describe deviation at the subject level from the average (fixed) effect described by the beta coefficients (Van der Leeden 1998; Kincaid 2005; McCulloch, Searle, and Neuhaus 2008; Lininger, Cheatham, and Spybrook 2015).

The **major assumptions** are:

- The independent variables are uncorrelated with the error terms.

- The random effects are uncorrelated with the error terms.

- The independent variables are uncorrelated with the random effects.

Consider a two-level model where the outcome \(Y\) is measured at `M` times for `n` subjects, and growth is assumed to be linear with respect to `Time`. The **Level-1** model contains subject-specific factors, including any time-varying covariates. Assume there are three time-varying covariates: \(X_{cont1}\), which has a continuous non-mixture distribution; \(X_{mix1}\), which has a continuous mixture distribution; and \(X_{nb1}\), which has a Negative Binomial (regular or zero-inflated) distribution.
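A continuous mixture covariate such as \(X_{mix1}\) can be sketched by drawing a component label with the mixing probabilities and then sampling from that component; all parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
pis = [0.4, 0.6]                        # mixing probabilities (arbitrary)
mus = [-2.0, 3.0]                       # component means (arbitrary)
sigmas = [1.0, 1.5]                     # component SDs (arbitrary)

comp = rng.choice(len(pis), size=n, p=pis)           # component labels
x_mix = rng.normal(np.take(mus, comp), np.take(sigmas, comp))

# Theoretical mixture mean: sum(pi_k * mu_k) = 0.4*(-2) + 0.6*3 = 1.0
print(round(float(x_mix.mean()), 1))
```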

**Level-1** is represented by a function of the \(p^{th}\) time point, \(p = 1, ..., M\), for the \(i^{th}\) subject, \(i = 1, ..., n\), as follows:

\[\begin{equation} Y_{ip} = \lambda_{0i} + \lambda_{1i}X_{i,cont(p1)} + \lambda_{2i}X_{i,mix(p1)} + \lambda_{3i}X_{i,nb(p1)} + \lambda_{Ti}Time_{ip} + E_{ip}. \tag{1} \end{equation}\]

Since the repeated measurements are nested within subjects, the subject index \(i\) appears \(1^{st}\), followed by the measurement (`Time`) index \(p\), with the independent variable index last. This aids in thinking of the outcomes as the rows of a matrix, with the independent variables as the columns. The slope (\(\lambda\)) coefficients are assumed to be randomly sampled from normal distributions with finite variance.

The **Level-2** model represents the averages of \(Y\) (\(\lambda_{0}\)), time-varying covariates (\(\lambda_{1}, \lambda_{2}, \lambda_{3}\)), and growth (\(\lambda_{T}\)) as a function of other independent variables (covariates). Assume there is one ordinal covariate \(X_{ord(1)}\) (i.e., drug treatment group) and one Poisson (regular or zero-inflated) covariate \(X_{pois(1)}\) that are static across outcomes. In addition, there is an interaction between \(X_{ord(1)}\) and \(X_{pois(1)}\) which is itself a group-level variable.

**Level-2** is expressed as follows:

\[\begin{equation} \begin{split} \lambda_{0i} &= \gamma_{00} + \gamma_{0,ord(1)}X_{i,ord(1)} + \gamma_{0,pois(1)}X_{i,pois(1)} + \gamma_{0,int1}X_{i,ord(1)} * X_{i,pois(1)} + \eta_{0} \\ \lambda_{1i} &= \gamma_{10} + \gamma_{1,ord(1)}X_{i,ord(1)} + \gamma_{1,pois(1)}X_{i,pois(1)} + \gamma_{1,int1}X_{i,ord(1)} * X_{i,pois(1)} + \eta_{1} \\ \lambda_{2i} &= \gamma_{20} + \gamma_{2,ord(1)}X_{i,ord(1)} + \gamma_{2,pois(1)}X_{i,pois(1)} + \gamma_{2,int1}X_{i,ord(1)} * X_{i,pois(1)} + \eta_{2} \\ \lambda_{3i} &= \gamma_{30} + \gamma_{3,ord(1)}X_{i,ord(1)} + \gamma_{3,pois(1)}X_{i,pois(1)} + \gamma_{3,int1}X_{i,ord(1)} * X_{i,pois(1)} + \eta_{3} \\ \lambda_{Ti} &= \gamma_{T0} + \gamma_{T,ord(1)}X_{i,ord(1)} + \gamma_{T,pois(1)}X_{i,pois(1)} + \gamma_{T,int1}X_{i,ord(1)} * X_{i,pois(1)} + \eta_{T}. \end{split} \tag{2} \end{equation}\]

Again, the slope (\(\gamma\)) coefficients are assumed to be randomly sampled from normal distributions with finite variance.

Combining these into a single equation gives:

\[\begin{equation} \begin{split} Y_{ip} &= (\gamma_{00} + \gamma_{0,ord(1)}X_{i,ord(1)} + \gamma_{0,pois(1)}X_{i,pois(1)} + \gamma_{0,int1}X_{i,ord(1)} * X_{i,pois(1)} + \eta_{0}) \\ &\ \ \ + (\gamma_{10} + \gamma_{1,ord(1)}X_{i,ord(1)} + \gamma_{1,pois(1)}X_{i,pois(1)} + \gamma_{1,int1}X_{i,ord(1)} * X_{i,pois(1)} + \eta_{1}) * X_{i,cont(p1)} \\ &\ \ \ + (\gamma_{20} + \gamma_{2,ord(1)}X_{i,ord(1)} + \gamma_{2,pois(1)}X_{i,pois(1)} + \gamma_{2,int1}X_{i,ord(1)} * X_{i,pois(1)} + \eta_{2}) * X_{i,mix(p1)} \\ &\ \ \ + (\gamma_{30} + \gamma_{3,ord(1)}X_{i,ord(1)} + \gamma_{3,pois(1)}X_{i,pois(1)} + \gamma_{3,int1}X_{i,ord(1)} * X_{i,pois(1)} + \eta_{3}) * X_{i,nb(p1)} \\ &\ \ \ + (\gamma_{T0} + \gamma_{T,ord(1)}X_{i,ord(1)} + \gamma_{T,pois(1)}X_{i,pois(1)} + \gamma_{T,int1}X_{i,ord(1)} * X_{i,pois(1)} + \eta_{T}) * Time_{ip} + E_{ip}. \end{split} \tag{3} \end{equation}\]

Since \(E[\eta_{0}] = E[\eta_{1}] = E[\eta_{2}] = E[\eta_{3}] = E[\eta_{T}] = 0\), \(Y_{ip}\) may be re-expressed as:

\[\begin{equation} \begin{split} Y_{ip} &= \beta_{0} + \beta_{1}X_{i,ord(1)} + \beta_{2}X_{i,cont(p1)} + \beta_{3}X_{i,mix(p1)} + \beta_{4}X_{i,pois(1)} + \beta_{5}X_{i,nb(p1)} + \beta_{int1}X_{i,ord(1)} * X_{i,pois(1)} \\ &\ \ \ + \beta_{T}Time_{ip} + (\beta_{subj1}X_{i,ord(1)} + \beta_{subj2}X_{i,pois(1)} + \beta_{subj3}X_{i,ord(1)} * X_{i,pois(1)}) * X_{i,cont(p1)} \\ &\ \ \ + (\beta_{subj4}X_{i,ord(1)} + \beta_{subj5}X_{i,pois(1)} + \beta_{subj6}X_{i,ord(1)} * X_{i,pois(1)}) * X_{i,mix(p1)} \\ &\ \ \ + (\beta_{subj7}X_{i,ord(1)} + \beta_{subj8}X_{i,pois(1)} + \beta_{subj9}X_{i,ord(1)} * X_{i,pois(1)}) * X_{i,nb(p1)} \\ &\ \ \ + (\beta_{tint1}X_{i,ord(1)} + \beta_{tint2}X_{i,pois(1)} + \beta_{tint3}X_{i,ord(1)} * X_{i,pois(1)}) * Time_{ip} + E_{ip}. \end{split} \tag{4} \end{equation}\]

Note that each of the group-level covariates (\(X_{ord(1)}, X_{pois(1)},\) and \(X_{ord(1)} * X_{pois(1)}\)) has an interaction with each of the subject-level covariates (\(X_{cont(p1)}, X_{mix(p1)}, X_{nb(p1)}, Time\)). The slope coefficients may be the same across outcomes, as in the above example, or they may differ across outcomes. Setting a coefficient to 0 will eliminate that term.
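A reduced version of Equation (4) can be simulated directly. The sketch below (illustrative Python with arbitrary coefficients, one group-level covariate of each count/ordinal type, one subject-level continuous covariate, and no random effects) simply sums the fixed-effect terms and an error matrix:

```python
import numpy as np

rng = np.random.default_rng(42)
n, M = 500, 3                                # subjects, time points

# Group-level (static) covariates: one value per subject
x_ord = rng.integers(0, 2, n).astype(float)
x_pois = rng.poisson(2.0, n).astype(float)

# Subject-level (time-varying) covariate and the Time variable: n x M
x_cont = rng.standard_normal((n, M))
time = np.tile(np.arange(1.0, M + 1.0), (n, 1))

# Arbitrary coefficients for a reduced version of Equation (4)
b0, b_ord, b_pois, b_cont, b_T = 1.0, 0.5, 0.3, -0.3, 0.8
b_subj, b_tint = 0.2, -0.1

y = (b0 + b_ord * x_ord[:, None] + b_pois * x_pois[:, None]
     + b_cont * x_cont + b_T * time
     + b_subj * x_ord[:, None] * x_cont      # group-subject interaction
     + b_tint * x_ord[:, None] * time        # group-Time interaction
     + rng.standard_normal((n, M)))          # error terms E
print(y.shape)
# (500, 3)
```

Each row of `y` holds one subject's \(M\) repeated measures, matching the matrix view of the outcomes described above.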

**Random effects** may be added for the intercept, time slope, or effects of any of the covariates. Random effects describe deviation at the subject level from the average (fixed) effect described by the slope coefficients (betas). Adding a random intercept \(U_{0i}\) and a random slope for time \(U_{1i}\) to the above example yields Equation (4) with \(\beta_{0}\) replaced by \(\beta_{0} + U_{0i}\) and \(\beta_{T}\) replaced by \(\beta_{T} + U_{1i}\).

The type of random intercept and time slope (i.e., non-mixture or mixture) is specified in `rand.int` and `rand.tsl`; this type may vary by equation. The random effects for independent variables are specified in `rand.var` and may also contain a combination of non-mixture and mixture continuous distributions.

Fialkowski, A C. 2017. *SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types*. https://CRAN.R-project.org/package=SimMultiCorrData.

———. 2018. *SimCorrMix: Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions*. https://CRAN.R-project.org/package=SimCorrMix.

Fleishman, A I. 1978. “A Method for Simulating Non-Normal Distributions.” *Psychometrika* 43: 521–32. https://doi.org/10.1007/BF02293811.

Headrick, T C. 2002. “Fast Fifth-Order Polynomial Transforms for Generating Univariate and Multivariate Non-Normal Distributions.” *Computational Statistics and Data Analysis* 40 (4): 685–711. https://doi.org/10.1016/S0167-9473(02)00072-5.

Kincaid, C. 2005. “Guidelines for Selecting the Covariance Structure in Mixed Model Analysis.” *Proceedings of the Thirtieth Annual SAS Users Group International Conference*, Paper 198-30: 1–8.

Van der Leeden, R. 1998. “Multilevel Analysis of Repeated Measures Data.” *Quality & Quantity* 32 (1): 15–29.

Lininger, M, CC Cheatham, and J Spybrook. 2015. “Hierarchical Linear Model: Thinking Outside the Traditional Repeated-Measures Analysis-of-Variance Box.” *Journal of Athletic Training* 50 (4): 438–41. https://doi.org/10.4085/1062-6050-49.5.09.

McCulloch, CE, SR Searle, and JM Neuhaus. 2008. *Generalized, Linear, and Mixed Models*. 2nd ed. Wiley Series in Probability and Statistics. Hoboken, New Jersey: John Wiley & Sons, Inc.