Repeated measures may be thought of as a system of \(M\) equations. The dependent variables \(Y_{p}\), for \(p = 1,\ ...,\ M\), may represent the same variable collected at \(M\) time points or under \(M\) different conditions (i.e., different treatment drugs). The \(K\) independent variables \(X_{pi}\), for \(p = 1,\ ...,\ M\) and \(i = 1,\ ...,\ K\), may be non-varying covariates (*static*, i.e., gender or height), time-varying covariates (i.e., age or blood sugar), or a combination of the two. Although Fleishman (1978)’s third and Headrick (2002)’s fifth-order polynomial transformation methods (power method transformations, PMT’s) allow the simulation of correlated non-normal variables for a single equation or system of independent equations, they have certain limitations when producing a system of correlated equations. For example, there is no way to control the distributions of and correlations between the error terms. Headrick and Beasley (2004) derived a general, computationally-efficient procedure for simulating correlated non-normal systems of statistical equations that permits control of correlated non-normal: a) error (*stochastic disturbance*) terms, b) independent variables, and c) dependent and independent variables. Such a procedure is important in order to investigate the properties of analytical models and test statistics that rely on certain distributional assumptions. For example, sufficient sample size, multivariate normality, independent errors, sphericity, and balanced data.

Headrick and Beasley (2004)’s technique allows the user to specify the distributions of the independent variables \(X_{(pj)}\), for \(p = 1, ..., M\) equations, and stochastic disturbance (error) terms \(E\). The user also controls the correlations between: 1) independent variables \(X_{(pj)}\) for a given outcome \(Y_p\), 2) an outcome \(Y_p\) and its \(X_{(pj)}\) terms, and 3) stochastic disturbance (error) terms \(E\). The technique calculates the beta (slope) coefficients based on correlations between independent variables \(X_{(pj)}\) for a given outcome \(Y_p\), the correlations between that outcome \(Y_p\) and the \(X_{(pj)}\) terms, and the variances. Headrick and Beasley (2004) also derived equations to calculate the expected correlations based on these betas between: 1) \(X_{(pj)}\) terms and outcome \(Y_q\), 2) outcomes \(Y_p\) and \(Y_q\), and 3) outcome \(Y_p\) and error term \(E_q\), for \(p\) not equal to \(q\). All continuous variables are generated from intermediate random normal variables using Headrick (2002)’s fifth-order power transformation method (PMT). The intermediate correlations required to generate the target correlations after variable transformations are calculated using Headrick (2002)’s equation.

Define a system of \(M\) equations as follows:

\[\begin{equation} \begin{split} Y_{1} &= \beta_{10} + \beta_{11}X_{11} + ... + \beta_{1i}X_{1i} + ... + \beta_{1j}X_{1j} + ... + \beta_{1K}X_{1K} + \sigma_{1}E_{1} \\ &... \\ Y_{p} &= \beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + \sigma_{p}E_{p} \\ &... \\ Y_{q} &= \beta_{q0} + \beta_{q1}X_{q1} + ... + \beta_{qi}X_{qi} + ... + \beta_{qj}X_{qj} + ... + \beta_{qK}X_{qK} + \sigma_{q}E_{q} \\ &... \\ Y_{M} &= \beta_{M0} + \beta_{M1}X_{M1} + ... + \beta_{Mi}X_{Mi} + ... + \beta_{Mj}X_{Mj} + ... + \beta_{MK}X_{MK} + \sigma_{M}E_{M}, \end{split} \tag{1} \end{equation}\]for \(1 \leq p < q \leq M\) and \(1 \leq i < j \leq K\). Each dependent variable \(Y\), independent variable \(X\), and error term \(E\) has length \(n\) (the sample size). Assume the error terms have expected values of \(0\) and variances of \(1\). They can have any combination of skew (\(\gamma_{1}\)), standardized kurtosis (\(\gamma_{2}\)), and standardized fifth (\(\gamma_{3}\)) and sixth (\(\gamma_{4}\)) cumulants permitted under the desired PMT’s constraints. The \(\sigma\) values are real positive scalars that may vary across equations.

Alternatively, the last terms may be rexpressed where the error terms are random variables with standard deviations described by the \(\sigma\) values (i.e., using \(E_p\) instead of \(\sigma_{p}E_{p}\)). This notation follows the technique used in the `nonnormsys`

function, which simulates the correlated system of statistical equations containing continuous variables within the **SimRepeat** package. In addition, the \(M\) equations do not have to include the same number of independent variables, and a variable may be designated as being the same for all equations (in input `same.var`

). The \(M\) dependent variables are created by generating the right-hand sides of (1).

- There are at least 2 equations and a total of at least 1 independent variable.

- The independent variables are uncorrelated with the error terms within the same outcome and across outcomes.

- Each equation has an error term.

- All error terms have either a non-mixture or mixture distribution.

Without loss of generality, consider the variable pair \(X_{pi}\) and \(X_{pj}\), for \(p = 1,\ ...,\ M\), \(i = 1,\ ...,\ K\), \(j = 1,\ ...,\ K\), and \(i \neq j\), with expected values of \(\mu_{X_{pi}}\) and \(\mu_{X_{pj}}\) and variances of \(\sigma_{X_{pi}}^2\) and \(\sigma_{X_{pj}}^2\). The variables \(X_{pi}\) and \(X_{pj}\) have distributions specified by skews \(\gamma_{1i}\) and \(\gamma_{1j}\), standardized kurtoses \(\gamma_{2i}\) and \(\gamma_{2j}\), and standardized fifth and sixth cumulants \(\gamma_{3i}\), \(\gamma_{3j}\), \(\gamma_{4i}\), and \(\gamma_{4j}\). \(X_{pi}\) and \(X_{pj}\) are generated from location-scale transformations applied to \(X_{pi}'\) and \(X_{pj}'\), which come from Headrick (2002)’s fifth-order PMT given by:

\[\begin{equation} \begin{split} X_{pi}' &= c_{0pi} + c_{1pi}Z_{pi} + c_{2pi}Z_{pi}^2 + c_{3pi}Z_{pi}^3 + c_{4pi}Z_{pi}^4 + c_{5pi}Z_{pi}^5 \\ X_{pj}' &= c_{0pj} + c_{1pj}Z_{pj} + c_{2pj}Z_{pj}^2 + c_{3pj}Z_{pj}^3 + c_{4pj}Z_{pj}^4 + c_{5pj}Z_{pj}^5, \end{split} \tag{2} \end{equation}\]where \(Z_{pi}\) and \(Z_{pj}\) are both \(N(0,1)\) variables. The constants \(c_{0pi},\ ...,\ c_{5pi}\) are determined by solving Headrick (2002)’s system of equations (contained in `SimMultiCorrData::poly`

) using \(\gamma_{1i},\ \gamma_{2i},\ \gamma_{3i},\) and \(\gamma_{4i}\) and likewise for \(c_{0pj},\ ...,\ c_{5pj}\) (Fialkowski 2017). The results also hold for Fleishman (1978)’s method, with \(c_{4pi} = c_{5pi} = c_{4pj} = c_{5pj} = 0\) and the system of equations contained in `SimMultiCorrData::fleish`

. Then the intermediate correlation \(\rho_{ZpiZpj}\) between \(Z_{pi}\) and \(Z_{pj}\) required to achieve a target correlation \(\rho_{XpiXpj}\) between \(X_{pi}\) and \(X_{pj}\) is given by Headrick (2002)’s Equation 26 (p. 694), or Headrick and Sawilowsky (1999)’s Equation 7b (p. 28) for Fleishman’s method. The intermediate correlations are calculated by the function `SimCorrMix::intercorr_cont`

. See the **SimCorrMix** package for a vignette (Fialkowski 2018). The purpose of correlating \(Z_{pi}\) and \(Z_{pj}\) at an intermediate level is to adjust *a priori* for the non-normalization effect of the constants so that \(X_{pi}\) and \(X_{pj}\) have the desired final correlation (Headrick and Beasley 2004).

The equations given by Headrick and Beasley (2004) to determine the beta coefficients and theoretical correlations between outcomes \(Y\), outcomes and independent variables, and outcomes and error terms are derived below (with some modifications and corrections). These are contained in the **SimRepeat** functions `calc_betas`

, `calc_corr_y`

, `calc_corr_ye`

, and `calc_corr_yx`

.

\[\begin{equation} Y_{p} = \beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p}. \tag{3} \end{equation}\] In (3), the error term \(E_p\) has variance \(\sigma_p^2 = E[E_p^2] - E[E_p]^2\). Obtaining the values for \(\beta_{p1},\ ...,\ \beta_{pi},\ ...,\ \beta_{pK}\) requires solving a system of \(K\) equations. The correlation between \(Y_p\) and \(X_{pi}\) is given by:

\[\begin{equation} \rho_{Y_pX_{pi}} = \frac{E[Y_pX_{pi}] - E[Y_p]E[X_{pi}]}{\sigma_{Y_p} \sigma_{X_{pi}}}, \tag{4} \end{equation}\] where \[\begin{equation} \sigma_{Y_p}^2 = E[Y_p^2] - E[Y_p]^2. \tag{5} \end{equation}\] The correlation between two independent variables \(X_{pi}\) and \(X_{pj}\) is given by:

\[\begin{equation} \rho_{X_{pi}X_{pj}} = \frac{E[X_{pi}X_{pj}] - E[X_{pi}]E[X_{pj}]}{\sigma_{X_{pi}} \sigma_{X_{pj}}}, \tag{6} \end{equation}\] where \[\begin{equation} \sigma_{X_{pi}}^2 = E[X_{pi}^2] - E[X_{pi}]^2\ \mathrm{and}\ \sigma_{X_{pj}}^2 = E[X_{pj}^2] - E[X_{pj}]^2. \tag{7} \end{equation}\] Using (3),

\[\begin{align} E[Y_{p}] &= \beta_{p0} + \beta_{p1}E[X_{p1}] + ... + \beta_{pi}E[X_{pi}] + ... + \beta_{pj}E[X_{pj}] + ... + \beta_{pK}E[X_{pK}] + E[E_{p}] \tag{8} \\ E[Y_{p}]^2 &= \beta_{p0}^2 + 2\beta_{p0} \sum_{pi} \beta_{pi}E[X_{pi}] + 2\beta_{p0}E[E_{p}] + \sum_{pi} \beta_{pi}^2 E[X_{pi}]^2 \notag \\ &\ \ \ + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}E[X_{pi}]E[X_{pj}] + 2\sum_{pi} \beta_{pi}E[X_{pi}]E[E_p] + E[E_p]^2 \tag{9} \end{align}\] \[\begin{align} \begin{split} E[Y_{p}^2] &= E[(\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p})^2]\\ &= E[\beta_{p0}^2 + \beta_{p0}(\beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK}) + \beta_{p0}E_{p}\\ &\ \ \ + \beta_{p0}\beta_{p1}X_{p1} + \beta_{p1}^2 X_{p1}^2 + \beta_{p1}X_{p1}(\beta_{p2}X_{p2} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK}) + \beta_{p1}X_{p1}E_{p}\\ &\ \ \ \ + ...\\ &\ \ \ + \beta_{p0}\beta_{pK}X_{pK} + \beta_{pK}X_{pK}(\beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{p(K - 1)}X_{p(K - 1)}) + \beta_{pK}^2 X_{pK}^2\\ &\ \ \ + \beta_{pK}X_{pK}E_{p} + \beta_{p0}E_{p} + E_p(\beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK}) + E_p^2]\\ &= \beta_{p0}^2 + 2\beta_{p0} \sum_{pi} \beta_{pi}E[X_{pi}] + 2\beta_{p0}E[E_{p}] + E[E_p^2] + \sum_{pi} \beta_{pi}^2 E[X_{pi}^2] + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}E[X_{pi}X_{pj}]\\ &\ \ \ + 2\sum_{pi} \beta_{pi}E[X_{pi}E_p]. \end{split} \tag{10} \end{align}\] From (9) and (10), the variance of \(Y\) is given by:

\[\begin{align} \begin{split} \sigma_{Y_p}^2 &= E[E_p^2] - E[E_p]^2 + \sum_{pi} \beta_{pi}^2 E[X_{pi}^2] - \sum_{pi} \beta_{pi}^2 E[X_{pi}]^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}E[X_{pi}X_{pj}] \\ &\ \ \ - 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}E[X_{pi}]E[X_{pj}] + 2\sum_{pi} \beta_{pi}E[X_{pi}E_p] - 2\sum_{pi} \beta_{pi}E[X_{pi}]E[E_p]. \end{split} \tag{11} \end{align}\] Using model assumption 2 that the independent variables are uncorrelated with the error terms, \(E[X_{pi}E_p] = E[X_{pi}]E[E_p]\). Using (6), the variance of \(Y\) is:

\[\begin{equation} \sigma_{Y_p}^2 = \sigma_p^2 + \sum_{pi} \beta_{pi}^2 \sigma_{X_{pi}}^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}\sigma_{X_{pi}}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}}. \tag{12} \end{equation}\] Then,

\[\begin{align} \begin{split} E[Y_pX_{pi}] &= E[(\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p}) X_{pi}]\\ &= \beta_{p0}E[X_{pi}] + \sum_{pj \neq pi} \beta_{pj}E[X_{pi}X_{pj}] + \beta_{pi}E[X_{pi}^2] + E[X_{pi}E_p] \end{split} \tag{13} \end{align}\] and

\[\begin{align} \begin{split} E[Y_p]E[X_{pi}] &= E[\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p}]E[X_{pi}]\\ &= \beta_{p0}E[X_{pi}] + \sum_{pj \neq pi} \beta_{pj}E[X_{pi}]E[X_{pj}] + \beta_{pi}E[X_{pi}]^2 + E[X_{pi}]E[E_p]. \end{split} \tag{14} \end{align}\] Combining (6), (7), (13) and (14) and using model assumption 2 gives:

\[\begin{equation} E[Y_pX_{pi}] - E[Y_p]E[X_{pi}] = \sigma_{X_{pi}} \sum_{pj \neq pi} \beta_{pj}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}} + \beta_{pi}\sigma_{X_{pi}}^2. \tag{14} \end{equation}\] Finally, the correlation \(\rho_{Y_pX_{pi}}\) from (4) can be reexpressed as:

\[\begin{align} \begin{split} \rho_{Y_pX_{pi}} &= \frac{\sigma_{X_{pi}} \sum_{pj \neq pi} \beta_{pj}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}} + \beta_{pi}\sigma_{X_{pi}}^2}{\sigma_{X_{pi}} \sqrt{\sigma_p^2 + \sum_{pi} \beta_{pi}^2 \sigma_{X_{pi}}^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}\sigma_{X_{pi}}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}}}}\\ &= \frac{\sum_{pj \neq pi} \beta_{pj}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}} + \beta_{pi}\sigma_{X_{pi}}}{\sqrt{\sigma_p^2 + \sum_{pi} \beta_{pi}^2 \sigma_{X_{pi}}^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}\sigma_{X_{pi}}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}}}}. \end{split} \tag{15} \end{align}\]

For outcome \(p\) and \(i = 1,\ ...,\ K\), (15) gives a system of \(K\) equations that are solved to determine the \(K\) beta coefficients that yield the target correlations \(\rho_{Y_pX_{pi}}\). This is done in **SimRepeat** by the function `calc_betas`

. The required inputs \(\rho_{Y_pX_{pi}}\), \(\rho_{X_{pi}X_{pj}}\), and \(\sigma_{X_{pi}}^2\) are specified via the function parameters `corr.yx`

, `corr.x`

, and `vars`

.

After the beta coefficients are found, theoretical values for the remaining correlations in system (1) can be determined.

\[\begin{equation} Y_{q} = \beta_{q0} + \beta_{q1}X_{q1} + ... + \beta_{qi}X_{qi} + ... + \beta_{qj}X_{qj} + ... + \beta_{qL}X_{qL} + E_{q}. \tag{16} \end{equation}\] In (16), the error term \(E_q\) has variance \(\sigma_q^2 = E[E_q^2] - E[E_q]^2\). Let \(X_{pi}\), for \(i = 1,\ ...,\ K\), be one of the independent variables in (3). The correlation between \(Y_q\) and \(X_{pi}\) is given by:

\[\begin{equation} \rho_{Y_qX_{pi}} = \frac{E[Y_qX_{pi}] - E[Y_q]E[X_{pi}]}{\sigma_{Y_q} \sigma_{X_{pi}}}, \tag{17} \end{equation}\] where \[\begin{equation} \sigma_{Y_q}^2 = E[Y_q^2] - E[Y_q]^2. \tag{18} \end{equation}\] The correlation between two independent variables \(X_{pi}\), for \(i = 1,\ ...,\ K\), and \(X_{qj}\), for \(j = 1,\ ...,\ L\), is given by:

\[\begin{equation} \rho_{X_{pi}X_{qj}} = \frac{E[X_{pi}X_{qj}] - E[X_{pi}]E[X_{qj}]}{\sigma_{X_{pi}} \sigma_{X_{qj}}}, \tag{19} \end{equation}\] where \[\begin{equation} \sigma_{X_{pi}}^2 = E[X_{pi}^2] - E[X_{pi}]^2\ \mathrm{and}\ \sigma_{X_{qj}}^2 = E[X_{qj}^2] - E[X_{qj}]^2. \tag{20} \end{equation}\] Then,

\[\begin{align} \begin{split} E[Y_qX_{pi}] &= E[(\beta_{q0} + \beta_{q1}X_{q1} + ... + \beta_{qi}X_{qi} + ... + \beta_{qj}X_{qj} + ... + \beta_{qL}X_{qL} + E_{q}) X_{pi}]\\ &= \beta_{q0}E[X_{pi}] + \sum_{qj} \beta_{qj}E[X_{qj}X_{pi}] + E[X_{pi}E_q] \end{split} \tag{21} \end{align}\] and

\[\begin{align} \begin{split} E[Y_q]E[X_{pi}] &= E[\beta_{q0} + \beta_{q1}X_{q1} + ... + \beta_{qi}X_{qi} + ... + \beta_{qj}X_{qj} + ... + \beta_{qL}X_{qL} + E_{q}]E[X_{pi}]\\ &= \beta_{q0}E[X_{pi}] + \sum_{qj} \beta_{qj}E[X_{qj}]E[X_{pi}] + E[X_{pi}]E[E_q]. \end{split} \tag{22} \end{align}\] Combining (19), (21) and (22) and using model assumption 2 gives:

\[\begin{equation} E[Y_qX_{pi}] - E[Y_q]E[X_{pi}] = \sigma_{X_{pi}} \sum_{qj} \beta_{qj}\sigma_{X_{qj}}\rho_{X_{pi}X_{qj}}. \tag{23} \end{equation}\] Finally, the correlation \(\rho_{Y_qX_{pi}}\) from (17) can be reexpressed as:

\[\begin{align} \begin{split} \rho_{Y_qX_{pi}} &= \frac{\sigma_{X_{pi}} \sum_{qj} \beta_{qj}\sigma_{X_{qj}}\rho_{X_{pi}X_{qj}}}{\sigma_{X_{pi}} \sqrt{\sigma_q^2 + \sum_{qi} \beta_{qi}^2 \sigma_{X_{qi}}^2 + 2\sum_{qi \neq qj} \beta_{qi}\beta_{qj}\sigma_{X_{qi}}\sigma_{X_{qj}}\rho_{X_{qi}X_{qj}}}}\\ &= \frac{\sum_{qj} \beta_{qj}\sigma_{X_{qj}}\rho_{X_{pi}X_{qj}}}{\sqrt{\sigma_q^2 + \sum_{qi} \beta_{qi}^2 \sigma_{X_{qi}}^2 + 2\sum_{qi \neq qj} \beta_{qi}\beta_{qj}\sigma_{X_{qi}}\sigma_{X_{qj}}\rho_{X_{qi}X_{qj}}}}. \end{split} \tag{24} \end{align}\]

This correlation is found in **SimRepeat** by the function `calc_corr_yx`

. The required inputs \(\beta_{qj}\), \(\rho_{X_{pi}X_{qj}}\), \(\rho_{X_{qi}X_{qj}}\), \(\sigma_{X_{pi}}^2\), and \(\sigma_{X_{qj}}^2\) are specified via the function parameters `betas`

, `corr.x`

, and `vars`

.

\[\begin{align} \begin{split} E[Y_pY_q] &= E[(\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p})\\ &\ \ \ * (\beta_{q0} + \beta_{q1}X_{q1} + ... + \beta_{qi}X_{qi} + ... + \beta_{qj}X_{qj} + ... + \beta_{qL}X_{qL} + E_{q})]\\ &= \beta_{p0}\beta_{q0} + \beta_{p0} \sum_{qj} \beta_{qj}E[X_{qj}] + \beta_{p0}E[E_q] + \beta_{q0}E[E_p] + \beta_{q0} \sum_{pi} \beta_{pi}E[X_{pi}]\\ &\ \ \ + \sum_{pi} \sum_{qj} \beta_{pi} \beta_{qj} E[X_{pi}X_{qj}] + \sum_{pi} \beta_{pi} E[X_{pi}E_q] + \sum_{qj} \beta_{qj} E[X_{qj}E_p] + E[E_pE_q] \end{split} \tag{26} \end{align}\] and

\[\begin{align} \begin{split} E[Y_p]E[Y_q] &= E[(\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p})]\\ &\ \ \ * E[(\beta_{q0} + \beta_{q1}X_{q1} + ... + \beta_{qi}X_{qi} + ... + \beta_{qj}X_{qj} + ... + \beta_{qL}X_{qL} + E_{q})]\\ &= \beta_{p0}\beta_{q0} + \beta_{p0} \sum_{qj} \beta_{qj}E[X_{qj}] + \beta_{p0}E[E_q] + \beta_{q0}E[E_p] + \beta_{q0} \sum_{pi} \beta_{pi}E[X_{pi}]\\ &\ \ \ + \sum_{pi} \sum_{qj} \beta_{pi} \beta_{qj} E[X_{pi}]E[X_{qj}] + \sum_{pi} \beta_{pi} E[X_{pi}]E[E_q] + \sum_{qj} \beta_{qj} E[X_{qj}]E[E_p] + E[E_p]E[E_q] \end{split} \tag{27} \end{align}\] Combining (19), (26) and (27) and using model assumption 2 gives:

\[\begin{equation} E[Y_pY_q] - E[Y_p]E[Y_q] = \sum_{pi} \sum_{qj} \beta_{pi} \beta_{qj} \sigma_{X_{pi}} \sigma_{X_{qj}} \rho_{X_{pi}X_{qj}} + \sigma_p \sigma_q \rho_{E_pE_q}. \tag{28} \end{equation}\] Finally, the correlation \(\rho_{Y_pY_q}\) from (25) can be reexpressed as:

\[\begin{align} \begin{split} \rho_{Y_pY_q} &= \frac{\sum_{pi} \sum_{qj} \beta_{pi} \beta_{qj} \sigma_{X_{pi}} \sigma_{X_{qj}} \rho_{X_{pi}X_{qj}} + \sigma_p \sigma_q \rho_{E_pE_q}}{\sqrt{\sigma_p^2 + \sum_{pi} \beta_{pi}^2 \sigma_{X_{pi}}^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}\sigma_{X_{pi}}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}}}}\\ &\ \ \ * \frac{1}{\sqrt{\sigma_q^2 + \sum_{qi} \beta_{qi}^2 \sigma_{X_{qi}}^2 + 2\sum_{qi \neq qj} \beta_{qi}\beta_{qj}\sigma_{X_{qi}}\sigma_{X_{qj}}\rho_{X_{qi}X_{qj}}}}. \end{split} \tag{29} \end{align}\]

This correlation is found in **SimRepeat** by the function `calc_corr_y`

. The required inputs \(\beta_{pi}\), \(\beta_{qj}\), \(\rho_{X_{pi}X_{qj}}\), \(\rho_{X_{pi}X_{pj}}\), \(\rho_{X_{qi}X_{qj}}\), \(\rho_{E_pE_q}\), \(\sigma_{X_{pi}}^2\), and \(\sigma_{X_{qj}}^2\) are specified via the function parameters `betas`

, `corr.x`

, `corr.e`

, and `vars`

.

\[\begin{align} \begin{split} E[Y_pE_q] &= E[(\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p}) E_q]\\ &= \beta_{p0}E[E_q] + \sum_{pi} \beta_{pi}E[X_{pi}E_q] + E[E_pE_q] \end{split} \tag{31} \end{align}\] and

\[\begin{align} \begin{split} E[Y_p]E[E_q] &= E[\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p}]E[E_q]\\ &= \beta_{p0}E[E_q] + \sum_{pi} \beta_{pi}E[X_{pi}]E[E_q] + E[E_p]E[E_q] \end{split} \tag{32} \end{align}\] Combining (31) and (32) and using model assumption 2 gives:

\[\begin{equation} E[Y_pE_q] - E[Y_p]E[E_q] = \sigma_p \sigma_q \rho_{E_pE_q}. \tag{33} \end{equation}\] Finally, the correlation \(\rho_{Y_pE_q}\) from (30) can be reexpressed as:

\[\begin{align} \begin{split} \rho_{Y_pE_q} &= \frac{\sigma_p \sigma_q \rho_{E_pE_q}}{\sigma_q \sqrt{\sigma_p^2 + \sum_{pi} \beta_{pi}^2 \sigma_{X_{pi}}^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}\sigma_{X_{pi}}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}}}}\\ &= \frac{\sigma_p \rho_{E_pE_q}}{\sqrt{\sigma_p^2 + \sum_{pi} \beta_{pi}^2 \sigma_{X_{pi}}^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}\sigma_{X_{pi}}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}}}}. \end{split} \tag{34} \end{align}\]

This correlation is found in **SimRepeat** by the function `calc_corr_ye`

. The required inputs \(\beta_{pi}\), \(\rho_{X_{pi}X_{pj}}\), \(\rho_{E_pE_q}\), \(\sigma_{X_{pi}}^2\), and \(\sigma_p^2\) are specified via the function parameters `betas`

, `corr.x`

, `corr.e`

, and `vars`

.

The equations derived above have been extended in **SimRepeat** to permit independent variables or error terms with mixture distributions. If there are mixture distributions in the system, then the additional required parameters are `mix_pis`

, `mix_mus`

, `mix_sigmas`

, and `error_type`

(which defaults to “non_mix”). The vignette **Expected Cumulants and Correlations for Continuous Mixture Variables** from **SimCorrMix** derives the equations for the mean, variance, skew, standardized kurtosis (*skurtosis*), and standardized fifth and sixth cumulants for a continuous mixture distribution. Equations are also given which approximate: 1) correlations between two mixture variables and 2) correlations between a mixture variable and a non-mixture variable based on the components’ distributions and correlations with components (Fialkowski 2018). These equations are summarized below (Davenport, Bezder, and Hathaway 1988; Shork, Allison, and Thiel 1996; Everitt 1996; Pearson 2011).

Assume \(M1\) and \(M2\) are two continuous mixture variables. Let \(M1\) have \(k_{M1}\) components with mixing probabilities \(\alpha_1, ..., \alpha_{k_{M1}}\). The standard deviations of the components are \(\sigma_{M1_1}, \sigma_{M1_2}, ..., \sigma_{M1_{k_{M1}}}\). Let \(M2\) have \(k_{M2}\) components with mixing probabilities \(\gamma_1, ..., \gamma_{k_{M1}}\). The standard deviations of the components are \(\sigma_{M2_1}, \sigma_{M2_2}, ..., \sigma_{M2_{k_{M2}}}\).

\[\begin{equation} \rho_{M1 M2} = \frac{E[M1M2] - E[M1]E[M2]}{\sigma_{M1} \sigma_{M2}}. \tag{35} \end{equation}\]

Equation (35) requires the expected value of the product of \(M1\) and \(M2\). Since there is no general method for determining this expected value, it is approximated by expressing \(M1\) and \(M2\) as sums of their component variables:

\[\begin{equation} \begin{split} E[(\sum_{i = 1}^{k_{M1}} \alpha_i M1_i) (\sum_{j = 1}^{k_{M2}} \gamma_j M2_j)] &= E[\alpha_1 M1_1 \gamma_1 M2_1 + \alpha_1 M1_1 \gamma_2 M2_2 + ... + \alpha_{k_{M1}} M1_{k_{M1}} \gamma_{k_{M2}} M2_{k_{M2}}] \\ &= \alpha_1 \gamma_1 E[M1_1 M2_1] + \alpha_1 \gamma_2 E[M1_1 M2_2] + ... + \alpha_{k_{M1}} \gamma_{k_{M2}} E[M1_{k_{M1}} M2_{k_{M2}}]. \end{split} \tag{36} \end{equation}\] The correlation is approximated as:\[\begin{equation} \rho_{M1 M2} = \frac{\sum_{i = 1}^{k_{M1}} \alpha_i \sigma_{M1_i} \sum_{j = 1}^{k_{M2}} \gamma_j \sigma_{M2_j} \rho_{M1_i M2_j}}{\sigma_{M1} \sigma_{M2}}, \tag{37} \end{equation}\]

where \(\rho_{M1_i M2_j}\) is the correlation between component \(i\) of \(M1\) and component \(j\) of \(M2\). The function `SimCorrMix::rho_M1M2`

returns this value.

\[\begin{equation} \rho_{M1 Y} = \frac{E[M1Y] - E[M1]E[Y]}{\sigma_{M1} \sigma_Y}. \tag{38} \end{equation}\]

Equation (38) requires the expected value of the product of \(M1\) and \(Y\). Since there is no general method for determining this expected value, it is again approximated by expressing \(M1\) as a sum of its component variables:

\[\begin{equation} \begin{split} E[(\sum_{i = 1}^{k_{M1}} \alpha_i M1_i) Y] &= E[\alpha_1 M1_1 Y + \alpha_2 M1_2 Y + ... + \alpha_{k_{M1}} M1_{k_{M1}} Y] \\ &= \alpha_1 E[M1_1 Y] + \alpha_2 E[M1_2 Y] + ... + \alpha_{k_{M1}} E[M1_{k_{M1}} Y]. \end{split} \tag{39} \end{equation}\]The correlation is approximated as:

\[\begin{equation} \rho_{M1 Y} = \frac{\sum_{i = 1}^{k_{M1}} \alpha_i \sigma_{M1_i} \rho_{M1_i Y}}{\sigma_{M1}}, \tag{40} \end{equation}\]where \(\rho_{M1_i Y}\) is the correlation between component \(i\) of \(M1\) and \(Y\). The function `SimCorrMix::rho_M1Y`

returns this value.

For outcome \(Y_p\), (15) requires the inputs \(\rho_{Y_pX_{pi}}\), \(\rho_{X_{pi}X_{pj}}\), and \(\sigma_{X_{pi}}^2\). These are specified by the parameters `corr.yx`

, `corr.x`

, and `vars`

, which are input as lists to the function `calc_betas`

. If all independent variables have non-mixture distributions, then the matrices `corr.yx[[p]]`

and `corr.x[[p]][[p]]`

will both have \(K\) columns.

\[\begin{equation} Y_{p} = \beta_{p0} + \beta_{p1}X_{p1} + \beta_{pi}X_{p2} + \beta_{p3}X_{p3} + + \beta_{p4}X_{p4} + E_{p}, \tag{41} \end{equation}\]

where \(X_{p1}\) and \(X_{p2}\) both have non-mixture distributions, while \(X_{p3}\) and \(X_{p4}\) both have mixture distributions. The PDF’s of \(X_{p3}\) and \(X_{p4}\) are given by:

\[\begin{equation} \begin{split} f_{X_{p3}}(x_{p3}) &= \alpha_1 f_{X_{p3_1}}(x_{p3_1}) + \alpha_2 f_{X_{p3_2}}(x_{p3_2}) \\ f_{X_{p4}}(x_{p4}) &= \gamma_1 f_{X_{p4_1}}(x_{p4_1}) + \gamma_2 f_{X_{p4_2}}(x_{p4_2}) + \gamma_2 f_{X_{p4_3}}(x_{p4_3}) \end{split} \tag{42} \end{equation}\]so that \(X_{p3}\) has 2 components and \(X_{p4}\) has 3 components. Since simulation is done at the component-level in `nonnormsys`

, the correlations between independent variables within an outcome and across outcomes is specified in terms of correlations between non-mixture and components of mixture variables. Therefore, \(K\) is 7 and `corr.x[[p]][[p]]`

should have dimension \(7 \times 7\). The user may specify the correlations `corr.yx`

either by: 1) correlations between outcomes and non-mixture or components of mixture variables or 2) correlations between outcomes and non-mixture or mixture variables.

In the first case, `corr.yx[[p]]`

should have dimension \(1 \times 7\) and the beta coefficients will be calculated in terms of non-mixture and components of mixture variables. Then the components \(X_{p3_1}\), \(X_{p3_2}\), \(X_{p4_1}\), \(X_{p4_2}\), and \(X_{p4_3}\) are treated as “non-mixture” continuous variables and (15) applies, giving 7 beta coefficients.

In the second case, `corr.yx[[p]]`

should have dimension \(1 \times 4\) and the beta coefficients will be calculated in terms of non-mixture and mixture variables. In order to use (15), the correlations in `corr.x[[p]][[p]]`

are reexpressed in terms of correlations between non-mixture and mixture variables using (37) and (40). These equations require the mixing parameters \(\alpha_1, \alpha_2, \gamma_1, \gamma_2\), and \(\gamma_3\) to be specified in `mix_pis`

, the component means to be specified in `mix_mus`

, and the component standard deviations to be specified in `mix_sigmas`

. Additionally, the `error_type`

must be given if the system contains any mixture variables.

For example, the correlation `rho_{X_{p3}X_{p4}}`

is calculated as:

where

\[\begin{equation} \begin{split} \sigma_{X_{p3}}^2 &= \alpha_1 (\sigma_{X_{p3_1}}^2 + \mu_{X_{p3_1}}^2) + \alpha_2 (\sigma_{X_{p3_2}}^2 + \mu_{X_{p3_2}}^2) - [\alpha_1 \mu_{X_{p3_1}} + \alpha_2 \mu_{X_{p3_2}}]^2 \\ \sigma_{X_{p4}}^2 &= \gamma_1 (\sigma_{X_{p4_1}}^2 + \mu_{X_{p4_1}}^2) + \gamma_2 (\sigma_{X_{p4_2}}^2 + \mu_{X_{p4_2}}^2) + \gamma_3 (\sigma_{X_{p4_3}}^2 + \mu_{X_{p4_3}}^2) - [\gamma_1 \mu_{X_{p4_1}} + \gamma_2 \mu_{X_{p4_2}} + \gamma_3 \mu_{X_{p4_3}}]^2. \end{split} \tag{44} \end{equation}\]The correlation `rho_{X_{p1}X_{p3}}`

is calculated as:

The remaining correlations are found similarly. Note that these are *approximations* of the expected correlations with mixture variables, so that accuracy will vary. After these correlations are found, (15) again applies.

For outcome \(Y_q\), (24) requires the inputs \(\beta_{qj}\), \(\rho_{X_{pi}X_{qj}}\), \(\rho_{X_{qi}X_{qj}}\), \(\sigma_{X_{pi}}^2\), and \(\sigma_{X_{qj}}^2\). These are specified via the function parameters `betas`

, `corr.x`

, and `vars`

, which are input as a matrix or lists to the function `calc_corr_yx`

. If all independent variables have non-mixture distributions, then `betas[q, ]`

, `corr.x[[p]][[q]]`

, `corr.x[[q]][[q]]`

will all have the same number of columns.

\[\begin{equation} Y_{q} = \beta_{q0} + \beta_{q1}X_{q1} + \beta_{qi}X_{q2} + \beta_{q3}X_{q3} + + \beta_{q4}X_{q4} + E_{q}, \tag{46} \end{equation}\]

where \(X_{q1}\) and \(X_{q2}\) both have non-mixture distributions, while \(X_{q3}\) and \(X_{q4}\) both have mixture distributions. The PDF’s of \(X_{q3}\) and \(X_{q4}\) are given by:

\[\begin{equation} \begin{split} f_{X_{q3}}(x_{q3}) &= \alpha_1 f_{X_{q3_1}}(x_{q3_1}) + \alpha_2 f_{X_{q3_2}}(x_{q3_2}) \\ f_{X_{q4}}(x_{q4}) &= \gamma_1 f_{X_{q4_1}}(x_{q4_1}) + \gamma_2 f_{X_{q4_2}}(x_{q4_2}) + \gamma_3 f_{X_{q4_3}}(x_{q4_3}) \end{split} \tag{47} \end{equation}\]so that \(X_{q3}\) has 2 components and \(X_{q4}\) has 3 components. The matrices `corr.x[[p]][[q]]`

and `corr.x[[q]][[q]]`

should both have 7 columns. The user may specify the slope coefficients `betas`

either for: 1) non-mixture and components of mixture variables or 2) non-mixture and mixture variables.

In the first case, `betas[q, ]`

should have 7 columns and the correlations between outcomes and independent variables will be calculated in terms of non-mixture and components of mixture variables. Then the components \(X_{q3_1}\), \(X_{q3_2}\), \(X_{q4_1}\), \(X_{q4_2}\), and \(X_{q4_3}\) are treated as “non-mixture” continuous variables and (24) applies.

In the second case, `betas[q, ]`

should have 4 columns and the correlations between outcomes and independent variables will be calculated in terms of non-mixture and mixture variables. In order to use (24), the correlations in `corr.x[[q]][[q]]`

are reexpressed in terms of correlations between non-mixture and mixture variables using (37) and (40). These equations require the mixing parameters \(\alpha_1, \alpha_2, \gamma_1, \gamma_2\), and \(\gamma_3\) to be specified in `mix_pis`

, the component means to be specified in `mix_mus`

, and the component standard deviations to be specified in `mix_sigmas`

. Additionally, the `error_type`

must be given if the system contains any mixture variables. The calculations are similar to the example in the previous section.

For outcomes \(Y_p\) and \(Y_q\), (29) requires the inputs \(\beta_{pi}\), \(\beta_{qj}\), \(\rho_{X_{pi}X_{qj}}\), \(\rho_{X_{pi}X_{pj}}\), \(\rho_{X_{qi}X_{qj}}\), \(\rho_{E_pE_q}\), \(\sigma_{X_{pi}}^2\), and \(\sigma_{X_{qj}}^2\). These are specified via the function parameters `betas`

, `corr.x`

, `corr.e`

, and `vars`

, which are input as a matrix or lists to the function `calc_corr_y`

. If all independent variables have non-mixture distributions, then `betas[p, ]`

, `corr.x[[q]][[p]]`

, `corr.x[[p]][[p]]`

will all have the same number of columns, and `betas[q, ]`

, `corr.x[[p]][[q]]`

, `corr.x[[q]][[q]]`

will all have the same number of columns.

If the equations for \(Y_p\) and/or \(Y_q\) contain mixture independent variables, then the user may specify the slope coefficients `betas`

either for: 1) non-mixture and components of mixture variables or 2) non-mixture and mixture variables. In the first case, the components of the mixture independent variables are treated as “non-mixture” continuous variables and (29) applies. In the second case, the correlations in `corr.x`

are reexpressed in terms of correlations between non-mixture and mixture variables using (37) and (40). These equations require the mixing parameters to be specified in `mix_pis`

, the component means to be specified in `mix_mus`

, and the component standard deviations to be specified in `mix_sigmas`

. The calculations are similar to the previous examples.

If all error terms have mixture distributions (`error_type = "mix"`

), then the correlations among components given in `corr.e`

are reexpressed as correlations among mixture variables using (37). For example, assume the system contains 2 equations with error terms \(E_1\) and \(E_2\), both with mixture distributions. The PDF’s of \(E_1\) and \(E_2\) are given by:

so that \(E_1\) has 2 components and \(E_2\) has 3 components. The correlation `rho_{E_1 E_2}`

is calculated as:

where

\[\begin{equation} \begin{split} \sigma_1^2 &= \alpha_1 (\sigma_{E_{11}}^2 + \mu_{E_{11}}^2) + \alpha_2 (\sigma_{E_{12}}^2 + \mu_{E_{12}}^2) - [\alpha_1 \mu_{E_{11}} + \alpha_2 \mu_{E_{12}}]^2 \\ \sigma_2^2 &= \gamma_1 (\sigma_{E_{21}}^2 + \mu_{E_{21}}^2) + \gamma_2 (\sigma_{E_{22}}^2 + \mu_{E_{22}}^2) + \gamma_3 (\sigma_{E_{23}}^2 + \mu_{E_{23}}^2) - [\gamma_1 \mu_{E_{21}} + \gamma_2 \mu_{E_{22}} + \gamma_3 \mu_{E_{23}}]^2. \end{split} \tag{50} \end{equation}\]The parameters for the component distributions are given as the last entries in `mix_pis`

, `mix_mus`

, and `mix_sigmas`

. Again, note that the correlation computed in (49) is an *approximation* of the actual correlation.

For outcomes \(Y_p\) and \(Y_q\), (34) requires the inputs \(\beta_{pi}\), \(\rho_{X_{pi}X_{pj}}\), \(\rho_{E_pE_q}\), \(\sigma_{X_{pi}}^2\), and \(\sigma_p^2\). These are specified via the function parameters `betas`

, `corr.x`

, `corr.e`

, and `vars`

, which are input as a matrix or lists to the function `calc_corr_ye`

. If all independent variables have non-mixture distributions, then `betas[p, ]`

and `corr.x[[p]][[p]]`

will have the same number of columns.

If the equations for \(Y_p\) and/or \(Y_q\) contain mixture independent variables, then the user may specify the slope coefficients `betas`

either for: 1) non-mixture and components of mixture variables or 2) non-mixture and mixture variables. In the first case, the components of the mixture independent variables are treated as “non-mixture” continuous variables and (34) applies. In the second case, the correlations in `corr.x`

are reexpressed in terms of correlations between non-mixture and mixture variables using (37) and (40). These equations require the mixing parameters to be specified in `mix_pis`

, the component means to be specified in `mix_mus`

, and the component standard deviations to be specified in `mix_sigmas`

. The calculations are similar to the previous examples.

If all error terms have mixture distributions (`error_type = "mix"`

), then the correlations among components given in `corr.e`

are reexpressed as correlations among mixture variables using (37). See the examples in the previous sections.

`nonnormsys`

The function `nonnormsys`

returns the `betas`

used in the simulation. Summaries for the outcomes, independent variables, and error terms can be found using `summary_sys`

. This function also calculates the simulated correlations among outcomes (returned as `rho.y`

), outcomes and independent variables (returned as `rho.yx`

for non-mixture and components of mixture variables and `rho.yxall`

for non-mixture and mixture variables), and outcomes and error terms (returned as `rho.ye`

for non-mixture or components of mixture error terms and/or `rho.yemix`

for mixture error terms). These can be used to compare simulated values to expected values calculated from the above equations.

Davenport, JW, JC Bezder, and RJ Hathaway. 1988. “Parameter Estimation for Finite Mixture Distributions.” *Computers & Mathematics with Applications* 15 (10): 819–28.

Everitt, B S. 1996. “An Introduction to Finite Mixture Distributions.” *Statistical Methods in Medical Research* 5 (2): 107–27. https://doi.org/10.1177/096228029600500202.

Fialkowski, A C. 2017. *SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types*. https://CRAN.R-project.org/package=SimMultiCorrData.

———. 2018. *SimCorrMix: Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions*. https://CRAN.R-project.org/package=SimCorrMix.

Fleishman, A I. 1978. “A Method for Simulating Non-Normal Distributions.” *Psychometrika* 43: 521–32. https://doi.org/10.1007/BF02293811.

Headrick, T C. 2002. “Fast Fifth-Order Polynomial Transforms for Generating Univariate and Multivariate Non-Normal Distributions.” *Computational Statistics and Data Analysis* 40 (4): 685–711. https://doi.org/10.1016/S0167-9473(02)00072-5.

Headrick, T C, and T M Beasley. 2004. “A Method for Simulating Correlated Non-Normal Systems of Statistical Equations.” *Communications in Statistics - Simulation and Computation* 33 (1): 19–33. http://dx.doi.org/10.1081/SAC-120028431.

Headrick, T C, and S S Sawilowsky. 1999. “Simulating Correlated Non-Normal Distributions: Extending the Fleishman Power Method.” *Psychometrika* 64: 25–35. https://doi.org/10.1007/BF02294317.

Pearson, RK. 2011. “Exploring Data in Engineering, the Sciences, and Medicine.” In. New York: Oxford University Press.

Shork, N J, D B Allison, and B Thiel. 1996. “Mixture Distributions in Human Genetics Research.” *Statistical Methods in Medical Research* 5: 155–78. https://doi.org/10.1177/096228029600500204.