Repeated measures may be thought of as a system of \(M\) equations. The dependent variables \(Y_{p}\), for \(p = 1,\ ...,\ M\), may represent the same variable collected at \(M\) time points or under \(M\) different conditions (e.g., different treatment drugs). The \(K\) independent variables \(X_{pi}\), for \(p = 1,\ ...,\ M\) and \(i = 1,\ ...,\ K\), may be non-varying (static) covariates (e.g., gender or height), time-varying covariates (e.g., age or blood sugar), or a combination of the two. Although Fleishman (1978)’s third-order and Headrick (2002)’s fifth-order polynomial transformation methods (power method transformations, PMTs) allow the simulation of correlated non-normal variables for a single equation or a system of independent equations, they have certain limitations when producing a system of correlated equations. For example, there is no way to control the distributions of, or the correlations between, the error terms. Headrick and Beasley (2004) derived a general, computationally efficient procedure for simulating correlated non-normal systems of statistical equations that permits control of correlated non-normal: a) error (stochastic disturbance) terms, b) independent variables, and c) dependent and independent variables. Such a procedure is important for investigating the properties of analytical models and test statistics that rely on certain distributional assumptions, for example, sufficient sample size, multivariate normality, independent errors, sphericity, and balanced data.

Headrick and Beasley (2004)’s technique allows the user to specify the distributions of the independent variables \(X_{pj}\), for \(p = 1, ..., M\) equations, and of the stochastic disturbance (error) terms \(E\). The user also controls the correlations between: 1) independent variables \(X_{pj}\) for a given outcome \(Y_p\), 2) an outcome \(Y_p\) and its \(X_{pj}\) terms, and 3) the stochastic disturbance (error) terms \(E\). The technique calculates the beta (slope) coefficients from the correlations between the independent variables \(X_{pj}\) for a given outcome \(Y_p\), the correlations between that outcome \(Y_p\) and the \(X_{pj}\) terms, and the variances. Headrick and Beasley (2004) also derived equations to calculate, from these betas, the expected correlations between: 1) \(X_{pj}\) terms and outcome \(Y_q\), 2) outcomes \(Y_p\) and \(Y_q\), and 3) outcome \(Y_p\) and error term \(E_q\), for \(p \neq q\). All continuous variables are generated from intermediate random normal variables using Headrick (2002)’s fifth-order PMT. The intermediate correlations required to produce the target correlations after the variable transformations are calculated using Headrick (2002)’s equation.

Description of the model

Define a system of \(M\) equations as follows:

\[\begin{equation} \begin{split} Y_{1} &= \beta_{10} + \beta_{11}X_{11} + ... + \beta_{1i}X_{1i} + ... + \beta_{1j}X_{1j} + ... + \beta_{1K}X_{1K} + \sigma_{1}E_{1} \\ &... \\ Y_{p} &= \beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + \sigma_{p}E_{p} \\ &... \\ Y_{q} &= \beta_{q0} + \beta_{q1}X_{q1} + ... + \beta_{qi}X_{qi} + ... + \beta_{qj}X_{qj} + ... + \beta_{qK}X_{qK} + \sigma_{q}E_{q} \\ &... \\ Y_{M} &= \beta_{M0} + \beta_{M1}X_{M1} + ... + \beta_{Mi}X_{Mi} + ... + \beta_{Mj}X_{Mj} + ... + \beta_{MK}X_{MK} + \sigma_{M}E_{M}, \end{split} \tag{1} \end{equation}\]

for \(1 \leq p < q \leq M\) and \(1 \leq i < j \leq K\). Each dependent variable \(Y\), independent variable \(X\), and error term \(E\) has length \(n\) (the sample size). Assume the error terms have expected values of \(0\) and variances of \(1\). They can have any combination of skew (\(\gamma_{1}\)), standardized kurtosis (\(\gamma_{2}\)), and standardized fifth (\(\gamma_{3}\)) and sixth (\(\gamma_{4}\)) cumulants permitted under the desired PMT’s constraints. The \(\sigma\) values are real positive scalars that may vary across equations.

Alternatively, the last terms may be reexpressed so that the error terms are random variables with standard deviations given by the \(\sigma\) values (i.e., using \(E_p\) in place of \(\sigma_{p}E_{p}\)). This is the notation used by the nonnormsys function, which simulates a correlated system of statistical equations containing continuous variables within the SimRepeat package. In addition, the \(M\) equations do not have to include the same number of independent variables, and a variable may be designated as the same for all equations (via the input same.var). The \(M\) dependent variables are created by generating the right-hand sides of (1).

Important assumptions for this system:

  1. There are at least 2 equations and a total of at least 1 independent variable.
  2. The independent variables are uncorrelated with the error terms within the same outcome and across outcomes.
  3. Each equation has an error term.
  4. All error terms have either a non-mixture or mixture distribution.

Without loss of generality, consider the variable pair \(X_{pi}\) and \(X_{pj}\), for \(p = 1,\ ...,\ M\), \(i = 1,\ ...,\ K\), \(j = 1,\ ...,\ K\), and \(i \neq j\), with expected values of \(\mu_{X_{pi}}\) and \(\mu_{X_{pj}}\) and variances of \(\sigma_{X_{pi}}^2\) and \(\sigma_{X_{pj}}^2\). The variables \(X_{pi}\) and \(X_{pj}\) have distributions specified by skews \(\gamma_{1i}\) and \(\gamma_{1j}\), standardized kurtoses \(\gamma_{2i}\) and \(\gamma_{2j}\), and standardized fifth and sixth cumulants \(\gamma_{3i}\), \(\gamma_{3j}\), \(\gamma_{4i}\), and \(\gamma_{4j}\). \(X_{pi}\) and \(X_{pj}\) are generated from location-scale transformations applied to \(X_{pi}'\) and \(X_{pj}'\), which come from Headrick (2002)’s fifth-order PMT given by:

\[\begin{equation} \begin{split} X_{pi}' &= c_{0pi} + c_{1pi}Z_{pi} + c_{2pi}Z_{pi}^2 + c_{3pi}Z_{pi}^3 + c_{4pi}Z_{pi}^4 + c_{5pi}Z_{pi}^5 \\ X_{pj}' &= c_{0pj} + c_{1pj}Z_{pj} + c_{2pj}Z_{pj}^2 + c_{3pj}Z_{pj}^3 + c_{4pj}Z_{pj}^4 + c_{5pj}Z_{pj}^5, \end{split} \tag{2} \end{equation}\]

where \(Z_{pi}\) and \(Z_{pj}\) are both \(N(0,1)\) variables. The constants \(c_{0pi},\ ...,\ c_{5pi}\) are determined by solving Headrick (2002)’s system of equations (contained in SimMultiCorrData::poly) using \(\gamma_{1i},\ \gamma_{2i},\ \gamma_{3i},\) and \(\gamma_{4i}\), and likewise for \(c_{0pj},\ ...,\ c_{5pj}\) (Fialkowski 2017). The results also hold for Fleishman (1978)’s method, with \(c_{4pi} = c_{5pi} = c_{4pj} = c_{5pj} = 0\) and the system of equations contained in SimMultiCorrData::fleish. Then the intermediate correlation \(\rho_{Z_{pi}Z_{pj}}\) between \(Z_{pi}\) and \(Z_{pj}\) required to achieve a target correlation \(\rho_{X_{pi}X_{pj}}\) between \(X_{pi}\) and \(X_{pj}\) is given by Headrick (2002)’s Equation 26 (p. 694), or by Headrick and Sawilowsky (1999)’s Equation 7b (p. 28) for Fleishman’s method. The intermediate correlations are calculated by the function SimCorrMix::intercorr_cont; see the SimCorrMix package vignettes for details (Fialkowski 2018). The purpose of correlating \(Z_{pi}\) and \(Z_{pj}\) at an intermediate level is to adjust a priori for the non-normalizing effect of the constants so that \(X_{pi}\) and \(X_{pj}\) have the desired final correlation (Headrick and Beasley 2004).
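As a toy illustration of why an intermediate adjustment is needed, consider squaring standard normals (a simple polynomial transform, not the fifth-order PMT itself; the exact adjustment below is specific to this transform). The transformation attenuates correlation, so the \(Z\)’s must be correlated more strongly than the target:

```r
# Toy base-R example: a quadratic transform attenuates correlation, so the
# intermediate normal correlation must exceed the target correlation.
set.seed(1)
n <- 1e6
rho_target <- 0.5
rho_z <- sqrt(rho_target)               # exact adjustment for this transform
z1 <- rnorm(n)
z2 <- rho_z * z1 + sqrt(1 - rho_z^2) * rnorm(n)
x1 <- (z1^2 - 1) / sqrt(2)              # standardized chi-square(1) variables
x2 <- (z2^2 - 1) / sqrt(2)
cor(z1, z2)                             # ~0.707, the intermediate correlation
cor(x1, x2)                             # ~0.5, the target correlation
```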

The equations given by Headrick and Beasley (2004) to determine the beta coefficients and theoretical correlations between outcomes \(Y\), outcomes and independent variables, and outcomes and error terms are derived below (with some modifications and corrections). These are contained in the SimRepeat functions calc_betas, calc_corr_y, calc_corr_ye, and calc_corr_yx.

System of equations to determine the beta (slope) coefficients

Consider the following equation describing outcome \(Y_p\), for \(p = 1,\ ...,\ M\):
\[\begin{equation} Y_{p} = \beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p}. \tag{3} \end{equation}\] In (3), the error term \(E_p\) has variance \(\sigma_p^2 = E[E_p^2] - E[E_p]^2\). Obtaining the values for \(\beta_{p1},\ ...,\ \beta_{pi},\ ...,\ \beta_{pK}\) requires solving a system of \(K\) equations. The correlation between \(Y_p\) and \(X_{pi}\) is given by:
\[\begin{equation} \rho_{Y_pX_{pi}} = \frac{E[Y_pX_{pi}] - E[Y_p]E[X_{pi}]}{\sigma_{Y_p} \sigma_{X_{pi}}}, \tag{4} \end{equation}\] where \[\begin{equation} \sigma_{Y_p}^2 = E[Y_p^2] - E[Y_p]^2. \tag{5} \end{equation}\] The correlation between two independent variables \(X_{pi}\) and \(X_{pj}\) is given by:
\[\begin{equation} \rho_{X_{pi}X_{pj}} = \frac{E[X_{pi}X_{pj}] - E[X_{pi}]E[X_{pj}]}{\sigma_{X_{pi}} \sigma_{X_{pj}}}, \tag{6} \end{equation}\] where \[\begin{equation} \sigma_{X_{pi}}^2 = E[X_{pi}^2] - E[X_{pi}]^2\ \mathrm{and}\ \sigma_{X_{pj}}^2 = E[X_{pj}^2] - E[X_{pj}]^2. \tag{7} \end{equation}\] Using (3),
\[\begin{align} E[Y_{p}] &= \beta_{p0} + \beta_{p1}E[X_{p1}] + ... + \beta_{pi}E[X_{pi}] + ... + \beta_{pj}E[X_{pj}] + ... + \beta_{pK}E[X_{pK}] + E[E_{p}] \tag{8} \\ E[Y_{p}]^2 &= \beta_{p0}^2 + 2\beta_{p0} \sum_{pi} \beta_{pi}E[X_{pi}] + 2\beta_{p0}E[E_{p}] + \sum_{pi} \beta_{pi}^2 E[X_{pi}]^2 \notag \\ &\ \ \ + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}E[X_{pi}]E[X_{pj}] + 2\sum_{pi} \beta_{pi}E[X_{pi}]E[E_p] + E[E_p]^2 \tag{9} \end{align}\] \[\begin{align} \begin{split} E[Y_{p}^2] &= E[(\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p})^2]\\ &= E[\beta_{p0}^2 + \beta_{p0}(\beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK}) + \beta_{p0}E_{p}\\ &\ \ \ + \beta_{p0}\beta_{p1}X_{p1} + \beta_{p1}^2 X_{p1}^2 + \beta_{p1}X_{p1}(\beta_{p2}X_{p2} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK}) + \beta_{p1}X_{p1}E_{p}\\ &\ \ \ \ + ...\\ &\ \ \ + \beta_{p0}\beta_{pK}X_{pK} + \beta_{pK}X_{pK}(\beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{p(K - 1)}X_{p(K - 1)}) + \beta_{pK}^2 X_{pK}^2\\ &\ \ \ + \beta_{pK}X_{pK}E_{p} + \beta_{p0}E_{p} + E_p(\beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK}) + E_p^2]\\ &= \beta_{p0}^2 + 2\beta_{p0} \sum_{pi} \beta_{pi}E[X_{pi}] + 2\beta_{p0}E[E_{p}] + E[E_p^2] + \sum_{pi} \beta_{pi}^2 E[X_{pi}^2] + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}E[X_{pi}X_{pj}]\\ &\ \ \ + 2\sum_{pi} \beta_{pi}E[X_{pi}E_p]. \end{split} \tag{10} \end{align}\] From (9) and (10), the variance of \(Y\) is given by:
\[\begin{align} \begin{split} \sigma_{Y_p}^2 &= E[E_p^2] - E[E_p]^2 + \sum_{pi} \beta_{pi}^2 E[X_{pi}^2] - \sum_{pi} \beta_{pi}^2 E[X_{pi}]^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}E[X_{pi}X_{pj}] \\ &\ \ \ - 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}E[X_{pi}]E[X_{pj}] + 2\sum_{pi} \beta_{pi}E[X_{pi}E_p] - 2\sum_{pi} \beta_{pi}E[X_{pi}]E[E_p]. \end{split} \tag{11} \end{align}\] Using model assumption 2 that the independent variables are uncorrelated with the error terms, \(E[X_{pi}E_p] = E[X_{pi}]E[E_p]\). Using (6), the variance of \(Y\) is:
\[\begin{equation} \sigma_{Y_p}^2 = \sigma_p^2 + \sum_{pi} \beta_{pi}^2 \sigma_{X_{pi}}^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}\sigma_{X_{pi}}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}}. \tag{12} \end{equation}\] Then,
\[\begin{align} \begin{split} E[Y_pX_{pi}] &= E[(\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p}) X_{pi}]\\ &= \beta_{p0}E[X_{pi}] + \sum_{pj \neq pi} \beta_{pj}E[X_{pi}X_{pj}] + \beta_{pi}E[X_{pi}^2] + E[X_{pi}E_p] \end{split} \tag{13} \end{align}\] and
\[\begin{align} \begin{split} E[Y_p]E[X_{pi}] &= E[\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p}]E[X_{pi}]\\ &= \beta_{p0}E[X_{pi}] + \sum_{pj \neq pi} \beta_{pj}E[X_{pi}]E[X_{pj}] + \beta_{pi}E[X_{pi}]^2 + E[X_{pi}]E[E_p]. \end{split} \tag{14} \end{align}\] Combining (6), (7), (13) and (14) and using model assumption 2 gives:
\[\begin{equation*} E[Y_pX_{pi}] - E[Y_p]E[X_{pi}] = \sigma_{X_{pi}} \sum_{pj \neq pi} \beta_{pj}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}} + \beta_{pi}\sigma_{X_{pi}}^2. \end{equation*}\] Finally, the correlation \(\rho_{Y_pX_{pi}}\) from (4) can be reexpressed as:
\[\begin{align} \begin{split} \rho_{Y_pX_{pi}} &= \frac{\sigma_{X_{pi}} \sum_{pj \neq pi} \beta_{pj}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}} + \beta_{pi}\sigma_{X_{pi}}^2}{\sigma_{X_{pi}} \sqrt{\sigma_p^2 + \sum_{pi} \beta_{pi}^2 \sigma_{X_{pi}}^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}\sigma_{X_{pi}}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}}}}\\ &= \frac{\sum_{pj \neq pi} \beta_{pj}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}} + \beta_{pi}\sigma_{X_{pi}}}{\sqrt{\sigma_p^2 + \sum_{pi} \beta_{pi}^2 \sigma_{X_{pi}}^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}\sigma_{X_{pi}}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}}}}. \end{split} \tag{15} \end{align}\]

For outcome \(p\) and \(i = 1,\ ...,\ K\), (15) gives a system of \(K\) equations that are solved to determine the \(K\) beta coefficients that yield the target correlations \(\rho_{Y_pX_{pi}}\). This is done in SimRepeat by the function calc_betas. The required inputs \(\rho_{Y_pX_{pi}}\), \(\rho_{X_{pi}X_{pj}}\), and \(\sigma_{X_{pi}}^2\) are specified via the function parameters corr.yx, corr.x, and vars.
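Because the denominator of (15) is \(\sigma_{Y_p}\) from (12), the system can be rewritten in matrix form: with \(r\) the vector of target correlations \(\rho_{Y_pX_{pi}}\) and \(R\) the correlation matrix of the \(X_{pi}\)’s, \(\sigma_{Y_p}^2 = \sigma_p^2 / (1 - r^\top R^{-1} r)\) and \(\beta_{pi} = \sigma_{Y_p} (R^{-1} r)_i / \sigma_{X_{pi}}\). Below is a minimal base-R sketch of this solution (an illustration only, not the SimRepeat implementation; the function name and example values are hypothetical):

```r
# Solve the system (15) for the K slope coefficients of one outcome.
# corr_yx: target rho_{Y_p X_pi} (vector of length K)
# corr_x : K x K correlation matrix of the X_pi's
# sd_x   : standard deviations of the X_pi's (square roots of the vars input)
# sd_e   : error standard deviation sigma_p
solve_betas <- function(corr_yx, corr_x, sd_x, sd_e) {
  w  <- solve(corr_x, corr_yx)          # R^{-1} r
  r2 <- sum(corr_yx * w)                # r' R^{-1} r, must be < 1
  stopifnot(r2 < 1)
  sd_y <- sd_e / sqrt(1 - r2)           # sigma_{Y_p} via (12)
  sd_y * w / sd_x                       # beta_{p1}, ..., beta_{pK}
}

# Example: K = 2 predictors with sd's 1 and 2, error sd 1
R <- matrix(c(1, 0.3, 0.3, 1), 2, 2)
solve_betas(corr_yx = c(0.4, 0.5), corr_x = R, sd_x = c(1, 2), sd_e = 1)
```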

After the beta coefficients are found, theoretical values for the remaining correlations in system (1) can be determined.

Correlation between outcome \(Y_q\) and independent variable \(X_{pi}\)

Consider the following equation describing outcome \(Y_q\), for \(q = 1,\ ...,\ M\):
\[\begin{equation} Y_{q} = \beta_{q0} + \beta_{q1}X_{q1} + ... + \beta_{qi}X_{qi} + ... + \beta_{qj}X_{qj} + ... + \beta_{qL}X_{qL} + E_{q}. \tag{16} \end{equation}\] In (16), the error term \(E_q\) has variance \(\sigma_q^2 = E[E_q^2] - E[E_q]^2\). Let \(X_{pi}\), for \(i = 1,\ ...,\ K\), be one of the independent variables in (3). The correlation between \(Y_q\) and \(X_{pi}\) is given by:
\[\begin{equation} \rho_{Y_qX_{pi}} = \frac{E[Y_qX_{pi}] - E[Y_q]E[X_{pi}]}{\sigma_{Y_q} \sigma_{X_{pi}}}, \tag{17} \end{equation}\] where \[\begin{equation} \sigma_{Y_q}^2 = E[Y_q^2] - E[Y_q]^2. \tag{18} \end{equation}\] The correlation between two independent variables \(X_{pi}\), for \(i = 1,\ ...,\ K\), and \(X_{qj}\), for \(j = 1,\ ...,\ L\), is given by:
\[\begin{equation} \rho_{X_{pi}X_{qj}} = \frac{E[X_{pi}X_{qj}] - E[X_{pi}]E[X_{qj}]}{\sigma_{X_{pi}} \sigma_{X_{qj}}}, \tag{19} \end{equation}\] where \[\begin{equation} \sigma_{X_{pi}}^2 = E[X_{pi}^2] - E[X_{pi}]^2\ \mathrm{and}\ \sigma_{X_{qj}}^2 = E[X_{qj}^2] - E[X_{qj}]^2. \tag{20} \end{equation}\] Then,
\[\begin{align} \begin{split} E[Y_qX_{pi}] &= E[(\beta_{q0} + \beta_{q1}X_{q1} + ... + \beta_{qi}X_{qi} + ... + \beta_{qj}X_{qj} + ... + \beta_{qL}X_{qL} + E_{q}) X_{pi}]\\ &= \beta_{q0}E[X_{pi}] + \sum_{qj} \beta_{qj}E[X_{qj}X_{pi}] + E[X_{pi}E_q] \end{split} \tag{21} \end{align}\] and
\[\begin{align} \begin{split} E[Y_q]E[X_{pi}] &= E[\beta_{q0} + \beta_{q1}X_{q1} + ... + \beta_{qi}X_{qi} + ... + \beta_{qj}X_{qj} + ... + \beta_{qL}X_{qL} + E_{q}]E[X_{pi}]\\ &= \beta_{q0}E[X_{pi}] + \sum_{qj} \beta_{qj}E[X_{qj}]E[X_{pi}] + E[X_{pi}]E[E_q]. \end{split} \tag{22} \end{align}\] Combining (19), (21) and (22) and using model assumption 2 gives:
\[\begin{equation} E[Y_qX_{pi}] - E[Y_q]E[X_{pi}] = \sigma_{X_{pi}} \sum_{qj} \beta_{qj}\sigma_{X_{qj}}\rho_{X_{pi}X_{qj}}. \tag{23} \end{equation}\] Finally, the correlation \(\rho_{Y_qX_{pi}}\) from (17) can be reexpressed as:
\[\begin{align} \begin{split} \rho_{Y_qX_{pi}} &= \frac{\sigma_{X_{pi}} \sum_{qj} \beta_{qj}\sigma_{X_{qj}}\rho_{X_{pi}X_{qj}}}{\sigma_{X_{pi}} \sqrt{\sigma_q^2 + \sum_{qi} \beta_{qi}^2 \sigma_{X_{qi}}^2 + 2\sum_{qi \neq qj} \beta_{qi}\beta_{qj}\sigma_{X_{qi}}\sigma_{X_{qj}}\rho_{X_{qi}X_{qj}}}}\\ &= \frac{\sum_{qj} \beta_{qj}\sigma_{X_{qj}}\rho_{X_{pi}X_{qj}}}{\sqrt{\sigma_q^2 + \sum_{qi} \beta_{qi}^2 \sigma_{X_{qi}}^2 + 2\sum_{qi \neq qj} \beta_{qi}\beta_{qj}\sigma_{X_{qi}}\sigma_{X_{qj}}\rho_{X_{qi}X_{qj}}}}. \end{split} \tag{24} \end{align}\]

This correlation is found in SimRepeat by the function calc_corr_yx. The required inputs \(\beta_{qj}\), \(\rho_{X_{pi}X_{qj}}\), \(\rho_{X_{qi}X_{qj}}\), \(\sigma_{X_{pi}}^2\), and \(\sigma_{X_{qj}}^2\) are specified via the function parameters betas, corr.x, and vars.
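Equation (24) can be transcribed directly into base R (a hedged sketch; the function and argument names are hypothetical, not the calc_corr_yx interface):

```r
# Equation (24): correlation between outcome Y_q and predictor X_pi.
# beta_q      : slopes for Y_q (length L)
# corr_x_piq  : rho_{X_pi X_qj}, j = 1, ..., L (a row of corr.x[[p]][[q]])
# corr_x_qq   : L x L correlation matrix of the X_qj's
# sd_xq, sd_eq: standard deviations of the X_qj's and of E_q
rho_yq_xpi <- function(beta_q, corr_x_piq, corr_x_qq, sd_xq, sd_eq) {
  bq    <- beta_q * sd_xq
  var_y <- sd_eq^2 + sum(bq %o% bq * corr_x_qq)   # sigma_{Y_q}^2 as in (12)
  sum(bq * corr_x_piq) / sqrt(var_y)
}
```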

Correlation between outcomes \(Y_p\) and \(Y_q\)

Consider the outcomes \(Y_p\) and \(Y_q\) from (3) and (16), for \(1 \leq p < q \leq M\). Their correlation is given by: \[\begin{equation} \rho_{Y_pY_q} = \frac{E[Y_pY_q] - E[Y_p]E[Y_q]}{\sigma_{Y_p} \sigma_{Y_q}}. \tag{25} \end{equation}\] Then
\[\begin{align} \begin{split} E[Y_pY_q] &= E[(\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p})\\ &\ \ \ * (\beta_{q0} + \beta_{q1}X_{q1} + ... + \beta_{qi}X_{qi} + ... + \beta_{qj}X_{qj} + ... + \beta_{qL}X_{qL} + E_{q})]\\ &= \beta_{p0}\beta_{q0} + \beta_{p0} \sum_{qj} \beta_{qj}E[X_{qj}] + \beta_{p0}E[E_q] + \beta_{q0}E[E_p] + \beta_{q0} \sum_{pi} \beta_{pi}E[X_{pi}]\\ &\ \ \ + \sum_{pi} \sum_{qj} \beta_{pi} \beta_{qj} E[X_{pi}X_{qj}] + \sum_{pi} \beta_{pi} E[X_{pi}E_q] + \sum_{qj} \beta_{qj} E[X_{qj}E_p] + E[E_pE_q] \end{split} \tag{26} \end{align}\] and
\[\begin{align} \begin{split} E[Y_p]E[Y_q] &= E[(\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p})]\\ &\ \ \ * E[(\beta_{q0} + \beta_{q1}X_{q1} + ... + \beta_{qi}X_{qi} + ... + \beta_{qj}X_{qj} + ... + \beta_{qL}X_{qL} + E_{q})]\\ &= \beta_{p0}\beta_{q0} + \beta_{p0} \sum_{qj} \beta_{qj}E[X_{qj}] + \beta_{p0}E[E_q] + \beta_{q0}E[E_p] + \beta_{q0} \sum_{pi} \beta_{pi}E[X_{pi}]\\ &\ \ \ + \sum_{pi} \sum_{qj} \beta_{pi} \beta_{qj} E[X_{pi}]E[X_{qj}] + \sum_{pi} \beta_{pi} E[X_{pi}]E[E_q] + \sum_{qj} \beta_{qj} E[X_{qj}]E[E_p] + E[E_p]E[E_q] \end{split} \tag{27} \end{align}\] Combining (19), (26) and (27) and using model assumption 2 gives:
\[\begin{equation} E[Y_pY_q] - E[Y_p]E[Y_q] = \sum_{pi} \sum_{qj} \beta_{pi} \beta_{qj} \sigma_{X_{pi}} \sigma_{X_{qj}} \rho_{X_{pi}X_{qj}} + \sigma_p \sigma_q \rho_{E_pE_q}. \tag{28} \end{equation}\] Finally, the correlation \(\rho_{Y_pY_q}\) from (25) can be reexpressed as:
\[\begin{align} \begin{split} \rho_{Y_pY_q} &= \frac{\sum_{pi} \sum_{qj} \beta_{pi} \beta_{qj} \sigma_{X_{pi}} \sigma_{X_{qj}} \rho_{X_{pi}X_{qj}} + \sigma_p \sigma_q \rho_{E_pE_q}}{\sqrt{\sigma_p^2 + \sum_{pi} \beta_{pi}^2 \sigma_{X_{pi}}^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}\sigma_{X_{pi}}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}}}}\\ &\ \ \ * \frac{1}{\sqrt{\sigma_q^2 + \sum_{qi} \beta_{qi}^2 \sigma_{X_{qi}}^2 + 2\sum_{qi \neq qj} \beta_{qi}\beta_{qj}\sigma_{X_{qi}}\sigma_{X_{qj}}\rho_{X_{qi}X_{qj}}}}. \end{split} \tag{29} \end{align}\]

This correlation is found in SimRepeat by the function calc_corr_y. The required inputs \(\beta_{pi}\), \(\beta_{qj}\), \(\rho_{X_{pi}X_{qj}}\), \(\rho_{X_{pi}X_{pj}}\), \(\rho_{X_{qi}X_{qj}}\), \(\rho_{E_pE_q}\), \(\sigma_{X_{pi}}^2\), and \(\sigma_{X_{qj}}^2\) are specified via the function parameters betas, corr.x, corr.e, and vars.
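Similarly, (29) transcribes into a short base-R sketch (again with hypothetical names, not the calc_corr_y source):

```r
# Equation (29): correlation between outcomes Y_p and Y_q.
# corr_x_pq: K x L matrix of rho_{X_pi X_qj};
# corr_x_pp, corr_x_qq: correlation matrices within each equation.
rho_yp_yq <- function(beta_p, beta_q, corr_x_pq, corr_x_pp, corr_x_qq,
                      sd_xp, sd_xq, sd_ep, sd_eq, rho_epq) {
  bp <- beta_p * sd_xp
  bq <- beta_q * sd_xq
  num   <- sum(bp %o% bq * corr_x_pq) + sd_ep * sd_eq * rho_epq
  sd_yp <- sqrt(sd_ep^2 + sum(bp %o% bp * corr_x_pp))
  sd_yq <- sqrt(sd_eq^2 + sum(bq %o% bq * corr_x_qq))
  num / (sd_yp * sd_yq)
}
```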

Correlation between outcome \(Y_p\) and error term \(E_q\)

Consider the outcome \(Y_p\) from (3) and the error term \(E_q\) from (16), for \(1 \leq p < q \leq M\). Their correlation is given by: \[\begin{equation} \rho_{Y_pE_q} = \frac{E[Y_pE_q] - E[Y_p]E[E_q]}{\sigma_{Y_p} \sigma_q}. \tag{30} \end{equation}\] Then
\[\begin{align} \begin{split} E[Y_pE_q] &= E[(\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p}) E_q]\\ &= \beta_{p0}E[E_q] + \sum_{pi} \beta_{pi}E[X_{pi}E_q] + E[E_pE_q] \end{split} \tag{31} \end{align}\] and
\[\begin{align} \begin{split} E[Y_p]E[E_q] &= E[\beta_{p0} + \beta_{p1}X_{p1} + ... + \beta_{pi}X_{pi} + ... + \beta_{pj}X_{pj} + ... + \beta_{pK}X_{pK} + E_{p}]E[E_q]\\ &= \beta_{p0}E[E_q] + \sum_{pi} \beta_{pi}E[X_{pi}]E[E_q] + E[E_p]E[E_q] \end{split} \tag{32} \end{align}\] Combining (31) and (32) and using model assumption 2 gives:
\[\begin{equation} E[Y_pE_q] - E[Y_p]E[E_q] = \sigma_p \sigma_q \rho_{E_pE_q}. \tag{33} \end{equation}\] Finally, the correlation \(\rho_{Y_pE_q}\) from (30) can be reexpressed as:
\[\begin{align} \begin{split} \rho_{Y_pE_q} &= \frac{\sigma_p \sigma_q \rho_{E_pE_q}}{\sigma_q \sqrt{\sigma_p^2 + \sum_{pi} \beta_{pi}^2 \sigma_{X_{pi}}^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}\sigma_{X_{pi}}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}}}}\\ &= \frac{\sigma_p \rho_{E_pE_q}}{\sqrt{\sigma_p^2 + \sum_{pi} \beta_{pi}^2 \sigma_{X_{pi}}^2 + 2\sum_{pi \neq pj} \beta_{pi}\beta_{pj}\sigma_{X_{pi}}\sigma_{X_{pj}}\rho_{X_{pi}X_{pj}}}}. \end{split} \tag{34} \end{align}\]

This correlation is found in SimRepeat by the function calc_corr_ye. The required inputs \(\beta_{pi}\), \(\rho_{X_{pi}X_{pj}}\), \(\rho_{E_pE_q}\), \(\sigma_{X_{pi}}^2\), and \(\sigma_p^2\) are specified via the function parameters betas, corr.x, corr.e, and vars.
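Equation (34) is the simplest of the three; a hypothetical base-R sketch:

```r
# Equation (34): correlation between outcome Y_p and error term E_q.
rho_yp_eq <- function(beta_p, corr_x_pp, sd_xp, sd_ep, rho_epq) {
  bp <- beta_p * sd_xp
  sd_ep * rho_epq / sqrt(sd_ep^2 + sum(bp %o% bp * corr_x_pp))
}
```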

Extension to include continuous mixture distributions

The equations derived above have been extended in SimRepeat to permit independent variables or error terms with mixture distributions. If there are mixture distributions in the system, then the additional required parameters are mix_pis, mix_mus, mix_sigmas, and error_type (which defaults to “non_mix”). The vignette Expected Cumulants and Correlations for Continuous Mixture Variables from SimCorrMix derives the equations for the mean, variance, skew, standardized kurtosis (skurtosis), and standardized fifth and sixth cumulants of a continuous mixture distribution. Equations are also given which approximate: 1) correlations between two mixture variables and 2) correlations between a mixture variable and a non-mixture variable, based on the components’ distributions and correlations with the components (Fialkowski 2018). These equations are summarized below (Davenport, Bezdek, and Hathaway 1988; Schork, Allison, and Thiel 1996; Everitt 1996; Pearson 2011).

Approximate Correlations for Continuous Mixture Variables:

Assume \(M1\) and \(M2\) are two continuous mixture variables. Let \(M1\) have \(k_{M1}\) components with mixing probabilities \(\alpha_1, ..., \alpha_{k_{M1}}\). The standard deviations of the components are \(\sigma_{M1_1}, \sigma_{M1_2}, ..., \sigma_{M1_{k_{M1}}}\). Let \(M2\) have \(k_{M2}\) components with mixing probabilities \(\gamma_1, ..., \gamma_{k_{M2}}\). The standard deviations of the components are \(\sigma_{M2_1}, \sigma_{M2_2}, ..., \sigma_{M2_{k_{M2}}}\).

Correlation between continuous mixture variables M1 and M2:

The correlation between the mixture variables \(M1\) and \(M2\) is given by:
\[\begin{equation} \rho_{M1 M2} = \frac{E[M1M2] - E[M1]E[M2]}{\sigma_{M1} \sigma_{M2}}. \tag{35} \end{equation}\]

Equation (35) requires the expected value of the product of \(M1\) and \(M2\). Since there is no general method for determining this expected value, it is approximated by expressing \(M1\) and \(M2\) as sums of their component variables:

\[\begin{equation} \begin{split} E[(\sum_{i = 1}^{k_{M1}} \alpha_i M1_i) (\sum_{j = 1}^{k_{M2}} \gamma_j M2_j)] &= E[\alpha_1 M1_1 \gamma_1 M2_1 + \alpha_1 M1_1 \gamma_2 M2_2 + ... + \alpha_{k_{M1}} M1_{k_{M1}} \gamma_{k_{M2}} M2_{k_{M2}}] \\ &= \alpha_1 \gamma_1 E[M1_1 M2_1] + \alpha_1 \gamma_2 E[M1_1 M2_2] + ... + \alpha_{k_{M1}} \gamma_{k_{M2}} E[M1_{k_{M1}} M2_{k_{M2}}]. \end{split} \tag{36} \end{equation}\] The correlation is approximated as:
\[\begin{equation} \rho_{M1 M2} = \frac{\sum_{i = 1}^{k_{M1}} \alpha_i \sigma_{M1_i} \sum_{j = 1}^{k_{M2}} \gamma_j \sigma_{M2_j} \rho_{M1_i M2_j}}{\sigma_{M1} \sigma_{M2}}, \tag{37} \end{equation}\]

where \(\rho_{M1_i M2_j}\) is the correlation between component \(i\) of \(M1\) and component \(j\) of \(M2\). The function SimCorrMix::rho_M1M2 returns this value.
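Equation (37) is a weighted double sum over the component correlations; a minimal base-R transcription (the function and argument names are illustrative, not the SimCorrMix::rho_M1M2 interface):

```r
# Equation (37): approximate correlation between mixture variables M1 and M2.
# pi1, pi2    : mixing probabilities
# sd1, sd2    : component standard deviations
# corr12      : k_M1 x k_M2 matrix of component correlations rho_{M1_i M2_j}
# sd_m1, sd_m2: standard deviations of the mixtures themselves
rho_m1m2 <- function(pi1, pi2, sd1, sd2, corr12, sd_m1, sd_m2) {
  sum((pi1 * sd1) %o% (pi2 * sd2) * corr12) / (sd_m1 * sd_m2)
}
```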

Correlation between continuous mixture variable M1 and other random variable Y:

Here \(Y\) can be any random variable without a continuous mixture distribution (i.e., continuous non-mixture, ordinal, Poisson, or Negative Binomial). The correlation between the mixture variable \(M1\) and \(Y\) is given by:
\[\begin{equation} \rho_{M1 Y} = \frac{E[M1Y] - E[M1]E[Y]}{\sigma_{M1} \sigma_Y}. \tag{38} \end{equation}\]

Equation (38) requires the expected value of the product of \(M1\) and \(Y\). Since there is no general method for determining this expected value, it is again approximated by expressing \(M1\) as a sum of its component variables:

\[\begin{equation} \begin{split} E[(\sum_{i = 1}^{k_{M1}} \alpha_i M1_i) Y] &= E[\alpha_1 M1_1 Y + \alpha_2 M1_2 Y + ... + \alpha_{k_{M1}} M1_{k_{M1}} Y] \\ &= \alpha_1 E[M1_1 Y] + \alpha_2 E[M1_2 Y] + ... + \alpha_{k_{M1}} E[M1_{k_{M1}} Y]. \end{split} \tag{39} \end{equation}\]

The correlation is approximated as:

\[\begin{equation} \rho_{M1 Y} = \frac{\sum_{i = 1}^{k_{M1}} \alpha_i \sigma_{M1_i} \rho_{M1_i Y}}{\sigma_{M1}}, \tag{40} \end{equation}\]

where \(\rho_{M1_i Y}\) is the correlation between component \(i\) of \(M1\) and \(Y\). The function SimCorrMix::rho_M1Y returns this value.
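Equation (40) transcribes the same way (illustrative names, not the SimCorrMix::rho_M1Y interface):

```r
# Equation (40): approximate correlation between mixture M1 and variable Y.
# corr1y: component correlations rho_{M1_i Y}, i = 1, ..., k_M1
rho_m1y <- function(pi1, sd1, corr1y, sd_m1) {
  sum(pi1 * sd1 * corr1y) / sd_m1
}
```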

System of equations to determine the beta (slope) coefficients

For outcome \(Y_p\), (15) requires the inputs \(\rho_{Y_pX_{pi}}\), \(\rho_{X_{pi}X_{pj}}\), and \(\sigma_{X_{pi}}^2\). These are specified by the parameters corr.yx, corr.x, and vars, which are input as lists to the function calc_betas. If all independent variables have non-mixture distributions, then the matrices corr.yx[[p]] and corr.x[[p]][[p]] will both have \(K\) columns.

Let outcome \(Y_p\) be described by the following equation:
\[\begin{equation} Y_{p} = \beta_{p0} + \beta_{p1}X_{p1} + \beta_{p2}X_{p2} + \beta_{p3}X_{p3} + \beta_{p4}X_{p4} + E_{p}, \tag{41} \end{equation}\]

where \(X_{p1}\) and \(X_{p2}\) both have non-mixture distributions, while \(X_{p3}\) and \(X_{p4}\) both have mixture distributions. The PDF’s of \(X_{p3}\) and \(X_{p4}\) are given by:

\[\begin{equation} \begin{split} f_{X_{p3}}(x_{p3}) &= \alpha_1 f_{X_{p3_1}}(x_{p3_1}) + \alpha_2 f_{X_{p3_2}}(x_{p3_2}) \\ f_{X_{p4}}(x_{p4}) &= \gamma_1 f_{X_{p4_1}}(x_{p4_1}) + \gamma_2 f_{X_{p4_2}}(x_{p4_2}) + \gamma_3 f_{X_{p4_3}}(x_{p4_3}) \end{split} \tag{42} \end{equation}\]

so that \(X_{p3}\) has 2 components and \(X_{p4}\) has 3 components. Since simulation is done at the component level in nonnormsys, the correlations between independent variables within an outcome and across outcomes are specified in terms of correlations between non-mixture variables and components of mixture variables. Therefore, \(K\) is 7 and corr.x[[p]][[p]] should have dimension \(7 \times 7\). The user may specify the correlations corr.yx either as: 1) correlations between outcomes and non-mixture variables or components of mixture variables, or 2) correlations between outcomes and non-mixture or mixture variables.

In the first case, corr.yx[[p]] should have dimension \(1 \times 7\) and the beta coefficients will be calculated in terms of non-mixture and components of mixture variables. Then the components \(X_{p3_1}\), \(X_{p3_2}\), \(X_{p4_1}\), \(X_{p4_2}\), and \(X_{p4_3}\) are treated as “non-mixture” continuous variables and (15) applies, giving 7 beta coefficients.

In the second case, corr.yx[[p]] should have dimension \(1 \times 4\) and the beta coefficients will be calculated in terms of non-mixture and mixture variables. In order to use (15), the correlations in corr.x[[p]][[p]] are reexpressed in terms of correlations between non-mixture and mixture variables using (37) and (40). These equations require the mixing parameters \(\alpha_1, \alpha_2, \gamma_1, \gamma_2\), and \(\gamma_3\) to be specified in mix_pis, the component means to be specified in mix_mus, and the component standard deviations to be specified in mix_sigmas. Additionally, the error_type must be given if the system contains any mixture variables.

For example, the correlation \(\rho_{X_{p3}X_{p4}}\) is calculated as:

\[\begin{equation} \begin{split} \rho_{X_{p3}X_{p4}} &= \frac{\sum_{i = 1}^{2} \alpha_i \sigma_{X_{p3_i}} \sum_{j = 1}^{3} \gamma_j \sigma_{X_{p4_j}} \rho_{X_{p3_i}X_{p4_j}}}{\sigma_{X_{p3}} \sigma_{X_{p4}}} \\ &= \frac{\alpha_1 \sigma_{X_{p3_1}} (\gamma_1 \sigma_{X_{p4_1}} \rho_{X_{p3_1}X_{p4_1}} + \gamma_2 \sigma_{X_{p4_2}} \rho_{X_{p3_1}X_{p4_2}} + \gamma_3 \sigma_{X_{p4_3}} \rho_{X_{p3_1}X_{p4_3}})}{\sigma_{X_{p3}} \sigma_{X_{p4}}} \\ &\ \ \ + \frac{\alpha_2 \sigma_{X_{p3_2}} (\gamma_1 \sigma_{X_{p4_1}} \rho_{X_{p3_2}X_{p4_1}} + \gamma_2 \sigma_{X_{p4_2}} \rho_{X_{p3_2}X_{p4_2}} + \gamma_3 \sigma_{X_{p4_3}} \rho_{X_{p3_2}X_{p4_3}})}{\sigma_{X_{p3}} \sigma_{X_{p4}}}, \end{split} \tag{43} \end{equation}\]

where

\[\begin{equation} \begin{split} \sigma_{X_{p3}}^2 &= \alpha_1 (\sigma_{X_{p3_1}}^2 + \mu_{X_{p3_1}}^2) + \alpha_2 (\sigma_{X_{p3_2}}^2 + \mu_{X_{p3_2}}^2) - [\alpha_1 \mu_{X_{p3_1}} + \alpha_2 \mu_{X_{p3_2}}]^2 \\ \sigma_{X_{p4}}^2 &= \gamma_1 (\sigma_{X_{p4_1}}^2 + \mu_{X_{p4_1}}^2) + \gamma_2 (\sigma_{X_{p4_2}}^2 + \mu_{X_{p4_2}}^2) + \gamma_3 (\sigma_{X_{p4_3}}^2 + \mu_{X_{p4_3}}^2) - [\gamma_1 \mu_{X_{p4_1}} + \gamma_2 \mu_{X_{p4_2}} + \gamma_3 \mu_{X_{p4_3}}]^2. \end{split} \tag{44} \end{equation}\]

The correlation \(\rho_{X_{p1}X_{p3}}\) is calculated as:

\[\begin{equation} \begin{split} \rho_{X_{p1}X_{p3}} &= \frac{\sum_{i = 1}^{2} \alpha_i \sigma_{X_{p3_i}} \rho_{X_{p3_i}X_{p1}}}{\sigma_{X_{p3}}} \\ &= \frac{\alpha_1 \sigma_{X_{p3_1}} \rho_{X_{p3_1}X_{p1}} + \alpha_2 \sigma_{X_{p3_2}} \rho_{X_{p3_2}X_{p1}}}{\sigma_{X_{p3}}}. \end{split} \tag{45} \end{equation}\]

The remaining correlations are found similarly. Note that these are approximations of the expected correlations with mixture variables, so that accuracy will vary. After these correlations are found, (15) again applies.
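For concreteness, here is a worked base-R version of (43)–(45) with hypothetical component parameters (all numeric values below are invented for illustration):

```r
# Hypothetical parameters: X_p3 has 2 components, X_p4 has 3 components.
alpha <- c(0.4, 0.6)            # mixing probabilities of X_p3
gamma <- c(0.3, 0.2, 0.5)       # mixing probabilities of X_p4
mu3 <- c(-2, 2);    s3 <- c(1, 1)
mu4 <- c(-1, 0, 1); s4 <- c(0.5, 1, 0.5)
# Mixture variances from (44)
var3 <- sum(alpha * (s3^2 + mu3^2)) - sum(alpha * mu3)^2
var4 <- sum(gamma * (s4^2 + mu4^2)) - sum(gamma * mu4)^2
# Component correlations rho_{X_p3_i X_p4_j} (hypothetical), then (43)
r34 <- matrix(0.35, nrow = 2, ncol = 3)
rho34 <- sum((alpha * s3) %o% (gamma * s4) * r34) / sqrt(var3 * var4)
# Component correlations with X_p1 (hypothetical), then (45)
r13 <- c(0.5, 0.2)
rho13 <- sum(alpha * s3 * r13) / sqrt(var3)
round(c(rho34 = rho34, rho13 = rho13), 3)
```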

Correlation between outcome \(Y_q\) and independent variable \(X_{pi}\)

For outcome \(Y_q\), (24) requires the inputs \(\beta_{qj}\), \(\rho_{X_{pi}X_{qj}}\), \(\rho_{X_{qi}X_{qj}}\), \(\sigma_{X_{pi}}^2\), and \(\sigma_{X_{qj}}^2\). These are specified via the function parameters betas, corr.x, and vars, which are input as a matrix or lists to the function calc_corr_yx. If all independent variables have non-mixture distributions, then betas[q, ], corr.x[[p]][[q]], and corr.x[[q]][[q]] will all have the same number of columns.

Similar to the example above, let outcome \(Y_q\) be described by the following equation:
\[\begin{equation} Y_{q} = \beta_{q0} + \beta_{q1}X_{q1} + \beta_{q2}X_{q2} + \beta_{q3}X_{q3} + \beta_{q4}X_{q4} + E_{q}, \tag{46} \end{equation}\]

where \(X_{q1}\) and \(X_{q2}\) both have non-mixture distributions, while \(X_{q3}\) and \(X_{q4}\) both have mixture distributions. The PDF’s of \(X_{q3}\) and \(X_{q4}\) are given by:

\[\begin{equation} \begin{split} f_{X_{q3}}(x_{q3}) &= \alpha_1 f_{X_{q3_1}}(x_{q3_1}) + \alpha_2 f_{X_{q3_2}}(x_{q3_2}) \\ f_{X_{q4}}(x_{q4}) &= \gamma_1 f_{X_{q4_1}}(x_{q4_1}) + \gamma_2 f_{X_{q4_2}}(x_{q4_2}) + \gamma_3 f_{X_{q4_3}}(x_{q4_3}) \end{split} \tag{47} \end{equation}\]

so that \(X_{q3}\) has 2 components and \(X_{q4}\) has 3 components. The matrices corr.x[[p]][[q]] and corr.x[[q]][[q]] should both have 7 columns. The user may specify the slope coefficients betas either for: 1) non-mixture and components of mixture variables or 2) non-mixture and mixture variables.

In the first case, betas[q, ] should have 7 columns and the correlations between outcomes and independent variables will be calculated in terms of non-mixture and components of mixture variables. Then the components \(X_{q3_1}\), \(X_{q3_2}\), \(X_{q4_1}\), \(X_{q4_2}\), and \(X_{q4_3}\) are treated as “non-mixture” continuous variables and (24) applies.

In the second case, betas[q, ] should have 4 columns and the correlations between outcomes and independent variables will be calculated in terms of non-mixture and mixture variables. In order to use (24), the correlations in corr.x[[q]][[q]] are reexpressed in terms of correlations between non-mixture and mixture variables using (37) and (40). These equations require the mixing parameters \(\alpha_1, \alpha_2, \gamma_1, \gamma_2\), and \(\gamma_3\) to be specified in mix_pis, the component means to be specified in mix_mus, and the component standard deviations to be specified in mix_sigmas. Additionally, the error_type must be given if the system contains any mixture variables. The calculations are similar to the example in the previous section.

Correlation between outcomes \(Y_p\) and \(Y_q\)

For outcomes \(Y_p\) and \(Y_q\), (29) requires the inputs \(\beta_{pi}\), \(\beta_{qj}\), \(\rho_{X_{pi}X_{qj}}\), \(\rho_{X_{pi}X_{pj}}\), \(\rho_{X_{qi}X_{qj}}\), \(\rho_{E_pE_q}\), \(\sigma_{X_{pi}}^2\), and \(\sigma_{X_{qj}}^2\). These are specified via the function parameters betas, corr.x, corr.e, and vars, which are input as a matrix or lists to the function calc_corr_y. If all independent variables have non-mixture distributions, then betas[p, ], corr.x[[q]][[p]], and corr.x[[p]][[p]] will all have the same number of columns, and betas[q, ], corr.x[[p]][[q]], and corr.x[[q]][[q]] will all have the same number of columns.

If the equations for \(Y_p\) and/or \(Y_q\) contain mixture independent variables, then the user may specify the slope coefficients betas either for: 1) non-mixture and components of mixture variables or 2) non-mixture and mixture variables. In the first case, the components of the mixture independent variables are treated as “non-mixture” continuous variables and (29) applies. In the second case, the correlations in corr.x are reexpressed in terms of correlations between non-mixture and mixture variables using (37) and (40). These equations require the mixing parameters to be specified in mix_pis, the component means to be specified in mix_mus, and the component standard deviations to be specified in mix_sigmas. The calculations are similar to the previous examples.

If all error terms have mixture distributions (error_type = "mix"), then the correlations among components given in corr.e are reexpressed as correlations among mixture variables using (37). For example, assume the system contains 2 equations with error terms \(E_1\) and \(E_2\), both with mixture distributions. The PDF’s of \(E_1\) and \(E_2\) are given by:

\[\begin{equation} \begin{split} f_{E_1}(e_1) &= \alpha_1 f_{E_{11}}(e_{11}) + \alpha_2 f_{E_{12}}(e_{12}) \\ f_{E_2}(e_2) &= \gamma_1 f_{E_{21}}(e_{21}) + \gamma_2 f_{E_{22}}(e_{22}) + \gamma_3 f_{E_{23}}(e_{23}) \end{split} \tag{48} \end{equation}\]

so that \(E_1\) has 2 components and \(E_2\) has 3 components. The correlation \(\rho_{E_1E_2}\) is calculated as:

\[\begin{equation} \begin{split} \rho_{E_1 E_2} &= \frac{\sum_{i = 1}^{2} \alpha_i \sigma_{E_{1i}} \sum_{j = 1}^{3} \gamma_j \sigma_{E_{2j}} \rho_{E_{1i}E_{2j}}}{\sigma_1 \sigma_2} \\ &= \frac{\alpha_1 \sigma_{E_{11}} (\gamma_1 \sigma_{E_{21}} \rho_{E_{11}E_{21}} + \gamma_2 \sigma_{E_{22}} \rho_{E_{11}E_{22}} + \gamma_3 \sigma_{E_{23}} \rho_{E_{11}E_{23}})}{\sigma_1 \sigma_2} \\ &\ \ \ + \frac{\alpha_2 \sigma_{E_{12}} (\gamma_1 \sigma_{E_{21}} \rho_{E_{12}E_{21}} + \gamma_2 \sigma_{E_{22}} \rho_{E_{12}E_{22}} + \gamma_3 \sigma_{E_{23}} \rho_{E_{12}E_{23}})}{\sigma_1 \sigma_2}, \end{split} \tag{49} \end{equation}\]

where

\[\begin{equation} \begin{split} \sigma_1^2 &= \alpha_1 (\sigma_{E_{11}}^2 + \mu_{E_{11}}^2) + \alpha_2 (\sigma_{E_{12}}^2 + \mu_{E_{12}}^2) - [\alpha_1 \mu_{E_{11}} + \alpha_2 \mu_{E_{12}}]^2 \\ \sigma_2^2 &= \gamma_1 (\sigma_{E_{21}}^2 + \mu_{E_{21}}^2) + \gamma_2 (\sigma_{E_{22}}^2 + \mu_{E_{22}}^2) + \gamma_3 (\sigma_{E_{23}}^2 + \mu_{E_{23}}^2) - [\gamma_1 \mu_{E_{21}} + \gamma_2 \mu_{E_{22}} + \gamma_3 \mu_{E_{23}}]^2. \end{split} \tag{50} \end{equation}\]

The parameters for the component distributions are given as the last entries in mix_pis, mix_mus, and mix_sigmas. Again, note that the correlation computed in (49) is an approximation of the actual correlation.
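One simple generation scheme draws all component variables with the specified component correlations and then randomly selects a component for each observation (an assumption of this sketch; the packages' internal scheme may differ). Under that scheme, (49)–(50) can be checked empirically with normal components and invented parameters (component means are chosen so each error term has mean 0, per the model assumptions):

```r
# Empirical check of (49)-(50) with normal components (hypothetical values).
set.seed(7)
n <- 1e5
alpha <- c(0.4, 0.6);      mu1 <- c(-3, 2);      s1 <- c(1, 1)        # E_1
gamma <- c(0.3, 0.2, 0.5); mu2 <- c(-1, 0, 0.6); s2 <- c(0.5, 1, 0.5) # E_2
rho_comp <- 0.35                           # all component correlations
Sigma <- matrix(rho_comp, 5, 5); diag(Sigma) <- 1
Z  <- matrix(rnorm(n * 5), n, 5) %*% chol(Sigma)        # correlated N(0,1)'s
C1 <- sweep(sweep(Z[, 1:2], 2, s1, "*"), 2, mu1, "+")   # components of E_1
C2 <- sweep(sweep(Z[, 3:5], 2, s2, "*"), 2, mu2, "+")   # components of E_2
E1 <- C1[cbind(1:n, sample(1:2, n, TRUE, alpha))]       # random component draws
E2 <- C2[cbind(1:n, sample(1:3, n, TRUE, gamma))]
v1 <- sum(alpha * (s1^2 + mu1^2)) - sum(alpha * mu1)^2  # (50)
v2 <- sum(gamma * (s2^2 + mu2^2)) - sum(gamma * mu2)^2
rho_49 <- sum((alpha * s1) %o% (gamma * s2) * rho_comp) / sqrt(v1 * v2)
c(expected = rho_49, simulated = cor(E1, E2))
```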

Correlation between outcome \(Y_p\) and error term \(E_q\)

For outcome \(Y_p\) and error term \(E_q\), (34) requires the inputs \(\beta_{pi}\), \(\rho_{X_{pi}X_{pj}}\), \(\rho_{E_pE_q}\), \(\sigma_{X_{pi}}^2\), and \(\sigma_p^2\). These are specified via the function parameters betas, corr.x, corr.e, and vars, which are input as a matrix or lists to the function calc_corr_ye. If all independent variables have non-mixture distributions, then betas[p, ] and corr.x[[p]][[p]] will have the same number of columns.

If the equations for \(Y_p\) and/or \(Y_q\) contain mixture independent variables, then the user may specify the slope coefficients betas either for: 1) non-mixture and components of mixture variables or 2) non-mixture and mixture variables. In the first case, the components of the mixture independent variables are treated as “non-mixture” continuous variables and (34) applies. In the second case, the correlations in corr.x are reexpressed in terms of correlations between non-mixture and mixture variables using (37) and (40). These equations require the mixing parameters to be specified in mix_pis, the component means to be specified in mix_mus, and the component standard deviations to be specified in mix_sigmas. The calculations are similar to the previous examples.

If all error terms have mixture distributions (error_type = "mix"), then the correlations among components given in corr.e are reexpressed as correlations among mixture variables using (37). See the examples in the previous sections.

Simulated correlations from nonnormsys

The function nonnormsys returns the betas used in the simulation. Summaries for the outcomes, independent variables, and error terms can be found using summary_sys. This function also calculates the simulated correlations among outcomes (returned as rho.y), outcomes and independent variables (returned as rho.yx for non-mixture and components of mixture variables and rho.yxall for non-mixture and mixture variables), and outcomes and error terms (returned as rho.ye for non-mixture or components of mixture error terms and/or rho.yemix for mixture error terms). These can be used to compare simulated values to expected values calculated from the above equations.

References

Davenport, J W, J C Bezdek, and R J Hathaway. 1988. “Parameter Estimation for Finite Mixture Distributions.” Computers & Mathematics with Applications 15 (10): 819–28.

Everitt, B S. 1996. “An Introduction to Finite Mixture Distributions.” Statistical Methods in Medical Research 5 (2): 107–27. https://doi.org/10.1177/096228029600500202.

Fialkowski, A C. 2017. SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. https://CRAN.R-project.org/package=SimMultiCorrData.

———. 2018. SimCorrMix: Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions. https://CRAN.R-project.org/package=SimCorrMix.

Fleishman, A I. 1978. “A Method for Simulating Non-Normal Distributions.” Psychometrika 43: 521–32. https://doi.org/10.1007/BF02293811.

Headrick, T C. 2002. “Fast Fifth-Order Polynomial Transforms for Generating Univariate and Multivariate Non-Normal Distributions.” Computational Statistics and Data Analysis 40 (4): 685–711. https://doi.org/10.1016/S0167-9473(02)00072-5.

Headrick, T C, and T M Beasley. 2004. “A Method for Simulating Correlated Non-Normal Systems of Statistical Equations.” Communications in Statistics - Simulation and Computation 33 (1): 19–33. http://dx.doi.org/10.1081/SAC-120028431.

Headrick, T C, and S S Sawilowsky. 1999. “Simulating Correlated Non-Normal Distributions: Extending the Fleishman Power Method.” Psychometrika 64: 25–35. https://doi.org/10.1007/BF02294317.

Pearson, R K. 2011. Exploring Data in Engineering, the Sciences, and Medicine. New York: Oxford University Press.

Schork, N J, D B Allison, and B Thiel. 1996. “Mixture Distributions in Human Genetics Research.” Statistical Methods in Medical Research 5: 155–78. https://doi.org/10.1177/096228029600500204.