Systems of statistical equations with correlated variables are common in clinical and genetic studies. These systems may represent repeated measurements or clustered observations. Headrick and Beasley (2004) developed a method for simulating correlated systems of equations containing normal or non-normal continuous variables. The technique allows the user to specify the distributions of the independent variables \(X_{(pj)}\), for equations \(p = 1, ..., M\), and of the stochastic disturbance (error) terms \(E\). The user also controls the correlations between: 1) independent variables \(X_{(pj)}\) for a given outcome \(Y_p\), 2) an outcome \(Y_p\) and its \(X_{(pj)}\) terms, and 3) the stochastic disturbance (error) terms \(E\). Their technique calculates the beta (slope) coefficients from the correlations between the independent variables \(X_{(pj)}\) for a given outcome \(Y_p\), the correlations between that outcome \(Y_p\) and its \(X_{(pj)}\) terms, and the variances. Headrick and Beasley (2004) also derived equations that use these betas to calculate the expected correlations between: 1) \(X_{(pj)}\) terms and outcome \(Y_q\), 2) outcomes \(Y_p\) and \(Y_q\), and 3) outcome \(Y_p\) and error term \(E_q\), for \(p\) not equal to \(q\). All continuous variables are generated from intermediate random normal variables using Headrick (2002)'s fifth-order power method transformation (PMT). The intermediate correlations required to produce the target correlations after the variable transformations are calculated using Headrick (2002)'s equation.
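
In this notation, each of the \(M\) equations in the system has the general form \[Y_p = \beta_{p0} + \sum_{j} \beta_{pj} X_{(pj)} + E_p,\] where the \(X_{(pj)}\) terms and the \(E_p\) terms may be correlated both within and across equations.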

The package SimRepeat extends Headrick and Beasley (2004)'s method to include continuous variables (independent variables or error terms) that have mixture distributions. The nonnormsys function simulates a system of equations containing correlated continuous variables with normal, non-normal, or mixture distributions. Mixture distributions are generated using the techniques of Fialkowski (2018). A mixture distribution describes a random variable that is drawn from more than one component distribution, which provides a useful way to model heterogeneity in a population, especially when an outcome is a composite response from multiple sources. For a random variable \(Y\) from a finite mixture distribution with \(k\) components, the PDF can be described by: \[h_{Y}(y) = \sum_{i=1}^{k} \pi_{i} f_{Y_{i}}(y),\] where \(\sum_{i=1}^{k} \pi_{i} = 1\). The \(\pi_{i}\) are mixing parameters that determine the weight of each component distribution \(f_{Y_{i}}(y)\) in the overall probability distribution. The overall distribution \(h_{Y}(y)\) has a valid PDF if each component distribution has a valid PDF. The main assumption is statistical independence between the process of randomly selecting the component distribution and the component distributions themselves. Assume there is a random selection process that first generates the numbers \(1,\ ...,\ k\) with probabilities \(\pi_{1},\ ...,\ \pi_{k}\). After selecting number \(i\), where \(1 \leq i \leq k\), a random variable \(y\) is drawn from the component distribution \(f_{Y_{i}}(y)\) (Davenport, Bezdek, and Hathaway 1988; Schork, Allison, and Thiel 1996; Everitt 1996; Pearson 2011).
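
As a small numerical illustration of this formula (base R only; the two-component normal mixture used here is the one that appears as a covariate in Example 2 below), the mixture PDF is simply the weighted sum of its component PDFs and integrates to one:

# mixture of N(-2, 1) and N(2, 1) with mixing probabilities 0.4 and 0.6
pi_i <- c(0.4, 0.6)
mu_i <- c(-2, 2)
sigma_i <- c(1, 1)
h_y <- function(y) pi_i[1] * dnorm(y, mu_i[1], sigma_i[1]) +
  pi_i[2] * dnorm(y, mu_i[2], sigma_i[2])
integrate(h_y, -Inf, Inf)$value   # equals 1, so h_y is a valid PDF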

The package SimCorrMix contains vignettes describing mixture variables in more detail. Continuous mixture variables are generated componentwise and then combined into the desired mixture variables using random multinomial variables based on the mixing probabilities. The correlation matrices are specified in terms of correlations with the components of the mixture variables. Headrick and Beasley (2004)'s equations have been adapted in SimRepeat to permit mixture variables; the adapted versions are contained in the functions calc_betas, calc_corr_y, calc_corr_ye, and calc_corr_yx.
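
A minimal base-R sketch of this componentwise approach (correlations are ignored here; SimCorrMix induces them through the component-level correlation matrices): each component is generated separately, and a random multinomial draw based on the mixing probabilities selects which component supplies each observation.

set.seed(1)
n_sim <- 10000
pi_i <- c(0.4, 0.6)
# component draws: N(-2, 1) and N(2, 1)
comps <- cbind(rnorm(n_sim, -2, 1), rnorm(n_sim, 2, 1))
# multinomial selection of the component for each observation
pick <- apply(rmultinom(n_sim, size = 1, prob = pi_i), 2, which.max)
y_mix <- comps[cbind(seq_len(n_sim), pick)]
mean(y_mix)   # close to 0.4 * (-2) + 0.6 * 2 = 0.4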

The package SimRepeat also allows the user to specify the correlations between independent variables across outcomes (i.e., between \(X_{(pj)}\) and \(X_{(qj)}\) for \(p\) not equal to \(q\)). This permits imposing a specific correlation structure (e.g., AR(1) or Toeplitz) on time-varying covariates. Independent variables can also be designated as the same across outcomes (e.g., to include a static covariate such as height). Continuous variables are generated using the techniques of Fialkowski (2017), which permit either Fleishman (1978)'s third-order (method = "Fleishman") or Headrick (2002)'s fifth-order (method = "Polynomial") PMT. When using the fifth-order method, sixth cumulant corrections can be specified in order to obtain random variables with valid probability distribution functions (PDFs). The error loop from SimCorrMix can be used within the nonnormsys function to attempt to bring the final correlations among all \(X\) terms to within a user-specified precision value (epsilon) of the target correlations.
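
For example, an AR(1) structure for a time-varying covariate measured at \(M = 3\) occasions could be built as below (the lag-one correlation of \(0.5\) is arbitrary) and supplied through the appropriate across-outcome blocks of corr.x:

rho <- 0.5
# AR(1): correlation rho^|p - q| between measurements at occasions p and q
toeplitz(rho^(0:2))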

The following examples demonstrate usage of the nonnormsys function. Example 1 reproduces the example given in Headrick and Beasley (2004). Example 2 gives a system with non-mixture and mixture variables. Example 3 gives a system in which one outcome has no independent variables (only an error term). Details about the theory and equations are in the Theory and Equations for Correlated Systems of Continuous Variables vignette.

Example 1: System of 3 equations, each with 2 independent variables

In this example taken from Headrick and Beasley (2004):

  1. \(X_{11}, X_{21}, X_{31}\) each have an Exponential(2) distribution

  2. \(X_{12}, X_{22}, X_{32}\) each have a Laplace(0, 1) distribution

  3. \(E_1, E_2, E_3\) each have a Cauchy(0, 1) distribution (since the Cauchy distribution has no finite moments, the standardized cumulants were taken from the paper)

\[\begin{equation} \begin{split} Y_1 &= \beta_{10} + \beta_{11} * X_{11} + \beta_{12} * X_{12} + E_1 \\ Y_2 &= \beta_{20} + \beta_{21} * X_{21} + \beta_{22} * X_{22} + E_2 \\ Y_3 &= \beta_{30} + \beta_{31} * X_{31} + \beta_{32} * X_{32} + E_3 \end{split} \tag{1} \end{equation}\]

Each term was generated with zero mean and unit variance. All intercepts were set at \(0\). The correlations were given as:

  1. \(\rho_{X_{11}, X_{12}} = 0.10\), \(\rho_{X_{21}, X_{22}} = 0.35\), \(\rho_{X_{31}, X_{32}} = 0.70\)

  2. \(\rho_{Y_1, X_{11}} = \rho_{Y_1, X_{12}} = 0.40\), \(\rho_{Y_2, X_{21}} = \rho_{Y_2, X_{22}} = 0.50\), \(\rho_{Y_3, X_{31}} = \rho_{Y_3, X_{32}} = 0.60\)

  3. \(\rho_{E_1, E_2} = \rho_{E_1, E_3} = \rho_{E_2, E_3} = 0.40\)

The across-outcome correlations between \(X\) terms were set as shown below in order to obtain the same results as in the paper. No sixth cumulant corrections were used, although the Laplace(0, 1) distribution requires a correction of \(25.14\) in order to have a valid PDF.

Step 1: Set up parameter inputs

library("SimRepeat")
library("printr")
options(scipen = 999)
seed <- 111
n <- 10000
M <- 3

method <- "Polynomial"
Stcum1 <- calc_theory("Exponential", 2)
Stcum2 <- calc_theory("Laplace", c(0, 1))
# approximate standardized cumulants for the Cauchy(0, 1) error terms,
# taken from Headrick and Beasley (2004)
Stcum3 <- c(0, 1, 0, 25, 0, 1500)

means <- lapply(seq_len(M), function(x) rep(0, 3))
vars <- lapply(seq_len(M), function(x) rep(1, 3))

skews <- lapply(seq_len(M), function(x) c(Stcum1[3], Stcum2[3], Stcum3[3]))
skurts <- lapply(seq_len(M), function(x) c(Stcum1[4], Stcum2[4], Stcum3[4]))
fifths <- lapply(seq_len(M), function(x) c(Stcum1[5], Stcum2[5], Stcum3[5]))
sixths <- lapply(seq_len(M), function(x) c(Stcum1[6], Stcum2[6], Stcum3[6]))
Six <- list()

# Note the following would be used to obtain all valid PDF's
# Six <- lapply(seq_len(M), function(x) list(NULL, 25.14, NULL))

betas.0 <- 0
corr.x <- list()
corr.x[[1]] <- corr.x[[2]] <- corr.x[[3]] <- list()
corr.x[[1]][[1]] <- matrix(c(1, 0.1, 0.1, 1), nrow = 2, ncol = 2)
corr.x[[1]][[2]] <- matrix(c(0.1974318, 0.1859656, 0.1879483, 0.1858601),
                           2, 2, byrow = TRUE)
corr.x[[1]][[3]] <- matrix(c(0.2873190, 0.2589830, 0.2682057, 0.2589542),
                           2, 2, byrow = TRUE)
corr.x[[2]][[1]] <- t(corr.x[[1]][[2]])
corr.x[[2]][[2]] <- matrix(c(1, 0.35, 0.35, 1), nrow = 2, ncol = 2)
corr.x[[2]][[3]] <- matrix(c(0.5723303, 0.4883054, 0.5004441, 0.4841808),
                           2, 2, byrow = TRUE)
corr.x[[3]][[1]] <- t(corr.x[[1]][[3]])
corr.x[[3]][[2]] <- t(corr.x[[2]][[3]])
corr.x[[3]][[3]] <- matrix(c(1, 0.7, 0.7, 1), nrow = 2, ncol = 2)

corr.yx <- list(matrix(c(0.4, 0.4), 1), matrix(c(0.5, 0.5), 1), 
  matrix(c(0.6, 0.6), 1))
corr.e <- matrix(0.4, nrow = M, ncol = M)
diag(corr.e) <- 1
error_type <- "non_mix"

Step 2: Check parameter inputs

checkpar(M, method, error_type, means, vars, skews, skurts, fifths, sixths, 
  Six, betas.0 = betas.0, corr.x = corr.x, corr.yx = corr.yx, corr.e = corr.e, 
  quiet = TRUE)
## [1] TRUE

Step 3: Generate system

Sys1 <- nonnormsys(n, M, method, error_type, means, vars, skews, skurts,
  fifths, sixths, Six, betas.0 = betas.0, corr.x = corr.x, corr.yx = corr.yx, 
  corr.e = corr.e, seed = seed, quiet = TRUE)
## Total Simulation time: 0.042 minutes
knitr::kable(Sys1$betas, booktabs = TRUE, caption = "Beta coefficients")
Table 1: Beta coefficients
B1 B2
Y1 0.4318335 0.4318335
Y2 0.4667600 0.4667598
Y3 0.4648507 0.4648503
knitr::kable(Sys1$constants[[1]], booktabs = TRUE, caption = "PMT constants")
Table 2: PMT constants
c0 c1 c2 c3 c4 c5
-0.3077396 0.8005605 0.318764 0.0335001 -0.0036748 0.0001587
0.0000000 0.7277089 0.000000 0.0963035 0.0000000 -0.0022321
0.0000000 -0.1596845 0.000000 0.3550358 0.0000000 -0.0094729
Sys1$valid.pdf
## [[1]]
## [1] "TRUE"  "FALSE" "FALSE"
## 
## [[2]]
## [1] "TRUE"  "FALSE" "FALSE"
## 
## [[3]]
## [1] "TRUE"  "FALSE" "FALSE"

Step 4: Describe results

Sum1 <- summary_sys(Sys1$Y, Sys1$E, E_mix = NULL, Sys1$X, Sys1$X_all, M,
  method, means, vars, skews, skurts, fifths, sixths, corr.x = corr.x,
  corr.e = corr.e)
knitr::kable(Sum1$cont_sum_y, booktabs = TRUE, 
  caption = "Simulated Distributions of Outcomes")
Table 3: Simulated Distributions of Outcomes
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
Y1 1 10000 0.0071862 1.184474 -0.0735469 -12.27685 9.139193 0.0593231 10.859887 -15.709407 344.2901
Y2 2 10000 0.0029877 1.253034 -0.0880282 -10.82648 10.005708 0.3167011 9.653009 4.236132 271.6446
Y3 3 10000 0.0100362 1.308292 -0.1012901 -12.35040 14.358996 0.4478936 10.291990 10.681747 419.8943
knitr::kable(Sum1$target_sum_x, booktabs = TRUE, 
  caption = "Target Distributions of Independent Variables")
Table 4: Target Distributions of Independent Variables
Outcome X Mean SD Skew Skurtosis Fifth Sixth
cont1_1 1 1 0 1 2 6 24 120
cont1_2 1 2 0 1 0 3 0 30
cont2_1 2 1 0 1 2 6 24 120
cont2_2 2 2 0 1 0 3 0 30
cont3_1 3 1 0 1 2 6 24 120
cont3_2 3 2 0 1 0 3 0 30
knitr::kable(Sum1$cont_sum_x, booktabs = TRUE, 
  caption = "Simulated Distributions of Independent Variables")
Table 5: Simulated Distributions of Independent Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
cont1_1 1 1 10000 -0.0008979 0.9972631 -0.2978603 -1.328231 9.250562 2.0468429 6.364092 25.635123 119.57370
cont1_2 1 2 10000 -0.0011300 1.0028701 0.0063327 -7.066301 5.654474 -0.1199282 3.037337 -3.488146 27.09002
cont2_1 2 1 10000 -0.0010172 0.9928807 -0.2910767 -1.291738 8.621562 2.0224749 6.245030 24.969443 112.32254
cont2_2 2 2 10000 -0.0008947 1.0008280 -0.0017667 -7.616110 6.442554 -0.0951022 3.630147 -4.626091 60.43407
cont3_1 3 1 10000 -0.0002394 0.9984306 -0.3077666 -1.578368 9.545770 2.0078250 6.085216 24.522409 120.55127
cont3_2 3 2 10000 -0.0003199 0.9874384 -0.0012571 -6.035376 5.227375 -0.0527893 2.236968 -1.027774 12.76165
knitr::kable(Sum1$target_sum_e, booktabs = TRUE, 
  caption = "Target Distributions of Error Terms")
Table 6: Target Distributions of Error Terms
Outcome Mean SD Skew Skurtosis Fifth Sixth
E1 1 0 1 0 25 0 1500
E2 2 0 1 0 25 0 1500
E3 3 0 1 0 25 0 1500
knitr::kable(Sum1$cont_sum_e, booktabs = TRUE, 
  caption = "Simulated Distributions of Error Terms")
Table 7: Simulated Distributions of Error Terms
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
E1 1 10000 0.0080619 0.9935061 -0.0004688 -11.77427 8.960461 -0.1369344 21.30282 -35.984933 965.4108
E2 2 10000 0.0038802 0.9978020 -0.0012622 -13.02425 10.362046 0.1759016 24.93013 1.313834 1463.0695
E3 3 10000 0.0102962 0.9951096 0.0001988 -12.04561 12.090892 0.4188715 26.37023 26.936516 1586.3607

Maximum Correlation Errors for X Variables by Outcome:

maxerr <- do.call(rbind, Sum1$maxerr)
rownames(maxerr) <- colnames(maxerr) <- paste("Y", 1:M, sep = "")
knitr::kable(as.data.frame(maxerr), digits = 5, booktabs = TRUE, 
  caption = "Maximum Correlation Errors for X Variables")
Table 8: Maximum Correlation Errors for X Variables
Y1 Y2 Y3
Y1 0.00413 0.00205 0.00575
Y2 0.00205 0.00495 0.00541
Y3 0.00575 0.00541 0.00516

Step 5: Compare simulated correlations to theoretical values

knitr::kable(calc_corr_y(Sys1$betas, corr.x, corr.e, vars), 
  booktabs = TRUE, caption = "Expected Y Correlations")
Table 9: Expected Y Correlations
Y1 Y2 Y3
Y1 1.0000000 0.3692526 0.3935111
Y2 0.3692526 1.0000000 0.5083399
Y3 0.3935111 0.5083399 1.0000000
knitr::kable(Sum1$rho.y, booktabs = TRUE, 
  caption = "Simulated Y Correlations")
Table 10: Simulated Y Correlations
Y1 Y2 Y3
Y1 1.0000000 0.3741871 0.4015689
Y2 0.3741871 1.0000000 0.5019800
Y3 0.4015689 0.5019800 1.0000000
knitr::kable(calc_corr_ye(Sys1$betas, corr.x, corr.e, vars), 
  booktabs = TRUE, caption = "Expected Y, E Correlations")
Table 11: Expected Y, E Correlations
E1 E2 E3
Y1 0.8420754 0.3368301 0.3368301
Y2 0.3173969 0.7934921 0.3173969
Y3 0.3037028 0.3037028 0.7592569
knitr::kable(Sum1$rho.ye, 
  booktabs = TRUE, caption = "Simulated Y, E Correlations")
Table 12: Simulated Y, E Correlations
E1 E2 E3
Y1 0.8405035 0.3422090 0.3489198
Y2 0.3218245 0.7931439 0.3098100
Y3 0.3100882 0.2917064 0.7603231
knitr::kable(calc_corr_yx(Sys1$betas, corr.x, vars), 
  booktabs = TRUE, caption = "Expected Y, X Correlations")
Table 13: Expected Y, X Correlations
X1_1 X1_2
Y1 0.4000000 0.4000000
Y2 0.1419990 0.1384475
Y3 0.1928124 0.1860563
X2_1 X2_2
Y1 0.1401382 0.1352093
Y2 0.5000000 0.4999998
Y3 0.3743418 0.3475145
X3_1 X3_2
Y1 0.2020090 0.1883408
Y2 0.3973238 0.3601800
Y3 0.5999997 0.5999996
knitr::kable(Sum1$rho.yx, 
  booktabs = TRUE, caption = "Simulated Y, X Correlations")
Table 14: Simulated Y, X Correlations
X1_1 X1_2
Y1 0.4013407 0.4077626
Y2 0.1413510 0.1445654
Y3 0.1980463 0.1900016
X2_1 X2_2
Y1 0.1408618 0.1330024
Y2 0.4987203 0.4934377
Y3 0.3818031 0.3446263
X3_1 X3_2
Y1 0.1976696 0.1882595
Y2 0.3899509 0.3648209
Y3 0.5952966 0.5999793
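
As an informal check on Example 1 (a sketch that assumes Sys1$Y stores the outcomes as columns and Sys1$X[[1]] stores \(X_{11}\) and \(X_{12}\) as columns), an ordinary least-squares fit of \(Y_1\) on its covariates should return slopes close to the betas in Table 1:

# informal check: OLS slopes for Y1 should be close to the betas in Table 1
# (assumes the column ordering of Sys1$Y and Sys1$X[[1]] described above)
dat1 <- data.frame(Y1 = Sys1$Y[, 1], X11 = Sys1$X[[1]][, 1],
  X12 = Sys1$X[[1]][, 2])
coef(lm(Y1 ~ X11 + X12, data = dat1))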

Example 2: System of 3 equations, each with 2 continuous non-mixture covariates and 1 mixture covariate

In this example:

  1. \(X_{11} = X_{21} = X_{31}\) has a Logistic(0, 1) distribution and is the same variable for each Y (static covariate)

  2. \(X_{12}, X_{22}, X_{32}\) each have a Beta(4, 1.5) distribution

  3. \(X_{13}, X_{23}, X_{33}\) each have a mixture distribution with Normal(-2, 1) and Normal(2, 1) components (mixing probabilities 0.4 and 0.6)

  4. \(E_1, E_2, E_3\) each have a mixture distribution with Logistic(0, 1), Chisq(4), and Beta(4, 1.5) components (mixing probabilities 0.3, 0.2, and 0.5)

\[\begin{equation} \begin{split} Y_1 &= \beta_{10} + \beta_{11} * X_{11} + \beta_{12} * X_{12} + \beta_{13} * X_{13} + E_1 \\ Y_2 &= \beta_{20} + \beta_{21} * X_{11} + \beta_{22} * X_{22} + \beta_{23} * X_{23} + E_2 \\ Y_3 &= \beta_{30} + \beta_{31} * X_{11} + \beta_{32} * X_{32} + \beta_{33} * X_{33} + E_3 \end{split} \tag{2} \end{equation}\]

Step 1: Set up parameter inputs

seed <- 276
n <- 10000
M <- 3

method <- "Polynomial"
L <- calc_theory("Logistic", c(0, 1))
C <- calc_theory("Chisq", 4)
B <- calc_theory("Beta", c(4, 1.5))

skews <- lapply(seq_len(M), function(x) c(L[3], B[3]))
skurts <- lapply(seq_len(M), function(x) c(L[4], B[4]))
fifths <- lapply(seq_len(M), function(x) c(L[5], B[5]))
sixths <- lapply(seq_len(M), function(x) c(L[6], B[6]))
Six <- lapply(seq_len(M), function(x) list(1.75, 0.03))

mix_pis <- lapply(seq_len(M), function(x) list(c(0.4, 0.6), c(0.3, 0.2, 0.5)))
mix_mus <- lapply(seq_len(M), function(x) list(c(-2, 2), c(L[1], C[1], B[1])))
mix_sigmas <- lapply(seq_len(M),
  function(x) list(c(1, 1), c(L[2], C[2], B[2])))
mix_skews <- lapply(seq_len(M),
  function(x) list(c(0, 0), c(L[3], C[3], B[3])))
mix_skurts <- lapply(seq_len(M),
  function(x) list(c(0, 0), c(L[4], C[4], B[4])))
mix_fifths <- lapply(seq_len(M),
  function(x) list(c(0, 0), c(L[5], C[5], B[5])))
mix_sixths <- lapply(seq_len(M),
  function(x) list(c(0, 0), c(L[6], C[6], B[6])))
mix_Six <- lapply(seq_len(M), function(x) list(NULL, NULL, 1.75, NULL, 0.03))
Nstcum <- calc_mixmoments(mix_pis[[1]][[1]], mix_mus[[1]][[1]],
  mix_sigmas[[1]][[1]], mix_skews[[1]][[1]], mix_skurts[[1]][[1]],
  mix_fifths[[1]][[1]], mix_sixths[[1]][[1]])
Mstcum <- calc_mixmoments(mix_pis[[2]][[2]], mix_mus[[2]][[2]],
  mix_sigmas[[2]][[2]], mix_skews[[2]][[2]], mix_skurts[[2]][[2]],
  mix_fifths[[2]][[2]], mix_sixths[[2]][[2]])
means <- lapply(seq_len(M),
  function(x) c(L[1], B[1], Nstcum[1], Mstcum[1]))
vars <- lapply(seq_len(M),
  function(x) c(L[2]^2, B[2]^2, Nstcum[2]^2, Mstcum[2]^2))

same.var <- 1
betas.0 <- 0

corr.x <- list()
corr.x[[1]] <- list(matrix(0.1, 4, 4), matrix(0.2, 4, 4), matrix(0.3, 4, 4))
diag(corr.x[[1]][[1]]) <- 1

# set correlations between components of the same mixture variable to 0
corr.x[[1]][[1]][3:4, 3:4] <- diag(2)

# since X1 is the same across outcomes, set correlation the same
corr.x[[1]][[2]][, same.var] <- corr.x[[1]][[3]][, same.var] <-
  corr.x[[1]][[1]][, same.var]

corr.x[[2]] <- list(t(corr.x[[1]][[2]]), matrix(0.35, 4, 4), matrix(0.4, 4, 4))
diag(corr.x[[2]][[2]]) <- 1

# set correlations between components of the same mixture variable to 0
corr.x[[2]][[2]][3:4, 3:4] <- diag(2)

# since X1 is the same across outcomes, set correlation the same
corr.x[[2]][[2]][, same.var] <- corr.x[[2]][[3]][, same.var] <-
  t(corr.x[[1]][[2]][same.var, ])
corr.x[[2]][[3]][same.var, ] <- corr.x[[1]][[3]][same.var, ]
corr.x[[2]][[2]][same.var, ] <- t(corr.x[[2]][[2]][, same.var])

corr.x[[3]] <- list(t(corr.x[[1]][[3]]), t(corr.x[[2]][[3]]), 
  matrix(0.5, 4, 4))
diag(corr.x[[3]][[3]]) <- 1

# set correlations between components of the same mixture variable to 0
corr.x[[3]][[3]][3:4, 3:4] <- diag(2)

# since X1 is the same across outcomes, set correlation the same
corr.x[[3]][[3]][, same.var] <- t(corr.x[[1]][[3]][same.var, ])
corr.x[[3]][[3]][same.var, ] <- t(corr.x[[3]][[3]][, same.var])

corr.yx <- list(matrix(0.3, 1, 4), matrix(0.4, 1, 4), matrix(0.5, 1, 4))

corr.e <- matrix(0.4, 9, 9)
corr.e[1:3, 1:3] <- corr.e[4:6, 4:6] <- corr.e[7:9, 7:9] <- diag(3)
error_type <- "mix"
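
As a hand check on the calc_mixmoments results used above: for a mixture variable, the mean is \(\sum_{i} \pi_{i} \mu_{i}\) and the variance is \(\sum_{i} \pi_{i} (\sigma_{i}^2 + \mu_{i}^2)\) minus the squared mean, which for the Normal(-2, 1), Normal(2, 1) mixture with weights \(0.4\) and \(0.6\) gives a mean of \(0.4\) and a standard deviation of \(2.2\):

pi1 <- mix_pis[[1]][[1]]; mu1 <- mix_mus[[1]][[1]]; sig1 <- mix_sigmas[[1]][[1]]
mix_mean <- sum(pi1 * mu1)                                  # 0.4
mix_sd <- sqrt(sum(pi1 * (sig1^2 + mu1^2)) - mix_mean^2)    # 2.2
c(mix_mean, mix_sd)
Nstcum[1:2]   # the calc_mixmoments mean and SD should agree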

Step 2: Check parameter inputs

checkpar(M, method, error_type, means, vars, skews, skurts, fifths, sixths, 
  Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, 
  mix_sixths, mix_Six, same.var = same.var, betas.0 = betas.0, corr.x = corr.x, 
  corr.yx = corr.yx, corr.e = corr.e, quiet = TRUE)
## [1] TRUE

Step 3: Generate system

Note that use.nearPD = FALSE and adjgrad = FALSE, so that negative eigenvalues will be replaced with eigmin (default \(0\)) instead of using the nearest positive-definite matrix (found with Bates and Maechler (2017)'s Matrix::nearPD function, which implements Higham (2002)'s algorithm) or the adjusted gradient updating method via adj_grad (Yin and Zhang 2013; Zhang and Yin, n.d.; Maree 2012).
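
To see why this option matters here, note that the corr.e matrix specified in Step 1 is not positive definite on its own. Below is a minimal sketch (not the internal call made by nonnormsys) of checking this and of the nearest-correlation-matrix adjustment that use.nearPD = TRUE relies on:

min(eigen(corr.e)$values)        # -0.2, so corr.e is not positive definite
library(Matrix)
# Higham (2002)'s algorithm, as implemented in Matrix::nearPD
corr.e.pd <- as.matrix(nearPD(corr.e, corr = TRUE)$mat)
min(eigen(corr.e.pd)$values)     # now non-negative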

Sys2 <- nonnormsys(n, M, method, error_type, means, vars, skews, skurts,
  fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts,
  mix_fifths, mix_sixths, mix_Six, same.var, betas.0, corr.x, corr.yx, corr.e,
  seed, use.nearPD = FALSE, quiet = TRUE)
## Total Simulation time: 0.014 minutes
knitr::kable(Sys2$betas, booktabs = TRUE, caption = "Beta coefficients")
Table 15: Beta coefficients
B1 B2 B3 B4
Y1 0.3213032 3.336166 0.6556270 0.6556270
Y2 0.3881446 2.223553 0.8415819 0.8415821
Y3 0.4345749 0.000000 1.3794056 1.3794056
knitr::kable(Sys2$constants[[1]], booktabs = TRUE, caption = "PMT constants")
Table 16: PMT constants
c0 c1 c2 c3 c4 c5
0.0000000 0.8879258 0.0000000 0.0360523 0.0000000 0.0000006
0.1629657 1.0899841 -0.1873287 -0.0449503 0.0081210 0.0014454
0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000 0.8879258 0.0000000 0.0360523 0.0000000 0.0000006
-0.2275081 0.9007156 0.2316099 0.0154662 -0.0013673 0.0000552
0.1629657 1.0899841 -0.1873287 -0.0449503 0.0081210 0.0014454
Sys2$valid.pdf
## [[1]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"
## 
## [[2]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"
## 
## [[3]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"

Step 4: Describe results

Sum2 <- summary_sys(Sys2$Y, Sys2$E, Sys2$E_mix, Sys2$X, Sys2$X_all, M, method, 
  means, vars, skews, skurts, fifths, sixths, mix_pis, mix_mus, mix_sigmas,
  mix_skews, mix_skurts, mix_fifths, mix_sixths, corr.x = corr.x,
  corr.e = corr.e)
knitr::kable(Sum2$cont_sum_y, booktabs = TRUE, 
  caption = "Simulated Distributions of Outcomes")
Table 17: Simulated Distributions of Outcomes
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
Y1 1 10000 3.589862 2.576976 3.365901 -8.590069 23.70402 1.1030234 4.039389 11.893356 42.501607
Y2 2 10000 2.781013 2.779635 2.584989 -7.634977 22.73812 0.9484330 3.323180 10.374270 37.360759
Y3 3 10000 1.163812 3.226658 1.024133 -11.625259 18.97849 0.5523652 1.714578 3.843356 9.620764
knitr::kable(Sum2$target_sum_x, booktabs = TRUE, 
  caption = "Target Distributions of Independent Variables")
Table 18: Target Distributions of Independent Variables
Outcome X Mean SD Skew Skurtosis Fifth Sixth
cont1_1 1 1 0.0000000 1.8137994 0.0000000 1.2000000 0.000000 6.857143
cont1_2 1 2 0.7272727 0.1746854 -0.6938789 -0.0686029 1.828251 -3.379110
cont1_3 1 3 -2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000
cont1_4 1 4 2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000
cont2_1 2 1 0.0000000 1.8137994 0.0000000 1.2000000 0.000000 6.857143
cont2_2 2 2 0.7272727 0.1746854 -0.6938789 -0.0686029 1.828251 -3.379110
cont2_3 2 3 -2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000
cont2_4 2 4 2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000
cont3_1 3 1 0.0000000 1.8137994 0.0000000 1.2000000 0.000000 6.857143
cont3_2 3 2 0.7272727 0.1746854 -0.6938789 -0.0686029 1.828251 -3.379110
cont3_3 3 3 -2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000
cont3_4 3 4 2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000
knitr::kable(Sum2$cont_sum_x, booktabs = TRUE, 
  caption = "Simulated Distributions of Independent Variables")
Table 19: Simulated Distributions of Independent Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
cont1_1 1 1 10000 0.0004036 1.8201505 0.0211754 -9.1086030 11.353203 0.0314930 1.3636885 0.8191860 7.6044595
cont1_2 1 2 10000 0.7272107 0.1743913 0.7553300 0.0765447 1.176209 -0.6805537 -0.0960855 1.7904701 -2.9223486
cont1_3 1 3 10000 -2.0000000 0.9999500 -2.0027696 -5.8445854 1.692666 0.0009459 0.0193741 0.0376516 -0.0283331
cont1_4 1 4 10000 2.0000000 0.9999500 2.0095202 -1.8793929 5.929108 -0.0006385 0.0089325 0.0295979 -0.3287443
cont2_1 2 1 10000 0.0004036 1.8201505 0.0211754 -9.1086030 11.353203 0.0314930 1.3636885 0.8191860 7.6044595
cont2_2 2 2 10000 0.7273131 0.1749365 0.7548160 0.0850562 1.108987 -0.6935349 -0.0667186 1.7992307 -3.3734961
cont2_3 2 3 10000 -2.0000000 0.9999500 -1.9899973 -6.0984500 1.691800 -0.0387498 0.0458874 0.1134769 -0.4972766
cont2_4 2 4 10000 2.0000000 0.9999500 2.0110466 -2.1448770 5.491031 -0.0455281 0.0073620 -0.0195780 -0.2759494
cont3_1 3 1 10000 0.0004036 1.8201505 0.0211754 -9.1086030 11.353203 0.0314930 1.3636885 0.8191860 7.6044595
cont3_2 3 2 10000 0.7274490 0.1748960 0.7565852 0.0747992 1.071272 -0.7079415 -0.0175061 1.7502794 -3.6178723
cont3_3 3 3 10000 -2.0000000 0.9999500 -2.0047639 -6.1416294 1.976896 -0.0137807 0.0222297 0.0420496 -0.1082238
cont3_4 3 4 10000 2.0000000 0.9999500 1.9920150 -1.9378583 5.545766 -0.0258160 -0.0029318 -0.2124639 -0.2961942
knitr::kable(Sum2$target_mix_x, booktabs = TRUE, 
  caption = "Target Distributions of Mixture Independent Variables")
Table 20: Target Distributions of Mixture Independent Variables
Outcome X Mean SD Skew Skurtosis Fifth Sixth
mix1_1 1 1 0.4 2.2 -0.2885049 -1.154019 1.793022 6.173267
mix2_1 2 1 0.4 2.2 -0.2885049 -1.154019 1.793022 6.173267
mix3_1 3 1 0.4 2.2 -0.2885049 -1.154019 1.793022 6.173267
knitr::kable(Sum2$mix_sum_x, booktabs = TRUE, 
  caption = "Simulated Distributions of Mixture Independent Variables")
Table 21: Simulated Distributions of Mixture Independent Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
mix1_1 1 1 10000 0.4 2.19989 1.045485 -5.445049 5.916135 -0.2731288 -1.168640 1.738283 6.351654
mix2_1 2 1 10000 0.4 2.19989 1.057698 -6.054823 5.473999 -0.2947029 -1.156161 1.804279 6.198049
mix3_1 3 1 10000 0.4 2.19989 1.058490 -6.162909 5.538341 -0.3087157 -1.129261 1.847631 5.899136
Nplot <- plot_simpdf_theory(sim_y = Sys2$X_all[[1]][, 3], ylower = -10, 
  yupper = 10, 
  title = "PDF of X_11: Mixture of N(-2, 1) and N(2, 1) Distributions",
  fx = function(x) mix_pis[[1]][[1]][1] * dnorm(x, mix_mus[[1]][[1]][1], 
    mix_sigmas[[1]][[1]][1]) + mix_pis[[1]][[1]][2] * 
    dnorm(x, mix_mus[[1]][[1]][2], mix_sigmas[[1]][[1]][2]), 
  lower = -Inf, upper = Inf)
Nplot

knitr::kable(Sum2$target_mix_e, booktabs = TRUE, 
  caption = "Target Distributions of Mixture Error Terms")
Table 22: Target Distributions of Mixture Error Terms
Outcome Mean SD Skew Skurtosis Fifth Sixth
E1 1 1.163636 2.17086 2.013281 8.789539 36.48103 192.722
E2 2 1.163636 2.17086 2.013281 8.789539 36.48103 192.722
E3 3 1.163636 2.17086 2.013281 8.789539 36.48103 192.722
knitr::kable(Sum2$mix_sum_e, booktabs = TRUE, 
  caption = "Simulated Distributions of Mixture Error Terms")
Table 23: Simulated Distributions of Mixture Error Terms
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
E1 1 10000 1.163636 2.170751 0.7848816 -9.673026 23.10077 1.947736 8.372072 32.80773 167.3890
E2 2 10000 1.163636 2.170751 0.7845759 -9.483456 21.01385 2.079222 9.163271 36.52046 169.1260
E3 3 10000 1.163636 2.170751 0.8101482 -7.411084 20.11514 1.921233 8.176876 32.02805 146.7574
Mplot <- plot_simpdf_theory(sim_y = Sys2$E_mix[, 1], 
  title = paste("PDF of E_1: Mixture of Logistic(0, 1), Chisq(4),", 
    "\nand Beta(4, 1.5) Distributions", sep = ""),
  fx = function(x) mix_pis[[1]][[2]][1] * dlogis(x, 0, 1) + 
    mix_pis[[1]][[2]][2] * dchisq(x, 4) + mix_pis[[1]][[2]][3] * 
    dbeta(x, 4, 1.5), 
  lower = -Inf, upper = Inf)
Mplot

Maximum Correlation Errors for X Variables by Outcome:

maxerr <- do.call(rbind, Sum2$maxerr)
rownames(maxerr) <- colnames(maxerr) <- paste("Y", 1:M, sep = "")
knitr::kable(as.data.frame(maxerr), digits = 5, booktabs = TRUE, 
  caption = "Maximum Correlation Errors for X Variables")
Table 24: Maximum Correlation Errors for X Variables
Y1 Y2 Y3
Y1 0.00288 0.00405 0.00521
Y2 0.00405 0.00170 0.00341
Y3 0.00521 0.00341 0.00409

Step 5: Compare simulated correlations to theoretical values

Since the system contains mixture variables, the mixture parameter inputs mix_pis, mix_mus, and mix_sigmas, along with the error_type, must also be supplied to the calc_corr_y, calc_corr_ye, and calc_corr_yx functions.

knitr::kable(calc_corr_y(Sys2$betas, corr.x, corr.e, vars, mix_pis, 
  mix_mus, mix_sigmas, "mix"), booktabs = TRUE, 
  caption = "Expected Y Correlations")
Table 25: Expected Y Correlations
Y1 Y2 Y3
Y1 1.0000000 0.2978052 0.3885223
Y2 0.2978052 1.0000000 0.4785063
Y3 0.3885223 0.4785063 1.0000000
knitr::kable(Sum2$rho.y, booktabs = TRUE, 
  caption = "Simulated Y Correlations")
Table 26: Simulated Y Correlations
Y1 Y2 Y3
Y1 1.0000000 0.2838656 0.3958264
Y2 0.2838656 1.0000000 0.4717876
Y3 0.3958264 0.4717876 1.0000000
knitr::kable(calc_corr_ye(Sys2$betas, corr.x, corr.e, vars, mix_pis, 
  mix_mus, mix_sigmas, "mix"), booktabs = TRUE, 
  caption = "Expected Y, E Correlations")
Table 27: Expected Y, E Correlations
E1 E2 E3
Y1 0.8433984 0.1025981 0.1025981
Y2 0.0944555 0.7764627 0.0944555
Y3 0.0817148 0.0817148 0.6717288
knitr::kable(Sum2$rho.yemix, booktabs = TRUE, 
  caption = "Simulated Y, E Correlations")
Table 28: Simulated Y, E Correlations
E1 E2 E3
Y1 0.8433998 0.0788837 0.1088860
Y2 0.0857085 0.7728459 0.0893955
Y3 0.0881229 0.0692333 0.6696976
knitr::kable(calc_corr_yx(Sys2$betas, corr.x, vars, mix_pis, 
  mix_mus, mix_sigmas, "mix"), booktabs = TRUE, 
  caption = "Expected Y, X Correlations")
Table 29: Expected Y, X Correlations
X1_1 X1_2 X1_3 X1_4
Y1 0.2999999 0.3000000 0.3000000 0.3000000
Y2 0.4000002 0.1733719 0.1733719 0.1733719
Y3 0.5000000 0.2804878 0.2804878 0.2804878
X2_1 X2_2 X2_3 X2_4
Y1 0.2999999 0.1924528 0.1924528 0.1924528
Y2 0.4000002 0.4000000 0.4000000 0.4000000
Y3 0.5000000 0.3902439 0.3902439 0.3902439
X3_1 X3_2 X3_3 X3_4
Y1 0.2999999 0.2886792 0.2886792 0.2886792
Y2 0.4000002 0.3719248 0.3719248 0.3719248
Y3 0.5000000 0.5000000 0.5000000 0.5000000
knitr::kable(Sum2$rho.yx, booktabs = TRUE, 
  caption = "Simulated Y, X Correlations")
Table 30: Simulated Y, X Correlations
X1_1 X1_2 X1_3 X1_4
Y1 0.3046310 0.2926216 0.3197465 0.2869766
Y2 0.4002240 0.1577374 0.1697532 0.1652599
Y3 0.4934152 0.2918114 0.2789633 0.2860321
X2_1 X2_2 X2_3 X2_4
Y1 0.3046310 0.1964025 0.1918824 0.1957318
Y2 0.4002240 0.3971412 0.3928657 0.3970529
Y3 0.4934152 0.4008303 0.3914031 0.3888395
X3_1 X3_2 X3_3 X3_4
Y1 0.3046310 0.2854515 0.2892292 0.2906667
Y2 0.4002240 0.3729806 0.3633462 0.3700993
Y3 0.4934152 0.5087496 0.5014829 0.5009007

Example 3: System of 3 equations, \(Y_1\) and \(Y_3\) as in Example 2, \(Y_2\) has only an error term

In this example:

  1. \(X_{11} = X_{31}\) has a Logistic(0, 1) distribution and is the same variable for \(Y_1\) and \(Y_3\)

  2. \(X_{12}, X_{32}\) have Beta(4, 1.5) distributions

  3. \(X_{13}, X_{33}\) have normal mixture distributions

  4. \(E_1, E_2, E_3\) have Logistic, Chisq, Beta mixture distributions

\[\begin{equation} \begin{split} Y_1 &= \beta_{10} + \beta_{11} * X_{11} + \beta_{12} * X_{12} + \beta_{13} * X_{13} + E_1 \\ Y_2 &= \beta_{20} + E_2 \\ Y_3 &= \beta_{30} + \beta_{31} * X_{11} + \beta_{32} * X_{32} + \beta_{33} * X_{33} + E_3 \end{split} \tag{3} \end{equation}\]

Step 1: Set up parameter inputs

seed <- 276
n <- 10000
M <- 3

method <- "Polynomial"
L <- calc_theory("Logistic", c(0, 1))
C <- calc_theory("Chisq", 4)
B <- calc_theory("Beta", c(4, 1.5))

skews <- list(c(L[3], B[3]), NULL, c(L[3], B[3]))
skurts <- list(c(L[4], B[4]), NULL, c(L[4], B[4]))
fifths <- list(c(L[5], B[5]), NULL, c(L[5], B[5]))
sixths <- list(c(L[6], B[6]), NULL, c(L[6], B[6]))
Six <- list(list(1.75, 0.03), NULL, list(1.75, 0.03))

mix_pis <- list(list(c(0.4, 0.6), c(0.3, 0.2, 0.5)), list(c(0.3, 0.2, 0.5)), 
  list(c(0.4, 0.6), c(0.3, 0.2, 0.5)))
mix_mus <- list(list(c(-2, 2), c(L[1], C[1], B[1])), list(c(L[1], C[1], B[1])), 
  list(c(-2, 2), c(L[1], C[1], B[1])))
mix_sigmas <- list(list(c(1, 1), c(L[2], C[2], B[2])), 
  list(c(L[2], C[2], B[2])), list(c(1, 1), c(L[2], C[2], B[2])))
mix_skews <- list(list(c(0, 0), c(L[3], C[3], B[3])), 
  list(c(L[3], C[3], B[3])), list(c(0, 0), c(L[3], C[3], B[3])))
mix_skurts <- list(list(c(0, 0), c(L[4], C[4], B[4])), 
  list(c(L[4], C[4], B[4])),list(c(0, 0), c(L[4], C[4], B[4])))
mix_fifths <- list(list(c(0, 0), c(L[5], C[5], B[5])), 
  list(c(L[5], C[5], B[5])), list(c(0, 0), c(L[5], C[5], B[5])))
mix_sixths <- list(list(c(0, 0), c(L[6], C[6], B[6])), 
  list(c(L[6], C[6], B[6])), list(c(0, 0), c(L[6], C[6], B[6])))
mix_Six <- list(list(NULL, NULL, 1.75, NULL, 0.03), list(1.75, NULL, 0.03), 
  list(NULL, NULL, 1.75, NULL, 0.03))
means <- list(c(L[1], B[1], Nstcum[1], Mstcum[1]), Mstcum[1], 
  c(L[1], B[1], Nstcum[1], Mstcum[1]))
vars <- list(c(L[2]^2, B[2]^2, Nstcum[2]^2, Mstcum[2]^2), Mstcum[2]^2, 
  c(L[2]^2, B[2]^2, Nstcum[2]^2, Mstcum[2]^2))

# since Y_2 has no X variables, same.var must be changed to a matrix; each row
# gives the outcome and column index of a variable and of its match across
# outcomes (here, X term 1 of Y_1 is the same variable as X term 1 of Y_3)
same.var <- matrix(c(1, 1, 3, 1), 1, 4)

corr.x <- list(list(matrix(0.1, 4, 4), NULL, matrix(0.3, 4, 4)), NULL)
diag(corr.x[[1]][[1]]) <- 1

# set correlations between components of the same mixture variable to 0
corr.x[[1]][[1]][3:4, 3:4] <- diag(2)

# since X1 is the same across outcomes, set correlation the same
corr.x[[1]][[3]][, 1] <- corr.x[[1]][[1]][, 1]

corr.x <- append(corr.x, list(list(t(corr.x[[1]][[3]]), NULL, matrix(0.5, 4, 4))))
diag(corr.x[[3]][[3]]) <- 1

# set correlations between components of the same mixture variable to 0
corr.x[[3]][[3]][3:4, 3:4] <- diag(2)

# since X1 is the same across outcomes, set correlation the same
corr.x[[3]][[3]][, 1] <- t(corr.x[[1]][[3]][1, ])
corr.x[[3]][[3]][1, ] <- t(corr.x[[3]][[3]][, 1])

corr.yx <- list(matrix(0.3, 1, 4), NULL, matrix(0.5, 1, 4))

corr.e <- matrix(0.4, 9, 9)
corr.e[1:3, 1:3] <- corr.e[4:6, 4:6] <- corr.e[7:9, 7:9] <- diag(3)
error_type <- "mix"

Step 2: Check parameter inputs

checkpar(M, method, error_type, means, vars, skews, skurts, fifths, sixths, 
  Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, 
  mix_sixths, mix_Six, same.var = same.var, betas.0 = betas.0, corr.x = corr.x, 
  corr.yx = corr.yx, corr.e = corr.e, quiet = TRUE)
## [1] TRUE

Step 3: Generate system

Sys3 <- nonnormsys(n, M, method, error_type, means, vars, skews, skurts,
  fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts,
  mix_fifths, mix_sixths, mix_Six, same.var, betas.0, corr.x, corr.yx, corr.e,
  seed, use.nearPD = FALSE, quiet = TRUE)
## Total Simulation time: 0.006 minutes
knitr::kable(Sys3$betas, booktabs = TRUE, caption = "Beta coefficients")
Table 31: Beta coefficients
B1 B2 B3 B4
Y1 0.3213033 3.336167 0.6556271 0.6556271
Y2 0.0000000 0.000000 0.0000000 0.0000000
Y3 0.4345749 0.000000 1.3794056 1.3794056
knitr::kable(Sys3$constants[[1]], booktabs = TRUE, caption = "PMT constants")
Table 32: PMT constants
c0 c1 c2 c3 c4 c5
0.0000000 0.8879258 0.0000000 0.0360523 0.0000000 0.0000006
0.1629657 1.0899841 -0.1873287 -0.0449503 0.0081210 0.0014454
0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000 0.8879258 0.0000000 0.0360523 0.0000000 0.0000006
-0.2275081 0.9007156 0.2316099 0.0154662 -0.0013673 0.0000552
0.1629657 1.0899841 -0.1873287 -0.0449503 0.0081210 0.0014454
Sys3$valid.pdf
## [[1]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"
## 
## [[2]]
## [1] "TRUE" "TRUE" "TRUE"
## 
## [[3]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"

Step 4: Describe results

Sum3 <- summary_sys(Sys3$Y, Sys3$E, Sys3$E_mix, Sys3$X, Sys3$X_all, M, method, 
  means, vars, skews, skurts, fifths, sixths, mix_pis, mix_mus, mix_sigmas,
  mix_skews, mix_skurts, mix_fifths, mix_sixths, corr.x = corr.x,
  corr.e = corr.e)
knitr::kable(Sum3$cont_sum_y, booktabs = TRUE, 
  caption = "Simulated Distributions of Outcomes")
Table 33: Simulated Distributions of Outcomes
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
Y1 1 10000 3.589515 2.561356 3.3343275 -6.173603 27.29735 1.2077687 4.570660 17.098230 83.88966
Y2 2 10000 1.163636 2.170751 0.8114017 -10.198298 17.51876 1.9371481 8.312887 29.395190 114.56563
Y3 3 10000 1.163810 3.223408 0.9884224 -11.205198 21.47867 0.5828432 1.737082 4.733577 14.03912
knitr::kable(Sum3$target_sum_x, booktabs = TRUE,
  caption = "Target Distributions of Independent Variables")
Table 34: Target Distributions of Independent Variables
Outcome X Mean SD Skew Skurtosis Fifth Sixth
cont1_1 1 1 0.0000000 1.8137994 0.0000000 1.2000000 0.000000 6.857143
cont1_2 1 2 0.7272727 0.1746854 -0.6938789 -0.0686029 1.828251 -3.379110
cont1_3 1 3 -2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000
cont1_4 1 4 2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000
cont3_1 3 1 0.0000000 1.8137994 0.0000000 1.2000000 0.000000 6.857143
cont3_2 3 2 0.7272727 0.1746854 -0.6938789 -0.0686029 1.828251 -3.379110
cont3_3 3 3 -2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000
cont3_4 3 4 2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000
knitr::kable(Sum3$cont_sum_x, booktabs = TRUE,
  caption = "Simulated Distributions of Independent Variables")
Table 35: Simulated Distributions of Independent Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
cont1_1 1 1 10000 0.0004003 1.8086659 -0.0034553 -9.3357455 8.093322 0.0093534 0.9026131 -0.0030437 1.4023271
cont1_2 1 2 10000 0.7271071 0.1743004 0.7532079 0.0867398 1.083444 -0.6761726 -0.1000023 1.7778230 -3.0225169
cont1_3 1 3 10000 -2.0000000 0.9999500 -1.9820874 -6.1670779 1.579041 -0.0138124 0.0450324 -0.0467214 -0.1399622
cont1_4 1 4 10000 2.0000000 0.9999500 2.0192696 -1.8671048 5.818930 -0.0259989 0.0660603 0.0980029 -0.4313656
cont3_1 3 1 10000 0.0004003 1.8086659 -0.0034553 -9.3357455 8.093322 0.0093534 0.9026131 -0.0030437 1.4023271
cont3_2 3 2 10000 0.7273631 0.1752534 0.7601429 0.0711494 1.034962 -0.7030286 -0.0889725 1.9270159 -3.3525250
cont3_3 3 3 10000 -2.0000000 0.9999500 -1.9947523 -5.8932382 1.483459 -0.0447962 -0.0478362 -0.0040827 -0.1491799
cont3_4 3 4 10000 2.0000000 0.9999500 2.0119310 -1.6266392 6.083537 -0.0019161 -0.0060812 0.2566953 0.2512733
knitr::kable(Sum3$target_mix_x, booktabs = TRUE,
  caption = "Target Distributions of Mixture Independent Variables")
Table 36: Target Distributions of Mixture Independent Variables
Outcome X Mean SD Skew Skurtosis Fifth Sixth
mix1_1 1 1 0.4 2.2 -0.2885049 -1.154019 1.793022 6.173267
mix3_1 3 1 0.4 2.2 -0.2885049 -1.154019 1.793022 6.173267
knitr::kable(Sum3$mix_sum_x, booktabs = TRUE,
  caption = "Simulated Distributions of Mixture Independent Variables")
Table 37: Simulated Distributions of Mixture Independent Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
mix1_1 1 1 10000 0.4 2.19989 1.03944 -6.140646 5.801265 -0.2961194 -1.141168 1.808030 6.086474
mix3_1 3 1 10000 0.4 2.19989 1.05123 -5.487086 6.058566 -0.2866761 -1.152828 1.791524 6.199999
knitr::kable(Sum3$target_mix_e, booktabs = TRUE,
  caption = "Target Distributions of Mixture Error Terms")
Table 38: Target Distributions of Mixture Error Terms
Outcome Mean SD Skew Skurtosis Fifth Sixth
E1 1 1.163636 2.17086 2.013281 8.789539 36.48103 192.722
E2 2 1.163636 2.17086 2.013281 8.789539 36.48103 192.722
E3 3 1.163636 2.17086 2.013281 8.789539 36.48103 192.722
knitr::kable(Sum3$mix_sum_e, booktabs = TRUE,
  caption = "Simulated Distributions of Mixture Error Terms")
Table 39: Simulated Distributions of Mixture Error Terms
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
E1 1 10000 1.163636 2.170751 0.7763939 -7.337748 25.01447 2.027964 8.810501 39.02100 230.4321
E2 2 10000 1.163636 2.170751 0.8114017 -10.198298 17.51876 1.937148 8.312887 29.39519 114.5656
E3 3 10000 1.163636 2.170751 0.8072152 -9.407838 18.95352 1.920803 8.312115 30.29603 124.0343

Maximum Correlation Errors for X Variables by Outcome:

maxerr <- rbind(Sum3$maxerr[[1]][-2], Sum3$maxerr[[3]][-2])
rownames(maxerr) <- colnames(maxerr) <- c("Y1", "Y3")
knitr::kable(as.data.frame(maxerr), digits = 5, booktabs = TRUE,
  caption = "Maximum Correlation Errors for X Variables")
Table 40: Maximum Correlation Errors for X Variables
Y1 Y3
Y1 0.00177 0.00147
Y3 0.00147 0.00257

Step 5: Compare simulated correlations to theoretical values

knitr::kable(calc_corr_y(Sys3$betas, corr.x, corr.e, vars, mix_pis, 
  mix_mus, mix_sigmas, "mix"), booktabs = TRUE,
  caption = "Expected Y Correlations")
Table 41: Expected Y Correlations
Y1 Y2 Y3
Y1 1.0000000 0.1025981 0.3885224
Y2 0.1025981 1.0000000 0.0817148
Y3 0.3885224 0.0817148 1.0000000
knitr::kable(Sum3$rho.y, booktabs = TRUE, 
  caption = "Simulated Y Correlations")
Table 42: Simulated Y Correlations
Y1 Y2 Y3
Y1 1.0000000 0.0738147 0.3825892
Y2 0.0738147 1.0000000 0.0611834
Y3 0.3825892 0.0611834 1.0000000
knitr::kable(calc_corr_ye(Sys3$betas, corr.x, corr.e, vars, mix_pis, 
  mix_mus, mix_sigmas, "mix"), booktabs = TRUE,
  caption = "Expected Y, E Correlations")
Table 43: Expected Y, E Correlations
E1 E2 E3
Y1 0.8433983 0.1025981 0.1025981
Y2 0.1216485 1.0000000 0.1216485
Y3 0.0817148 0.0817148 0.6717288
knitr::kable(Sum3$rho.yemix, booktabs = TRUE,
  caption = "Simulated Y, E Correlations")
Table 44: Simulated Y, E Correlations
E1 E2 E3
Y1 0.8421445 0.0738147 0.0966026
Y2 0.0912775 1.0000000 0.0993515
Y3 0.0742094 0.0611834 0.6704631
knitr::kable(calc_corr_yx(Sys3$betas, corr.x, vars, mix_pis, 
  mix_mus, mix_sigmas, "mix"), booktabs = TRUE,
  caption = "Expected Y, X Correlations")
Table 45: Expected Y, X Correlations
X1_1 X1_2 X1_3 X1_4
Y1 0.3 0.3000000 0.3000000 0.3000000
Y2 0.0 0.0000000 0.0000000 0.0000000
Y3 0.5 0.2804878 0.2804878 0.2804878
X3_1 X3_2 X3_3 X3_4
Y1 0.3 0.2886793 0.2886793 0.2886793
Y2 0.0 0.0000000 0.0000000 0.0000000
Y3 0.5 0.5000000 0.5000000 0.5000000
knitr::kable(Sum3$rho.yx, booktabs = TRUE,
  caption = "Simulated Y, X Correlations")
Table 46: Simulated Y, X Correlations
X1_1 X1_2 X1_3 X1_4
Y1 0.2889926 0.2964159 0.2877044 0.3116951
Y2 0.0014158 -0.0077903 -0.0056745 -0.0025133
Y3 0.5030255 0.2748050 0.2790685 0.2803257
X3_1 X3_2 X3_3 X3_4
Y1 0.2889926 0.2873456 0.2900545 0.2873188
Y2 0.0014158 -0.0002223 -0.0097569 -0.0044247
Y3 0.5030255 0.4990494 0.4886098 0.5065214

References

Bates, Douglas, and Martin Maechler. 2017. Matrix: Sparse and Dense Matrix Classes and Methods. https://CRAN.R-project.org/package=Matrix.

Davenport, J W, J C Bezdek, and R J Hathaway. 1988. “Parameter Estimation for Finite Mixture Distributions.” Computers & Mathematics with Applications 15 (10): 819–28.

Everitt, B S. 1996. “An Introduction to Finite Mixture Distributions.” Statistical Methods in Medical Research 5 (2): 107–27. https://doi.org/10.1177/096228029600500202.

Fialkowski, A C. 2017. SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. https://CRAN.R-project.org/package=SimMultiCorrData.

———. 2018. SimCorrMix: Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions. https://CRAN.R-project.org/package=SimCorrMix.

Fleishman, A I. 1978. “A Method for Simulating Non-Normal Distributions.” Psychometrika 43: 521–32. https://doi.org/10.1007/BF02293811.

Headrick, T C. 2002. “Fast Fifth-Order Polynomial Transforms for Generating Univariate and Multivariate Non-Normal Distributions.” Computational Statistics and Data Analysis 40 (4): 685–711. https://doi.org/10.1016/S0167-9473(02)00072-5.

Headrick, T C, and T M Beasley. 2004. “A Method for Simulating Correlated Non-Normal Systems of Statistical Equations.” Communications in Statistics - Simulation and Computation 33 (1): 19–33. http://dx.doi.org/10.1081/SAC-120028431.

Higham, N. 2002. “Computing the Nearest Correlation Matrix - a Problem from Finance.” IMA Journal of Numerical Analysis 22 (3): 329–43. https://doi.org/10.1093/imanum/22.3.329.

Maree, S. 2012. “Correcting Non Positive Definite Correlation Matrices.” BSc Thesis Applied Mathematics, Delft University of Technology. http://resolver.tudelft.nl/uuid:2175c274-ab03-4fd5-85a9-228fe421cdbf.

Pearson, R K. 2011. Exploring Data in Engineering, the Sciences, and Medicine. New York: Oxford University Press.

Schork, N J, D B Allison, and B Thiel. 1996. “Mixture Distributions in Human Genetics Research.” Statistical Methods in Medical Research 5: 155–78. https://doi.org/10.1177/096228029600500204.

Yin, J F, and Y Zhang. 2013. “Alternative Gradient Algorithms for Computing the Nearest Correlation Matrix.” Applied Mathematics and Computation 219 (14): 7591–9. https://doi.org/10.1016/j.amc.2013.01.045.

Zhang, Y, and J F Yin. n.d. “Modified Alternative Gradients Algorithm for Computing the Nearest Correlation Matrix.” Internal paper. Shanghai: Tongji University.