Systems of statistical equations with correlated variables are common in clinical and genetic studies. These systems may represent repeated measurements or clustered observations. Headrick and Beasley (2004) developed a method for simulating correlated systems of equations containing normal or non-normal continuous variables. The technique allows the user to specify the distributions of the independent variables $$X_{(pj)}$$, for $$p = 1, ..., M$$ equations, and stochastic disturbance (error) terms $$E$$. The user also controls the correlations between: 1) independent variables $$X_{(pj)}$$ for a given outcome $$Y_p$$, 2) an outcome $$Y_p$$ and its $$X_{(pj)}$$ terms, and 3) stochastic disturbance (error) terms $$E$$. Their technique calculates the beta (slope) coefficients based on correlations between independent variables $$X_{(pj)}$$ for a given outcome $$Y_p$$, the correlations between that outcome $$Y_p$$ and the $$X_{(pj)}$$ terms, and the variances. Headrick and Beasley (2004) also derived equations to calculate the expected correlations based on these betas between: 1) $$X_{(pj)}$$ terms and outcome $$Y_q$$, 2) outcomes $$Y_p$$ and $$Y_q$$, and 3) outcome $$Y_p$$ and error term $$E_q$$, for $$p$$ not equal to $$q$$. All continuous variables are generated from intermediate random normal variables using Headrick (2002)’s fifth-order power transformation method (PMT). The intermediate correlations required to generate the target correlations after variable transformations are calculated using Headrick (2002)’s equation.

The package SimRepeat extends Headrick and Beasley (2004)’s method to include continuous variables (independent variables or error terms) that have mixture distributions. The nonnormsys function simulates the system of equations containing correlated continuous normal or non-normal variables with non-mixture or mixture distributions. Mixture distributions are generated using the techniques of Fialkowski (2018). Mixture distributions describe random variables that are drawn from more than one component distribution. Mixture distributions provide a useful way for describing heterogeneity in a population, especially when an outcome is a composite response from multiple sources. For a random variable $$Y$$ from a finite mixture distribution with $$k$$ components, the PDF can be described by: $h_{Y}(y) = \sum_{i=1}^{k} \pi_{i} f_{Y_{i}}(y),$ where $$\sum_{i=1}^{k} \pi_{i} = 1$$. The $$\pi_{i}$$ are mixing parameters which determine the weight of each component distribution $$f_{Y_{i}}(y)$$ in the overall probability distribution. The overall distribution $$h_{Y}(y)$$ has a valid PDF if each component distribution has a valid PDF. The main assumption is statistical independence between the process of randomly selecting the component distribution and the distributions themselves. Assume there is a random selection process that first generates the numbers $$1,\ ...,\ k$$ with probabilities $$\pi_{1},\ ...,\ \pi_{k}$$. After selecting number $$i$$, where $$1 \leq i \leq k$$, a random variable $$y$$ is drawn from component distribution $$f_{Y_{i}}(y)$$ (Davenport, Bezder, and Hathaway 1988; Shork, Allison, and Thiel 1996; Everitt 1996; Pearson 2011).

The package SimCorrMix contains vignettes describing mixture variables. Continuous mixture variables are generated componentwise and then transformed to the desired mixture variables using random multinomial variables generated based on mixing probabilities. The correlation matrices are specified in terms of correlations with components of the mixture variables. Headrick and Beasley (2004)’s equations have been adapted in SimRepeat to permit mixture variables. These are contained in the functions calc_betas, calc_corr_y, calc_corr_ye, and calc_corr_yx.

The package SimRepeat also allows the user to specify the correlations between independent variables across outcomes (i.e., $$X_{(pj)}$$ and $$X_{(qj)}$$ for $$p$$ not equal to $$q$$). This allows imposing a specific correlation structure (i.e., AR(1) or Toeplitz) on time-varying covariates. Independent variables can be designated as the same across outcomes (i.e., to include a stationary covariate as in height). Continuous variables are generated using the techniques of Fialkowski (2017), which permits either Fleishman (1978)’s third-order (method = "Fleishman") or Headrick (2002)’s fifth-order (method = "Polynomial") PMT. When using the fifth-order method, sixth cumulant corrections can be specified in order to obtain random variables with valid probability distribution functions (PDF’s). The error loop from SimCorrMix can be used within the nonnormsys function to attempt to correct the final correlations among all $$X$$ terms to be within a user-specified precision value (epsilon) of the target correlations.

The following examples demonstrate usage of the nonnormsys function. Example 1 is Headrick and Beasley (2004)’s example from their paper. Example 2 gives a system with non-mixture and mixture variables. Example 3 gives a system with missing variables. Details about theory and equations are in the Theory and Equations for Correlated Systems of Continuous Variables vignette.

Example 1: System of three equations for 2 independent variables

In this example taken from Headrick and Beasley (2004):

1. $$X_{11}, X_{21}, X_{31}$$ each have Exponential(2) distribution

2. $$X_{12}, X_{22}, X_{32}$$ each have Laplace(0, 1) distribution

3. $$E_1, E_2, E_3$$ each have Cauchy(0, 1) distribution (the standardized cumulants were taken from the paper)

$$$\begin{split} Y_1 &= \beta_{10} + \beta_{11} * X_{11} + \beta_{12} * X_{12} + E_1 \\ Y_2 &= \beta_{20} + \beta_{21} * X_{21} + \beta_{22} * X_{22} + E_2 \\ Y_3 &= \beta_{30} + \beta_{31} * X_{31} + \beta_{32} * X_{32} + E_3 \end{split} \tag{1}$$$

Each term was generated with zero mean and unit variance. All intercepts were set at $$0$$. The correlations were given as:

1. $$\rho_{X_{11}, X_{12}} = 0.10$$, $$\rho_{X_{21}, X_{22}} = 0.35$$, $$\rho_{X_{31}, X_{32}} = 0.70$$

2. $$\rho_{Y_1, X_{11}} = \rho_{Y_1, X_{12}} = 0.40$$, $$\rho_{Y_2, X_{21}} = \rho_{Y_2, X_{22}} = 0.50$$, $$\rho_{Y_3, X_{31}} = \rho_{Y_3, X_{32}} = 0.60$$

3. $$\rho_{E_1, E_2} = \rho_{E_1, E_3} = \rho_{E_2, E_3} = 0.40$$

The across outcome correlations between $$X$$ terms were set as below in order to obtain the same results as in the paper. No sixth cumulant corrections were used, although the Laplace(0, 1) distribution requires a correction of $$25.14$$ in order to have a valid PDF.

Step 1: Set up parameter inputs

library("SimRepeat")
library("printr")
options(scipen = 999)
seed <- 111
n <- 10000
M <- 3

method <- "Polynomial"
Stcum1 <- calc_theory("Exponential", 2)
Stcum2 <- calc_theory("Laplace", c(0, 1))
Stcum3 <- c(0, 1, 0, 25, 0, 1500)

means <- lapply(seq_len(M), function(x) rep(0, 3))
vars <- lapply(seq_len(M), function(x) rep(1, 3))

skews <- lapply(seq_len(M), function(x) c(Stcum1[3], Stcum2[3], Stcum3[3]))
skurts <- lapply(seq_len(M), function(x) c(Stcum1[4], Stcum2[4], Stcum3[4]))
fifths <- lapply(seq_len(M), function(x) c(Stcum1[5], Stcum2[5], Stcum3[5]))
sixths <- lapply(seq_len(M), function(x) c(Stcum1[6], Stcum2[6], Stcum3[6]))
Six <- list()

# Note the following would be used to obtain all valid PDF's
# Six <- lapply(seq_len(M), function(x) list(NULL, 25.14, NULL))

betas.0 <- 0
corr.x <- list()
corr.x[[1]] <- corr.x[[2]] <- corr.x[[3]] <- list()
corr.x[[1]][[1]] <- matrix(c(1, 0.1, 0.1, 1), nrow = 2, ncol = 2)
corr.x[[1]][[2]] <- matrix(c(0.1974318, 0.1859656, 0.1879483, 0.1858601),
2, 2, byrow = TRUE)
corr.x[[1]][[3]] <- matrix(c(0.2873190, 0.2589830, 0.2682057, 0.2589542),
2, 2, byrow = TRUE)
corr.x[[2]][[1]] <- t(corr.x[[1]][[2]])
corr.x[[2]][[2]] <- matrix(c(1, 0.35, 0.35, 1), nrow = 2, ncol = 2)
corr.x[[2]][[3]] <- matrix(c(0.5723303, 0.4883054, 0.5004441, 0.4841808),
2, 2, byrow = TRUE)
corr.x[[3]][[1]] <- t(corr.x[[1]][[3]])
corr.x[[3]][[2]] <- t(corr.x[[2]][[3]])
corr.x[[3]][[3]] <- matrix(c(1, 0.7, 0.7, 1), nrow = 2, ncol = 2)

corr.yx <- list(matrix(c(0.4, 0.4), 1), matrix(c(0.5, 0.5), 1),
matrix(c(0.6, 0.6), 1))
corr.e <- matrix(0.4, nrow = M, ncol = M)
diag(corr.e) <- 1
error_type <- "non_mix"

Step 2: Check parameter inputs

checkpar(M, method, error_type, means, vars, skews, skurts, fifths, sixths,
Six, betas.0 = betas.0, corr.x = corr.x, corr.yx = corr.yx, corr.e = corr.e,
quiet = TRUE)
## [1] TRUE

Step 3: Generate system

Sys1 <- nonnormsys(n, M, method, error_type, means, vars, skews, skurts,
fifths, sixths, Six, betas.0 = betas.0, corr.x = corr.x, corr.yx = corr.yx,
corr.e = corr.e, seed = seed, quiet = TRUE)
## Total Simulation time: 0.042 minutes
knitr::kable(Sys1$betas, booktabs = TRUE, caption = "Beta coefficients") Table 1: Beta coefficients B1 B2 Y1 0.4318335 0.4318335 Y2 0.4667600 0.4667598 Y3 0.4648507 0.4648503 knitr::kable(Sys1$constants[[1]], booktabs = TRUE, caption = "PMT constants")
Table 2: PMT constants
c0 c1 c2 c3 c4 c5
-0.3077396 0.8005605 0.318764 0.0335001 -0.0036748 0.0001587
0.0000000 0.7277089 0.000000 0.0963035 0.0000000 -0.0022321
0.0000000 -0.1596845 0.000000 0.3550358 0.0000000 -0.0094729
Sys1$valid.pdf ## [[1]] ## [1] "TRUE" "FALSE" "FALSE" ## ## [[2]] ## [1] "TRUE" "FALSE" "FALSE" ## ## [[3]] ## [1] "TRUE" "FALSE" "FALSE" Step 4: Describe results Sum1 <- summary_sys(Sys1$Y, Sys1$E, E_mix = NULL, Sys1$X, Sys1$X_all, M, method, means, vars, skews, skurts, fifths, sixths, corr.x = corr.x, corr.e = corr.e) knitr::kable(Sum1$cont_sum_y, booktabs = TRUE,
caption = "Simulated Distributions of Outcomes")
Table 3: Simulated Distributions of Outcomes
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
Y1 1 10000 0.0071862 1.184474 -0.0735469 -12.27685 9.139193 0.0593231 10.859887 -15.709407 344.2901
Y2 2 10000 0.0029877 1.253034 -0.0880282 -10.82648 10.005708 0.3167011 9.653009 4.236132 271.6446
Y3 3 10000 0.0100362 1.308292 -0.1012901 -12.35040 14.358996 0.4478936 10.291990 10.681747 419.8943
knitr::kable(Sum1$target_sum_x, booktabs = TRUE, caption = "Target Distributions of Independent Variables") Table 4: Target Distributions of Independent Variables Outcome X Mean SD Skew Skurtosis Fifth Sixth cont1_1 1 1 0 1 2 6 24 120 cont1_2 1 2 0 1 0 3 0 30 cont2_1 2 1 0 1 2 6 24 120 cont2_2 2 2 0 1 0 3 0 30 cont3_1 3 1 0 1 2 6 24 120 cont3_2 3 2 0 1 0 3 0 30 knitr::kable(Sum1$cont_sum_x, booktabs = TRUE,
caption = "Simulated Distributions of Independent Variables")
Table 5: Simulated Distributions of Independent Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
cont1_1 1 1 10000 -0.0008979 0.9972631 -0.2978603 -1.328231 9.250562 2.0468429 6.364092 25.635123 119.57370
cont1_2 1 2 10000 -0.0011300 1.0028701 0.0063327 -7.066301 5.654474 -0.1199282 3.037337 -3.488146 27.09002
cont2_1 2 1 10000 -0.0010172 0.9928807 -0.2910767 -1.291738 8.621562 2.0224749 6.245030 24.969443 112.32254
cont2_2 2 2 10000 -0.0008947 1.0008280 -0.0017667 -7.616110 6.442554 -0.0951022 3.630147 -4.626091 60.43407
cont3_1 3 1 10000 -0.0002394 0.9984306 -0.3077666 -1.578368 9.545770 2.0078250 6.085216 24.522409 120.55127
cont3_2 3 2 10000 -0.0003199 0.9874384 -0.0012571 -6.035376 5.227375 -0.0527893 2.236968 -1.027774 12.76165
knitr::kable(Sum1$target_sum_e, booktabs = TRUE, caption = "Target Distributions of Error Terms") Table 6: Target Distributions of Error Terms Outcome Mean SD Skew Skurtosis Fifth Sixth E1 1 0 1 0 25 0 1500 E2 2 0 1 0 25 0 1500 E3 3 0 1 0 25 0 1500 knitr::kable(Sum1$cont_sum_e, booktabs = TRUE,
caption = "Simulated Distributions of Error Terms")
Table 7: Simulated Distributions of Error Terms
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
E1 1 10000 0.0080619 0.9935061 -0.0004688 -11.77427 8.960461 -0.1369344 21.30282 -35.984933 965.4108
E2 2 10000 0.0038802 0.9978020 -0.0012622 -13.02425 10.362046 0.1759016 24.93013 1.313834 1463.0695
E3 3 10000 0.0102962 0.9951096 0.0001988 -12.04561 12.090892 0.4188715 26.37023 26.936516 1586.3607

Maximum Correlation Errors for X Variables by Outcome:

maxerr <- do.call(rbind, Sum1$maxerr) rownames(maxerr) <- colnames(maxerr) <- paste("Y", 1:M, sep = "") knitr::kable(as.data.frame(maxerr), digits = 5, booktabs = TRUE, caption = "Maximum Correlation Errors for X Variables") Table 8: Maximum Correlation Errors for X Variables Y1 Y2 Y3 Y1 0.00413 0.00205 0.00575 Y2 0.00205 0.00495 0.00541 Y3 0.00575 0.00541 0.00516 Step 5: Compare simulated correlations to theoretical values knitr::kable(calc_corr_y(Sys1$betas, corr.x, corr.e, vars),
booktabs = TRUE, caption = "Expected Y Correlations")
Table 9: Expected Y Correlations
Y1 Y2 Y3
Y1 1.0000000 0.3692526 0.3935111
Y2 0.3692526 1.0000000 0.5083399
Y3 0.3935111 0.5083399 1.0000000
knitr::kable(Sum1$rho.y, booktabs = TRUE, caption = "Simulated Y Correlations") Table 10: Simulated Y Correlations Y1 Y2 Y3 Y1 1.0000000 0.3741871 0.4015689 Y2 0.3741871 1.0000000 0.5019800 Y3 0.4015689 0.5019800 1.0000000 knitr::kable(calc_corr_ye(Sys1$betas, corr.x, corr.e, vars),
booktabs = TRUE, caption = "Expected Y, E Correlations")
Table 11: Expected Y, E Correlations
E1 E2 E3
Y1 0.8420754 0.3368301 0.3368301
Y2 0.3173969 0.7934921 0.3173969
Y3 0.3037028 0.3037028 0.7592569
knitr::kable(Sum1$rho.ye, booktabs = TRUE, caption = "Simulated Y, E Correlations") Table 12: Simulated Y, E Correlations E1 E2 E3 Y1 0.8405035 0.3422090 0.3489198 Y2 0.3218245 0.7931439 0.3098100 Y3 0.3100882 0.2917064 0.7603231 knitr::kable(calc_corr_yx(Sys1$betas, corr.x, vars),
booktabs = TRUE, caption = "Expected Y, X Correlations")
Table 13: Expected Y, X Correlations
X1_1 X1_2
Y1 0.4000000 0.4000000
Y2 0.1419990 0.1384475
Y3 0.1928124 0.1860563
X2_1 X2_2
Y1 0.1401382 0.1352093
Y2 0.5000000 0.4999998
Y3 0.3743418 0.3475145
X3_1 X3_2
Y1 0.2020090 0.1883408
Y2 0.3973238 0.3601800
Y3 0.5999997 0.5999996
knitr::kable(Sum1$rho.yx, booktabs = TRUE, caption = "Simulated Y, X Correlations") Table 14: Simulated Y, X Correlations X1_1 X1_2 Y1 0.4013407 0.4077626 Y2 0.1413510 0.1445654 Y3 0.1980463 0.1900016 X2_1 X2_2 Y1 0.1408618 0.1330024 Y2 0.4987203 0.4934377 Y3 0.3818031 0.3446263 X3_1 X3_2 Y1 0.1976696 0.1882595 Y2 0.3899509 0.3648209 Y3 0.5952966 0.5999793 Example 2: System of 3 equations, each with 2 continuous non-mixture covariates and 1 mixture covariate In this example: 1. $$X_{11} = X_{21} = X_{31}$$ has a Logistic(0, 1) distribution and is the same variable for each Y (static covariate) 2. $$X_{12}, X_{22}, X_{32}$$ each have a Beta(4, 1.5) distribution 3. $$X_{13}, X_{23}, X_{33}$$ each have a Normal(-2, 1), Normal(2, 1) mixture distribution 4. $$E_1, E_2, E_3$$ each have a Logistic(0, 1), Chisq(4), Beta(4, 1.5) mixture distribution $$$\begin{split} Y_1 &= \beta_{10} + \beta_{11} * X_{11} + \beta_{12} * X_{12} + \beta_{13} * X_{13} + E_1 \\ Y_2 &= \beta_{20} + \beta_{21} * X_{11} + \beta_{22} * X_{22} + \beta_{23} * X_{23} + E_2 \\ Y_3 &= \beta_{30} + \beta_{31} * X_{11} + \beta_{32} * X_{32} + \beta_{33} * X_{33} + E_3 \end{split} \tag{2}$$$ Step 1: Set up parameter inputs seed <- 276 n <- 10000 M <- 3 method <- "Polynomial" L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- lapply(seq_len(M), function(x) c(L[3], B[3])) skurts <- lapply(seq_len(M), function(x) c(L[4], B[4])) fifths <- lapply(seq_len(M), function(x) c(L[5], B[5])) sixths <- lapply(seq_len(M), function(x) c(L[6], B[6])) Six <- lapply(seq_len(M), function(x) list(1.75, 0.03)) mix_pis <- lapply(seq_len(M), function(x) list(c(0.4, 0.6), c(0.3, 0.2, 0.5))) mix_mus <- lapply(seq_len(M), function(x) list(c(-2, 2), c(L[1], C[1], B[1]))) mix_sigmas <- lapply(seq_len(M), function(x) list(c(1, 1), c(L[2], C[2], B[2]))) mix_skews <- lapply(seq_len(M), function(x) list(c(0, 0), c(L[3], C[3], B[3]))) mix_skurts <- lapply(seq_len(M), function(x) list(c(0, 0), c(L[4], C[4], B[4]))) mix_fifths <- lapply(seq_len(M), function(x) list(c(0, 0), c(L[5], C[5], B[5]))) mix_sixths <- lapply(seq_len(M), function(x) list(c(0, 0), c(L[6], C[6], B[6]))) mix_Six <- lapply(seq_len(M), function(x) list(NULL, NULL, 1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]][[1]], mix_mus[[1]][[1]], mix_sigmas[[1]][[1]], mix_skews[[1]][[1]], mix_skurts[[1]][[1]], mix_fifths[[1]][[1]], mix_sixths[[1]][[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]][[2]], mix_mus[[2]][[2]], mix_sigmas[[2]][[2]], mix_skews[[2]][[2]], mix_skurts[[2]][[2]], mix_fifths[[2]][[2]], mix_sixths[[2]][[2]]) means <- lapply(seq_len(M), function(x) c(L[1], B[1], Nstcum[1], Mstcum[1])) vars <- lapply(seq_len(M), function(x) c(L[2]^2, B[2]^2, Nstcum[2]^2, Mstcum[2]^2)) same.var <- 1 betas.0 <- 0 corr.x <- list() corr.x[[1]] <- list(matrix(0.1, 4, 4), matrix(0.2, 4, 4), matrix(0.3, 4, 4)) diag(corr.x[[1]][[1]]) <- 1 # set correlations between components of the same mixture variable to 0 corr.x[[1]][[1]][3:4, 3:4] <- diag(2) # since X1 is the same across outcomes, set correlation the same corr.x[[1]][[2]][, same.var] <- corr.x[[1]][[3]][, same.var] <- corr.x[[1]][[1]][, same.var] corr.x[[2]] <- list(t(corr.x[[1]][[2]]), matrix(0.35, 4, 4), matrix(0.4, 4, 4)) diag(corr.x[[2]][[2]]) <- 1 # set correlations between components of the same mixture variable to 0 corr.x[[2]][[2]][3:4, 3:4] <- diag(2) # since X1 is the same across outcomes, set correlation the same corr.x[[2]][[2]][, same.var] <- corr.x[[2]][[3]][, same.var] <- t(corr.x[[1]][[2]][same.var, ]) corr.x[[2]][[3]][same.var, ] <- corr.x[[1]][[3]][same.var, ] corr.x[[2]][[2]][same.var, ] <- t(corr.x[[2]][[2]][, same.var]) corr.x[[3]] <- list(t(corr.x[[1]][[3]]), t(corr.x[[2]][[3]]), matrix(0.5, 4, 4)) diag(corr.x[[3]][[3]]) <- 1 # set correlations between components of the same mixture variable to 0 corr.x[[3]][[3]][3:4, 3:4] <- diag(2) # since X1 is the same across outcomes, set correlation the same corr.x[[3]][[3]][, same.var] <- t(corr.x[[1]][[3]][same.var, ]) corr.x[[3]][[3]][same.var, ] <- t(corr.x[[3]][[3]][, same.var]) corr.yx <- list(matrix(0.3, 1, 4), matrix(0.4, 1, 4), matrix(0.5, 1, 4)) corr.e <- matrix(0.4, 9, 9) corr.e[1:3, 1:3] <- corr.e[4:6, 4:6] <- corr.e[7:9, 7:9] <- diag(3) error_type = "mix" Step 2: Check parameter inputs checkpar(M, method, error_type, means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, same.var = same.var, betas.0 = betas.0, corr.x = corr.x, corr.yx = corr.yx, corr.e = corr.e, quiet = TRUE) ## [1] TRUE Step 3: Generate system Note that use.nearPD = FALSE and adjgrad = FALSE so that negative eigen-values will be replaced with eigmin (default $$0$$) instead of using the nearest positive-definite matrix (found with Bates and Maechler (2017)’s Matrix::nearPD function by Higham (2002)’s algorithm) or the adjusted gradient updating method via adj_grad (Yin and Zhang 2013; Zhang and Yin Year not provided; Maree 2012). Sys2 <- nonnormsys(n, M, method, error_type, means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, same.var, betas.0, corr.x, corr.yx, corr.e, seed, use.nearPD = FALSE, quiet = TRUE) ## Total Simulation time: 0.014 minutes knitr::kable(Sys2$betas, booktabs = TRUE, caption = "Beta coefficients")
Table 15: Beta coefficients
B1 B2 B3 B4
Y1 0.3213032 3.336166 0.6556270 0.6556270
Y2 0.3881446 2.223553 0.8415819 0.8415821
Y3 0.4345749 0.000000 1.3794056 1.3794056
knitr::kable(Sys2$constants[[1]], booktabs = TRUE, caption = "PMT constants") Table 16: PMT constants c0 c1 c2 c3 c4 c5 0.0000000 0.8879258 0.0000000 0.0360523 0.0000000 0.0000006 0.1629657 1.0899841 -0.1873287 -0.0449503 0.0081210 0.0014454 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.8879258 0.0000000 0.0360523 0.0000000 0.0000006 -0.2275081 0.9007156 0.2316099 0.0154662 -0.0013673 0.0000552 0.1629657 1.0899841 -0.1873287 -0.0449503 0.0081210 0.0014454 Sys2$valid.pdf
## [[1]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"
##
## [[2]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"
##
## [[3]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"

Step 4: Describe results

Sum2 <- summary_sys(Sys2$Y, Sys2$E, Sys2$E_mix, Sys2$X, Sys2$X_all, M, method, means, vars, skews, skurts, fifths, sixths, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, corr.x = corr.x, corr.e = corr.e) knitr::kable(Sum2$cont_sum_y, booktabs = TRUE,
caption = "Simulated Distributions of Outcomes")
Table 17: Simulated Distributions of Outcomes
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
Y1 1 10000 3.589862 2.576976 3.365901 -8.590069 23.70402 1.1030234 4.039389 11.893356 42.501607
Y2 2 10000 2.781013 2.779635 2.584989 -7.634977 22.73812 0.9484330 3.323180 10.374270 37.360759
Y3 3 10000 1.163812 3.226658 1.024133 -11.625259 18.97849 0.5523652 1.714578 3.843356 9.620764
knitr::kable(Sum2$target_sum_x, booktabs = TRUE, caption = "Target Distributions of Independent Variables") Table 18: Target Distributions of Independent Variables Outcome X Mean SD Skew Skurtosis Fifth Sixth cont1_1 1 1 0.0000000 1.8137994 0.0000000 1.2000000 0.000000 6.857143 cont1_2 1 2 0.7272727 0.1746854 -0.6938789 -0.0686029 1.828251 -3.379110 cont1_3 1 3 -2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000 cont1_4 1 4 2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000 cont2_1 2 1 0.0000000 1.8137994 0.0000000 1.2000000 0.000000 6.857143 cont2_2 2 2 0.7272727 0.1746854 -0.6938789 -0.0686029 1.828251 -3.379110 cont2_3 2 3 -2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000 cont2_4 2 4 2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000 cont3_1 3 1 0.0000000 1.8137994 0.0000000 1.2000000 0.000000 6.857143 cont3_2 3 2 0.7272727 0.1746854 -0.6938789 -0.0686029 1.828251 -3.379110 cont3_3 3 3 -2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000 cont3_4 3 4 2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000 knitr::kable(Sum2$cont_sum_x, booktabs = TRUE,
caption = "Simulated Distributions of Independent Variables")
Table 19: Simulated Distributions of Independent Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
cont1_1 1 1 10000 0.0004036 1.8201505 0.0211754 -9.1086030 11.353203 0.0314930 1.3636885 0.8191860 7.6044595
cont1_2 1 2 10000 0.7272107 0.1743913 0.7553300 0.0765447 1.176209 -0.6805537 -0.0960855 1.7904701 -2.9223486
cont1_3 1 3 10000 -2.0000000 0.9999500 -2.0027696 -5.8445854 1.692666 0.0009459 0.0193741 0.0376516 -0.0283331
cont1_4 1 4 10000 2.0000000 0.9999500 2.0095202 -1.8793929 5.929108 -0.0006385 0.0089325 0.0295979 -0.3287443
cont2_1 2 1 10000 0.0004036 1.8201505 0.0211754 -9.1086030 11.353203 0.0314930 1.3636885 0.8191860 7.6044595
cont2_2 2 2 10000 0.7273131 0.1749365 0.7548160 0.0850562 1.108987 -0.6935349 -0.0667186 1.7992307 -3.3734961
cont2_3 2 3 10000 -2.0000000 0.9999500 -1.9899973 -6.0984500 1.691800 -0.0387498 0.0458874 0.1134769 -0.4972766
cont2_4 2 4 10000 2.0000000 0.9999500 2.0110466 -2.1448770 5.491031 -0.0455281 0.0073620 -0.0195780 -0.2759494
cont3_1 3 1 10000 0.0004036 1.8201505 0.0211754 -9.1086030 11.353203 0.0314930 1.3636885 0.8191860 7.6044595
cont3_2 3 2 10000 0.7274490 0.1748960 0.7565852 0.0747992 1.071272 -0.7079415 -0.0175061 1.7502794 -3.6178723
cont3_3 3 3 10000 -2.0000000 0.9999500 -2.0047639 -6.1416294 1.976896 -0.0137807 0.0222297 0.0420496 -0.1082238
cont3_4 3 4 10000 2.0000000 0.9999500 1.9920150 -1.9378583 5.545766 -0.0258160 -0.0029318 -0.2124639 -0.2961942
knitr::kable(Sum2$target_mix_x, booktabs = TRUE, caption = "Target Distributions of Mixture Independent Variables") Table 20: Target Distributions of Mixture Independent Variables Outcome X Mean SD Skew Skurtosis Fifth Sixth mix1_1 1 1 0.4 2.2 -0.2885049 -1.154019 1.793022 6.173267 mix2_1 2 1 0.4 2.2 -0.2885049 -1.154019 1.793022 6.173267 mix3_1 3 1 0.4 2.2 -0.2885049 -1.154019 1.793022 6.173267 knitr::kable(Sum2$mix_sum_x, booktabs = TRUE,
caption = "Simulated Distributions of Mixture Independent Variables")
Table 21: Simulated Distributions of Mixture Independent Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
mix1_1 1 1 10000 0.4 2.19989 1.045485 -5.445049 5.916135 -0.2731288 -1.168640 1.738283 6.351654
mix2_1 2 1 10000 0.4 2.19989 1.057698 -6.054823 5.473999 -0.2947029 -1.156161 1.804279 6.198049
mix3_1 3 1 10000 0.4 2.19989 1.058490 -6.162909 5.538341 -0.3087157 -1.129261 1.847631 5.899136
Nplot <- plot_simpdf_theory(sim_y = Sys2$X_all[[1]][, 3], ylower = -10, yupper = 10, title = "PDF of X_11: Mixture of N(-2, 1) and N(2, 1) Distributions", fx = function(x) mix_pis[[1]][[1]][1] * dnorm(x, mix_mus[[1]][[1]][1], mix_sigmas[[1]][[1]][1]) + mix_pis[[1]][[1]][2] * dnorm(x, mix_mus[[1]][[1]][2], mix_sigmas[[1]][[1]][2]), lower = -Inf, upper = Inf) Nplot knitr::kable(Sum2$target_mix_e, booktabs = TRUE,
caption = "Target Distributions of Mixture Error Terms")
Table 22: Target Distributions of Mixture Error Terms
Outcome Mean SD Skew Skurtosis Fifth Sixth
E1 1 1.163636 2.17086 2.013281 8.789539 36.48103 192.722
E2 2 1.163636 2.17086 2.013281 8.789539 36.48103 192.722
E3 3 1.163636 2.17086 2.013281 8.789539 36.48103 192.722
knitr::kable(Sum2$mix_sum_e, booktabs = TRUE, caption = "Simulated Distributions of Mixture Error Terms") Table 23: Simulated Distributions of Mixture Error Terms Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth E1 1 10000 1.163636 2.170751 0.7848816 -9.673026 23.10077 1.947736 8.372072 32.80773 167.3890 E2 2 10000 1.163636 2.170751 0.7845759 -9.483456 21.01385 2.079222 9.163271 36.52046 169.1260 E3 3 10000 1.163636 2.170751 0.8101482 -7.411084 20.11514 1.921233 8.176876 32.02805 146.7574 Mplot <- plot_simpdf_theory(sim_y = Sys2$E_mix[, 1],
title = paste("PDF of E_1: Mixture of Logistic(0, 1), Chisq(4),",
"\nand Beta(4, 1.5) Distributions", sep = ""),
fx = function(x) mix_pis[[1]][[2]][1] * dlogis(x, 0, 1) +
mix_pis[[1]][[2]][2] * dchisq(x, 4) + mix_pis[[1]][[2]][3] *
dbeta(x, 4, 1.5),
lower = -Inf, upper = Inf)
Mplot

Maximum Correlation Errors for X Variables by Outcome:

maxerr <- do.call(rbind, Sum2$maxerr) rownames(maxerr) <- colnames(maxerr) <- paste("Y", 1:M, sep = "") knitr::kable(as.data.frame(maxerr), digits = 5, booktabs = TRUE, caption = "Maximum Correlation Errors for X Variables") Table 24: Maximum Correlation Errors for X Variables Y1 Y2 Y3 Y1 0.00288 0.00405 0.00521 Y2 0.00405 0.00170 0.00341 Y3 0.00521 0.00341 0.00409 Step 5: Compare simulated correlations to theoretical values Since the system contains mixture variables, the mixture parameter inputs mix_pis, mix_mus, and mix_sigmas and the error_type must also be used. knitr::kable(calc_corr_y(Sys2$betas, corr.x, corr.e, vars, mix_pis,
mix_mus, mix_sigmas, "mix"), booktabs = TRUE,
caption = "Expected Y Correlations")
Table 25: Expected Y Correlations
Y1 Y2 Y3
Y1 1.0000000 0.2978052 0.3885223
Y2 0.2978052 1.0000000 0.4785063
Y3 0.3885223 0.4785063 1.0000000
knitr::kable(Sum2$rho.y, booktabs = TRUE, caption = "Simulated Y Correlations") Table 26: Simulated Y Correlations Y1 Y2 Y3 Y1 1.0000000 0.2838656 0.3958264 Y2 0.2838656 1.0000000 0.4717876 Y3 0.3958264 0.4717876 1.0000000 knitr::kable(calc_corr_ye(Sys2$betas, corr.x, corr.e, vars, mix_pis,
mix_mus, mix_sigmas, "mix"), booktabs = TRUE,
caption = "Expected Y, E Correlations")
Table 27: Expected Y, E Correlations
E1 E2 E3
Y1 0.8433984 0.1025981 0.1025981
Y2 0.0944555 0.7764627 0.0944555
Y3 0.0817148 0.0817148 0.6717288
knitr::kable(Sum2$rho.yemix, booktabs = TRUE, caption = "Simulated Y, E Correlations") Table 28: Simulated Y, E Correlations E1 E2 E3 Y1 0.8433998 0.0788837 0.1088860 Y2 0.0857085 0.7728459 0.0893955 Y3 0.0881229 0.0692333 0.6696976 knitr::kable(calc_corr_yx(Sys2$betas, corr.x, vars, mix_pis,
mix_mus, mix_sigmas, "mix"), booktabs = TRUE,
caption = "Expected Y, X Correlations")
Table 29: Expected Y, X Correlations
X1_1 X1_2 X1_3 X1_4
Y1 0.2999999 0.3000000 0.3000000 0.3000000
Y2 0.4000002 0.1733719 0.1733719 0.1733719
Y3 0.5000000 0.2804878 0.2804878 0.2804878
X2_1 X2_2 X2_3 X2_4
Y1 0.2999999 0.1924528 0.1924528 0.1924528
Y2 0.4000002 0.4000000 0.4000000 0.4000000
Y3 0.5000000 0.3902439 0.3902439 0.3902439
X3_1 X3_2 X3_3 X3_4
Y1 0.2999999 0.2886792 0.2886792 0.2886792
Y2 0.4000002 0.3719248 0.3719248 0.3719248
Y3 0.5000000 0.5000000 0.5000000 0.5000000
knitr::kable(Sum2$rho.yx, booktabs = TRUE, caption = "Simulated Y, X Correlations") Table 30: Simulated Y, X Correlations X1_1 X1_2 X1_3 X1_4 Y1 0.3046310 0.2926216 0.3197465 0.2869766 Y2 0.4002240 0.1577374 0.1697532 0.1652599 Y3 0.4934152 0.2918114 0.2789633 0.2860321 X2_1 X2_2 X2_3 X2_4 Y1 0.3046310 0.1964025 0.1918824 0.1957318 Y2 0.4002240 0.3971412 0.3928657 0.3970529 Y3 0.4934152 0.4008303 0.3914031 0.3888395 X3_1 X3_2 X3_3 X3_4 Y1 0.3046310 0.2854515 0.2892292 0.2906667 Y2 0.4002240 0.3729806 0.3633462 0.3700993 Y3 0.4934152 0.5087496 0.5014829 0.5009007 Example 3: System of 3 equations, $$Y_1$$ and $$Y_3$$ as in Example 2, $$Y_2$$ has only error term In this example: 1. $$X_{11} = X_{31}$$ has a Logistic(0, 1) distribution and is the same variable for $$Y_1$$ and $$Y_3$$ 2. $$X_{12}, X_{32}$$ have Beta(4, 1.5) distributions 3. $$X_{13}, X_{33}$$ have normal mixture distributions 4. $$E_1, E_2, E_3$$ have Logistic, Chisq, Beta mixture distributions $$$\begin{split} Y_1 &= \beta_{10} + \beta_{11} * X_{11} + \beta_{12} * X_{12} + \beta_{13} * X_{13} + E_1 \\ Y_2 &= \beta_{20} + E_2 \\ Y_3 &= \beta_{30} + \beta_{31} * X_{11} + \beta_{32} * X_{32} + \beta_{33} * X_{33} + E_3 \end{split} \tag{3}$$$ Step 1: Set up parameter inputs seed <- 276 n <- 10000 M <- 3 method <- "Polynomial" L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- list(c(L[3], B[3]), NULL, c(L[3], B[3])) skurts <- list(c(L[4], B[4]), NULL, c(L[4], B[4])) fifths <- list(c(L[5], B[5]), NULL, c(L[5], B[5])) sixths <- list(c(L[6], B[6]), NULL, c(L[6], B[6])) Six <- list(list(1.75, 0.03), NULL, list(1.75, 0.03)) mix_pis <- list(list(c(0.4, 0.6), c(0.3, 0.2, 0.5)), list(c(0.3, 0.2, 0.5)), list(c(0.4, 0.6), c(0.3, 0.2, 0.5))) mix_mus <- list(list(c(-2, 2), c(L[1], C[1], B[1])), list(c(L[1], C[1], B[1])), list(c(-2, 2), c(L[1], C[1], B[1]))) mix_sigmas <- list(list(c(1, 1), c(L[2], C[2], B[2])), list(c(L[2], C[2], B[2])), list(c(1, 1), c(L[2], C[2], B[2]))) mix_skews <- list(list(c(0, 0), c(L[3], C[3], B[3])), list(c(L[3], C[3], B[3])), list(c(0, 0), c(L[3], C[3], B[3]))) mix_skurts <- list(list(c(0, 0), c(L[4], C[4], B[4])), list(c(L[4], C[4], B[4])),list(c(0, 0), c(L[4], C[4], B[4]))) mix_fifths <- list(list(c(0, 0), c(L[5], C[5], B[5])), list(c(L[5], C[5], B[5])), list(c(0, 0), c(L[5], C[5], B[5]))) mix_sixths <- list(list(c(0, 0), c(L[6], C[6], B[6])), list(c(L[6], C[6], B[6])), list(c(0, 0), c(L[6], C[6], B[6]))) mix_Six <- list(list(NULL, NULL, 1.75, NULL, 0.03), list(1.75, NULL, 0.03), list(NULL, NULL, 1.75, NULL, 0.03)) means <- list(c(L[1], B[1], Nstcum[1], Mstcum[1]), Mstcum[1], c(L[1], B[1], Nstcum[1], Mstcum[1])) vars <- list(c(L[2]^2, B[2]^2, Nstcum[2]^2, Mstcum[2]^2), Mstcum[2]^2, c(L[2]^2, B[2]^2, Nstcum[2]^2, Mstcum[2]^2)) # since Y_2 has no X variables, same.var must be changed to a matrix same.var <- matrix(c(1, 1, 3, 1), 1, 4) corr.x <- list(list(matrix(0.1, 4, 4), NULL, matrix(0.3, 4, 4)), NULL) diag(corr.x[[1]][[1]]) <- 1 # set correlations between components of the same mixture variable to 0 corr.x[[1]][[1]][3:4, 3:4] <- diag(2) # since X1 is the same across outcomes, set correlation the same corr.x[[1]][[3]][, 1] <- corr.x[[1]][[1]][, 1] corr.x <- append(corr.x, list(list(t(corr.x[[1]][[3]]), NULL, matrix(0.5, 4, 4)))) diag(corr.x[[3]][[3]]) <- 1 # set correlations between components of the same mixture variable to 0 corr.x[[3]][[3]][3:4, 3:4] <- diag(2) # since X1 is the same across outcomes, set correlation the same corr.x[[3]][[3]][, 1] <- t(corr.x[[1]][[3]][1, ]) corr.x[[3]][[3]][1, ] <- t(corr.x[[3]][[3]][, 1]) corr.yx <- list(matrix(0.3, 1, 4), NULL, matrix(0.5, 1, 4)) corr.e <- matrix(0.4, 9, 9) corr.e[1:3, 1:3] <- corr.e[4:6, 4:6] <- corr.e[7:9, 7:9] <- diag(3) error_type = "mix" Step 2: Check parameter inputs checkpar(M, method, error_type, means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, same.var = same.var, betas.0 = betas.0, corr.x = corr.x, corr.yx = corr.yx, corr.e = corr.e, quiet = TRUE) ## [1] TRUE Step 3: Generate system Sys3 <- nonnormsys(n, M, method, error_type, means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, same.var, betas.0, corr.x, corr.yx, corr.e, seed, use.nearPD = FALSE, quiet = TRUE) ## Total Simulation time: 0.006 minutes knitr::kable(Sys3$betas, booktabs = TRUE, caption = "Beta coefficients")
Table 31: Beta coefficients
B1 B2 B3 B4
Y1 0.3213033 3.336167 0.6556271 0.6556271
Y2 0.0000000 0.000000 0.0000000 0.0000000
Y3 0.4345749 0.000000 1.3794056 1.3794056
knitr::kable(Sys3$constants[[1]], booktabs = TRUE, caption = "PMT constants") Table 32: PMT constants c0 c1 c2 c3 c4 c5 0.0000000 0.8879258 0.0000000 0.0360523 0.0000000 0.0000006 0.1629657 1.0899841 -0.1873287 -0.0449503 0.0081210 0.0014454 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.8879258 0.0000000 0.0360523 0.0000000 0.0000006 -0.2275081 0.9007156 0.2316099 0.0154662 -0.0013673 0.0000552 0.1629657 1.0899841 -0.1873287 -0.0449503 0.0081210 0.0014454 Sys3$valid.pdf
## [[1]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"
##
## [[2]]
## [1] "TRUE" "TRUE" "TRUE"
##
## [[3]]
## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"

Step 4: Describe results

Sum3 <- summary_sys(Sys3$Y, Sys3$E, Sys3$E_mix, Sys3$X, Sys3$X_all, M, method, means, vars, skews, skurts, fifths, sixths, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, corr.x = corr.x, corr.e = corr.e) knitr::kable(Sum3$cont_sum_y, booktabs = TRUE,
caption = "Simulated Distributions of Outcomes")
Table 33: Simulated Distributions of Outcomes
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
Y1 1 10000 3.589515 2.561356 3.3343275 -6.173603 27.29735 1.2077687 4.570660 17.098230 83.88966
Y2 2 10000 1.163636 2.170751 0.8114017 -10.198298 17.51876 1.9371481 8.312887 29.395190 114.56563
Y3 3 10000 1.163810 3.223408 0.9884224 -11.205198 21.47867 0.5828432 1.737082 4.733577 14.03912
knitr::kable(Sum3$target_sum_x, booktabs = TRUE, caption = "Target Distributions of Independent Variables") Table 34: Target Distributions of Independent Variables Outcome X Mean SD Skew Skurtosis Fifth Sixth cont1_1 1 1 0.0000000 1.8137994 0.0000000 1.2000000 0.000000 6.857143 cont1_2 1 2 0.7272727 0.1746854 -0.6938789 -0.0686029 1.828251 -3.379110 cont1_3 1 3 -2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000 cont1_4 1 4 2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000 cont3_1 3 1 0.0000000 1.8137994 0.0000000 1.2000000 0.000000 6.857143 cont3_2 3 2 0.7272727 0.1746854 -0.6938789 -0.0686029 1.828251 -3.379110 cont3_3 3 3 -2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000 cont3_4 3 4 2.0000000 1.0000000 0.0000000 0.0000000 0.000000 0.000000 knitr::kable(Sum3$cont_sum_x, booktabs = TRUE,
caption = "Simulated Distributions of Independent Variables")
Table 35: Simulated Distributions of Independent Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
cont1_1 1 1 10000 0.0004003 1.8086659 -0.0034553 -9.3357455 8.093322 0.0093534 0.9026131 -0.0030437 1.4023271
cont1_2 1 2 10000 0.7271071 0.1743004 0.7532079 0.0867398 1.083444 -0.6761726 -0.1000023 1.7778230 -3.0225169
cont1_3 1 3 10000 -2.0000000 0.9999500 -1.9820874 -6.1670779 1.579041 -0.0138124 0.0450324 -0.0467214 -0.1399622
cont1_4 1 4 10000 2.0000000 0.9999500 2.0192696 -1.8671048 5.818930 -0.0259989 0.0660603 0.0980029 -0.4313656
cont3_1 3 1 10000 0.0004003 1.8086659 -0.0034553 -9.3357455 8.093322 0.0093534 0.9026131 -0.0030437 1.4023271
cont3_2 3 2 10000 0.7273631 0.1752534 0.7601429 0.0711494 1.034962 -0.7030286 -0.0889725 1.9270159 -3.3525250
cont3_3 3 3 10000 -2.0000000 0.9999500 -1.9947523 -5.8932382 1.483459 -0.0447962 -0.0478362 -0.0040827 -0.1491799
cont3_4 3 4 10000 2.0000000 0.9999500 2.0119310 -1.6266392 6.083537 -0.0019161 -0.0060812 0.2566953 0.2512733
knitr::kable(Sum3$target_mix_x, booktabs = TRUE, caption = "Target Distributions of Mixture Independent Variables") Table 36: Target Distributions of Mixture Independent Variables Outcome X Mean SD Skew Skurtosis Fifth Sixth mix1_1 1 1 0.4 2.2 -0.2885049 -1.154019 1.793022 6.173267 mix3_1 3 1 0.4 2.2 -0.2885049 -1.154019 1.793022 6.173267 knitr::kable(Sum3$mix_sum_x, booktabs = TRUE,
caption = "Simulated Distributions of Mixture Independent Variables")
Table 37: Simulated Distributions of Mixture Independent Variables
Outcome X N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
mix1_1 1 1 10000 0.4 2.19989 1.03944 -6.140646 5.801265 -0.2961194 -1.141168 1.808030 6.086474
mix3_1 3 1 10000 0.4 2.19989 1.05123 -5.487086 6.058566 -0.2866761 -1.152828 1.791524 6.199999
knitr::kable(Sum3$target_mix_e, booktabs = TRUE, caption = "Target Distributions of Mixture Error Terms") Table 38: Target Distributions of Mixture Error Terms Outcome Mean SD Skew Skurtosis Fifth Sixth E1 1 1.163636 2.17086 2.013281 8.789539 36.48103 192.722 E2 2 1.163636 2.17086 2.013281 8.789539 36.48103 192.722 E3 3 1.163636 2.17086 2.013281 8.789539 36.48103 192.722 knitr::kable(Sum3$mix_sum_e, booktabs = TRUE,
caption = "Simulated Distributions of Mixture Error Terms")
Table 39: Simulated Distributions of Mixture Error Terms
Outcome N Mean SD Median Min Max Skew Skurtosis Fifth Sixth
E1 1 10000 1.163636 2.170751 0.7763939 -7.337748 25.01447 2.027964 8.810501 39.02100 230.4321
E2 2 10000 1.163636 2.170751 0.8114017 -10.198298 17.51876 1.937148 8.312887 29.39519 114.5656
E3 3 10000 1.163636 2.170751 0.8072152 -9.407838 18.95352 1.920803 8.312115 30.29603 124.0343

Maximum Correlation Errors for X Variables by Outcome:

maxerr <- rbind(Sum3$maxerr[[1]][-2], Sum3$maxerr[[3]][-2])
rownames(maxerr) <- colnames(maxerr) <- c("Y1", "Y3")
knitr::kable(as.data.frame(maxerr), digits = 5, booktabs = TRUE,
caption = "Maximum Correlation Errors for X Variables")
Table 40: Maximum Correlation Errors for X Variables
Y1 Y3
Y1 0.00177 0.00147
Y3 0.00147 0.00257

Step 5: Compare simulated correlations to theoretical values

knitr::kable(calc_corr_y(Sys3$betas, corr.x, corr.e, vars, mix_pis, mix_mus, mix_sigmas, "mix"), booktabs = TRUE, caption = "Expected Y Correlations") Table 41: Expected Y Correlations Y1 Y2 Y3 Y1 1.0000000 0.1025981 0.3885224 Y2 0.1025981 1.0000000 0.0817148 Y3 0.3885224 0.0817148 1.0000000 knitr::kable(Sum3$rho.y, booktabs = TRUE,
caption = "Simulated Y Correlations")
Table 42: Simulated Y Correlations
Y1 Y2 Y3
Y1 1.0000000 0.0738147 0.3825892
Y2 0.0738147 1.0000000 0.0611834
Y3 0.3825892 0.0611834 1.0000000
knitr::kable(calc_corr_ye(Sys3$betas, corr.x, corr.e, vars, mix_pis, mix_mus, mix_sigmas, "mix"), booktabs = TRUE, caption = "Expected Y, E Correlations") Table 43: Expected Y, E Correlations E1 E2 E3 Y1 0.8433983 0.1025981 0.1025981 Y2 0.1216485 1.0000000 0.1216485 Y3 0.0817148 0.0817148 0.6717288 knitr::kable(Sum3$rho.yemix, booktabs = TRUE,
caption = "Simulated Y, E Correlations")
Table 44: Simulated Y, E Correlations
E1 E2 E3
Y1 0.8421445 0.0738147 0.0966026
Y2 0.0912775 1.0000000 0.0993515
Y3 0.0742094 0.0611834 0.6704631
knitr::kable(calc_corr_yx(Sys3$betas, corr.x, vars, mix_pis, mix_mus, mix_sigmas, "mix"), booktabs = TRUE, caption = "Expected Y, X Correlations") Table 45: Expected Y, X Correlations X1_1 X1_2 X1_3 X1_4 Y1 0.3 0.3000000 0.3000000 0.3000000 Y2 0.0 0.0000000 0.0000000 0.0000000 Y3 0.5 0.2804878 0.2804878 0.2804878 X3_1 X3_2 X3_3 X3_4 Y1 0.3 0.2886793 0.2886793 0.2886793 Y2 0.0 0.0000000 0.0000000 0.0000000 Y3 0.5 0.5000000 0.5000000 0.5000000 knitr::kable(Sum3$rho.yx, booktabs = TRUE,
caption = "Simulated Y, X Correlations")
Table 46: Simulated Y, X Correlations
X1_1 X1_2 X1_3 X1_4
Y1 0.2889926 0.2964159 0.2877044 0.3116951
Y2 0.0014158 -0.0077903 -0.0056745 -0.0025133
Y3 0.5030255 0.2748050 0.2790685 0.2803257
X3_1 X3_2 X3_3 X3_4
Y1 0.2889926 0.2873456 0.2900545 0.2873188
Y2 0.0014158 -0.0002223 -0.0097569 -0.0044247
Y3 0.5030255 0.4990494 0.4886098 0.5065214

References

Bates, Douglas, and Martin Maechler. 2017. Matrix: Sparse and Dense Matrix Classes and Methods. https://CRAN.R-project.org/package=Matrix.

Davenport, JW, JC Bezder, and RJ Hathaway. 1988. “Parameter Estimation for Finite Mixture Distributions.” Computers & Mathematics with Applications 15 (10): 819–28.

Everitt, B S. 1996. “An Introduction to Finite Mixture Distributions.” Statistical Methods in Medical Research 5 (2): 107–27. https://doi.org/10.1177/096228029600500202.

Fialkowski, A C. 2017. SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. https://CRAN.R-project.org/package=SimMultiCorrData.

———. 2018. SimCorrMix: Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions. https://CRAN.R-project.org/package=SimCorrMix.

Fleishman, A I. 1978. “A Method for Simulating Non-Normal Distributions.” Psychometrika 43: 521–32. https://doi.org/10.1007/BF02293811.

Headrick, T C. 2002. “Fast Fifth-Order Polynomial Transforms for Generating Univariate and Multivariate Non-Normal Distributions.” Computational Statistics and Data Analysis 40 (4): 685–711. https://doi.org/10.1016/S0167-9473(02)00072-5.

Headrick, T C, and T M Beasley. 2004. “A Method for Simulating Correlated Non-Normal Systems of Statistical Equations.” Communications in Statistics - Simulation and Computation 33 (1): 19–33. http://dx.doi.org/10.1081/SAC-120028431.

Higham, N. 2002. “Computing the Nearest Correlation Matrix - a Problem from Finance.” IMA Journal of Numerical Analysis 22 (3): 329–43. https://doi.org/10.1093/imanum/22.3.329.

Maree, S. 2012. “Correcting Non Positive Definite Correlation Matrices.” BSc Thesis Applied Mathematics, Delft University of Technology. http://resolver.tudelft.nl/uuid:2175c274-ab03-4fd5-85a9-228fe421cdbf.

Pearson, RK. 2011. “Exploring Data in Engineering, the Sciences, and Medicine.” In. New York: Oxford University Press.

Shork, N J, D B Allison, and B Thiel. 1996. “Mixture Distributions in Human Genetics Research.” Statistical Methods in Medical Research 5: 155–78. https://doi.org/10.1177/096228029600500204.

Yin, JF, and Y Zhang. 2013. “Alternative Gradient Algorithms for Computing the Nearest Correlation Matrix.” Applied Mathematics and Computation 219 (14): 7591–9. https://doi.org/10.1016/j.amc.2013.01.045.

Zhang, Y, and JF Yin. Year not provided. “Modified Alternative Gradients Algorithm for Computing the Nearest Correlation Matrix.” Internal paper. Shanghai: Tongji University.