# Introduction to the EAinference package

#### 2017-12-01

EAinference package is designed to facilitate simulation based inference of lasso type estimators. The package includes

• Fitting the lasso, group lasso, scaleld lasso and scaled group lasso models
• Gaussian and wild multiplier bootstrap for lasso, group lasso, scaled lasso, scaled group lasso and their de-biased estimators
• Importance sampler for approximating p-values in these methods
• Markov chain Monte Carlo lasso sampler with applications in post-selection inference.

## Loss functions

Let the unknown true model be $y = X\beta_0 + \epsilon,$ where $$\beta_0$$ is a unknown true coefficient vector and $$\epsilon_i \sim_{iid} N(0,\sigma^2)$$.

We use following loss functions. $\ell_{grlasso}(\beta ; \lambda) = \frac{1}{2}||y-X\beta||^2_2 + n\lambda\sum_j w_j||\beta_{(j)}||_2,$ $\ell_{sgrlasso}(\beta, \sigma ; \lambda) = \frac{1}{2\sigma}||y-X\beta||^2_2 + \frac{\sigma}{2} + n\lambda\sum_j w_j||\beta_{(j)}||_2,$ where $$grlasso$$ and $$sgrlasso$$ represent group lasso and scaled group lasso, respectively, and $$w_j$$ is the weight factor for the $$j$$-th group. Loss functions for lasso and scaled lasso can be treated as special cases of group lasso and group scaled lasso when every group is a singleton, respectively.

## Fit lasso, group lasso, scaled lasso and scaled group lasso models

lassoFit computes lasso, group lasso, scaled lasso and scaled group lasso solutions. Users can either provide the value of $$\lambda$$ or choose to use cross-validation by setting lbd = "cv.min" or lbd = "cv.1se". By letting lbd = "cv.min", users can have $$\lambda$$ which gives minimum mean-squared error, while lbd = "cv.1se" is the largest $$\lambda$$ within 1 standard deviaion error bound of "cv.min".

library(EAinference)
set.seed(1234)
n <- 30; p <- 50
Niter <-  10
group <- rep(1:(p/5), each = 5)
weights <- rep(1, p/5)
beta0 <- c(rep(1,5), rep(0, p-5))
X <- matrix(rnorm(n*p), n)
Y <- X %*% beta0 + rnorm(n)
# set lambda = .5
lassoFit(X = X, Y = Y, lbd = .5, group = group, weights = weights, type="grlasso")
## $B0 ## [1] 0.5161666451 1.0188194987 0.6824793388 0.3931465866 0.6874994287 ## [6] -0.0016276550 0.0217320293 0.0301747719 -0.0099519996 -0.0164145624 ## [11] -0.0014113541 -0.0045361968 0.0020838981 0.0005231764 -0.0052660052 ## [16] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 ## [21] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 ## [26] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 ## [31] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 ## [36] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 ## [41] -0.0017693962 -0.0011635879 0.0061087716 0.0010330827 -0.0028810035 ## [46] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 ## ##$S0
##  [1]  0.33335824  0.65800255  0.44077947  0.25391509  0.44402031
##  [6] -0.03886814  0.51891926  0.72052006 -0.23763656 -0.39194621
## [11] -0.19045450 -0.61212553  0.28120884  0.07059705 -0.71060849
## [16]  0.46931065  0.33721146  0.11896615  0.29502880  0.03505243
## [21]  0.05203069 -0.12061855  0.04603167 -0.40549602 -0.50026213
## [26]  0.17202391  0.45006926 -0.17412122 -0.80287493 -0.10071653
## [31] -0.25984436  0.01141286  0.24753340  0.62774880  0.04721167
## [36]  0.33710419 -0.09279505 -0.04843598 -0.43430001  0.40143342
## [41] -0.24735341 -0.16266440  0.85398270  0.14442166 -0.40275203
## [46]  0.53179122 -0.20151832 -0.07230133 -0.45279215  0.21106223
##
## $lbd ## [1] 0.5 ## ##$weights
##  [1] 1 1 1 1 1 1 1 1 1 1
##
## $group ## [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 ## [24] 5 5 6 6 6 6 6 7 7 7 7 7 8 8 8 8 8 9 9 9 9 9 10 ## [47] 10 10 10 10 # use cross-validation lassoFit(X = X, Y = Y, lbd = "cv.1se", group = group, weights = weights, type="grlasso") ##$B0
##  [1] 0.2585567 0.5662969 0.3423216 0.1790861 0.3370465 0.0000000 0.0000000
##  [8] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [15] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [22] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [29] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [36] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [43] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [50] 0.0000000
##
## $S0 ## [1] 0.320602499 0.702188704 0.424466187 0.222062331 0.417926026 ## [6] -0.082467976 0.277804727 0.096219628 -0.208633518 -0.139425279 ## [11] -0.045785563 -0.085990733 0.311352884 0.011898323 -0.431104404 ## [16] 0.434035959 0.080054096 0.099761888 0.383888283 -0.046916332 ## [21] -0.142767164 -0.004633968 0.139308032 -0.385795767 -0.281788517 ## [26] 0.100791387 0.170594118 0.002935687 -0.284587073 -0.021219450 ## [31] -0.152129053 0.174903539 0.117935249 0.246820956 0.035916060 ## [36] 0.306760303 -0.064360881 -0.114288527 -0.306805271 0.216458545 ## [41] -0.179758354 -0.247343130 0.344814852 -0.033834872 -0.179308622 ## [46] 0.257759246 0.067089513 -0.164422715 -0.196709827 0.228724776 ## ##$lbd
## [1] 1.475293
##
## $weights ## [1] 1 1 1 1 1 1 1 1 1 1 ## ##$group
##  [1]  1  1  1  1  1  2  2  2  2  2  3  3  3  3  3  4  4  4  4  4  5  5  5
## [24]  5  5  6  6  6  6  6  7  7  7  7  7  8  8  8  8  8  9  9  9  9  9 10
## [47] 10 10 10 10

## Parametric bootstrap sampler

PBsampler supports two bootstrap methods; Gaussian bootstrap and wild multiplier bootstrap. Due to the fact that the sampling distirbution of $$(\hat{\beta}, S)$$, coefficient estimator and its subgradient, is characterized by $$(\beta_0, \sigma^2, \lambda)$$, users are required to provide PE(either the estimate of $$\beta_0$$ or the estimate of $$E(y) = X\beta_0$$), sig2(or estimate of $$\sigma^2$$) and lbd($$\lambda$$ from above loss functions).

By specifying two sets of arguments, (PE_1, sig2_1, lbd_1) and (PE_2, sig2_2, lbd_2), users can sample from the mixture distribution. In this way, samples will be drawn from (PE_1, sig2_1, lbd_1) with 1/2 probability and from (PE_2, sig2_2, lbd_2) with another 1/2 probability.

set.seed(1234)
n <- 5; p <- 10
Niter <-  3
group <- rep(1:(p/5), each = 5)
weights <- rep(1, p/5)
beta0 <- c(rep(1,5), rep(0, p-5))
X <- matrix(rnorm(n*p), n)
Y <- X %*% beta0 + rnorm(n)
#
# Using non-mixture distribution
#
PBsampler(X = X, PE_1 = rep(0, p), sig2_1 = 1, lbd_1 = .5,
weights = weights, group = group, type = "grlasso", niter = Niter, parallel = FALSE)
## $beta ## [,1] [,2] [,3] [,4] [,5] ## [1,] -0.3525372 -0.04505232 -0.163496084 -0.20736594 0.04096835 ## [2,] 0.0000000 0.00000000 0.000000000 0.00000000 0.00000000 ## [3,] -0.1019592 -0.01026948 0.007230875 -0.04528641 0.02613062 ## [,6] [,7] [,8] [,9] [,10] ## [1,] 0.07989826 0.01441626 -0.078991476 -0.01728012 -0.02587893 ## [2,] 0.19450800 0.10014585 -0.449697335 -0.06017330 -0.11895590 ## [3,] -0.48573785 -0.02475042 0.003966679 0.16608124 -0.22664921 ## ##$subgrad
##            [,1]        [,2]        [,3]       [,4]        [,5]       [,6]
## [1,] -0.7928269 -0.10133122 -0.36770991 -0.4663535  0.09212409  0.6801195
## [2,]  0.5445986  0.06272791 -0.41645041  0.2135834 -0.29673172  0.3757996
## [3,] -0.8845247 -0.08909051  0.06272808 -0.3928755  0.22669099 -0.8647542
##             [,7]         [,8]       [,9]      [,10]
## [1,]  0.12271068 -0.672414957 -0.1471035 -0.2202979
## [2,]  0.19352241 -0.868867603 -0.1162979 -0.2298557
## [3,] -0.04405627  0.007096803  0.2956620 -0.4035373
##
## $X ## [,1] [,2] [,3] [,4] [,5] [,6] ## [1,] -1.2070657 0.5060559 -0.47719270 -0.1102855 0.1340882 -1.4482049 ## [2,] 0.2774292 -0.5747400 -0.99838644 -0.5110095 -0.4906859 0.5747557 ## [3,] 1.0844412 -0.5466319 -0.77625389 -0.9111954 -0.4405479 -1.0236557 ## [4,] -2.3456977 -0.5644520 0.06445882 -0.8371717 0.4595894 -0.0151383 ## [5,] 0.4291247 -0.8900378 0.95949406 2.4158352 -0.6937202 -0.9359486 ## [,7] [,8] [,9] [,10] ## [1,] 1.1022975 -1.1676193 1.4494963 -0.9685143 ## [2,] -0.4755931 -2.1800396 -1.0686427 -1.1073182 ## [3,] -0.7094400 -1.3409932 -0.8553646 -1.2519859 ## [4,] -0.5012581 -0.2942939 -0.2806230 -0.5238281 ## [5,] -1.6290935 -0.4658975 -0.9943401 -0.4968500 ## ##$PE
##  [1] 0 0 0 0 0 0 0 0 0 0
##
## $sig2 ## [1] 1 ## ##$lbd
## [1] 0.5
##
## $weights ## [1] 1 1 ## ##$group
##  [1] 1 1 1 1 1 2 2 2 2 2
##
## $type ## [1] "grlasso" ## ##$PEtype
## [1] "coeff"
##
## $Btype ## [1] "gaussian" ## ##$Y
## NULL
##
## $mixture ## [1] FALSE ## ##$call
## PBsampler(X = X, PE_1 = rep(0, p), sig2_1 = 1, lbd_1 = 0.5, weights = weights,
##     group = group, niter = Niter, type = "grlasso", parallel = FALSE)
##
## attr(,"class")
## [1] "PB"
#
# Using mixture distribution
#
PBsampler(X = X, PE_1 = rep(0, p), sig2_1 = 1, lbd_1 = .5,
PE_2 = rep(1, p), sig2_2 = 2, lbd_2 = .3, weights = weights,
group = group, type = "grlasso", niter = Niter, parallel = FALSE)
## $beta ## [,1] [,2] [,3] [,4] [,5] [,6] ## [1,] 0.0000000 0.0000000 0.00000000 0.0000000 0.00000000 -0.27036272 ## [2,] -0.3321355 -0.3335305 -0.04145217 -0.4657345 0.06209445 -0.08106727 ## [3,] 1.2724686 0.7145404 0.26382548 0.7718595 -0.07766709 0.28111670 ## [,7] [,8] [,9] [,10] ## [1,] 0.1244449 -0.15833798 0.2244849 -0.1545494 ## [2,] -0.3907588 -0.08506543 -0.3701445 -0.2274632 ## [3,] 1.2572062 1.56849000 1.1991659 1.4822653 ## ##$subgrad
##            [,1]       [,2]        [,3]       [,4]        [,5]       [,6]
## [1,] -0.6154729  0.1204288 -0.02636072  0.1638148  0.05509138 -0.6236691
## [2,] -0.4984384 -0.5005026 -0.06221061 -0.6989186  0.09317068 -0.1360223
## [3,]  0.7602823  0.4269187  0.15762952  0.4611765 -0.04640962  0.1009130
##            [,7]       [,8]       [,9]      [,10]
## [1,]  0.2870925 -0.3652346  0.5177945 -0.3565099
## [2,] -0.6555875 -0.1427033 -0.6210494 -0.3816455
## [3,]  0.4514828  0.5632085  0.4305853  0.5323487
##
## $X ## [,1] [,2] [,3] [,4] [,5] [,6] ## [1,] -1.2070657 0.5060559 -0.47719270 -0.1102855 0.1340882 -1.4482049 ## [2,] 0.2774292 -0.5747400 -0.99838644 -0.5110095 -0.4906859 0.5747557 ## [3,] 1.0844412 -0.5466319 -0.77625389 -0.9111954 -0.4405479 -1.0236557 ## [4,] -2.3456977 -0.5644520 0.06445882 -0.8371717 0.4595894 -0.0151383 ## [5,] 0.4291247 -0.8900378 0.95949406 2.4158352 -0.6937202 -0.9359486 ## [,7] [,8] [,9] [,10] ## [1,] 1.1022975 -1.1676193 1.4494963 -0.9685143 ## [2,] -0.4755931 -2.1800396 -1.0686427 -1.1073182 ## [3,] -0.7094400 -1.3409932 -0.8553646 -1.2519859 ## [4,] -0.5012581 -0.2942939 -0.2806230 -0.5238281 ## [5,] -1.6290935 -0.4658975 -0.9943401 -0.4968500 ## ##$PE
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## PE_1    0    0    0    0    0    0    0    0    0     0
## PE_2    1    1    1    1    1    1    1    1    1     1
##
## $sig2 ## [1] 1 2 ## ##$lbd
## [1] 0.5 0.3
##
## $weights ## [1] 1 1 ## ##$group
##  [1] 1 1 1 1 1 2 2 2 2 2
##
## $type ## [1] "grlasso" ## ##$PEtype
## [1] "coeff"
##
## $Btype ## [1] "gaussian" ## ##$Y
## NULL
##
## $mixture ## [1] TRUE ## ##$call
## PBsampler(X = X, PE_1 = rep(0, p), sig2_1 = 1, lbd_1 = 0.5, PE_2 = rep(1,
##     p), sig2_2 = 2, lbd_2 = 0.3, weights = weights, group = group,
##     niter = Niter, type = "grlasso", parallel = FALSE)
##
## attr(,"class")
## [1] "PB"

## Importance sampler for high dimensional data

Importance sampler enables users to access the inference of an extreme region. This is done by using a proposal distiribution that is denser around the extreme region.

Say that we are interested in computing the expectation of a function of a random variable, $$h(X)$$. Let $$f(x)$$ be the true or target distribution and $$g(x)$$ be the proposal distribution. We can approximate the expectation by a weighted average of samples drawn from the proposal distribution as follows,

$\begin{eqnarray} E_f\Big[h(X)\Big] &=& E_g \Big[h(X)\frac{f(X)}{g(X)} \Big]\\ &\approx& \frac{1}{N}\sum_{i=1}^N h(x_i)\frac{f(x_i)}{g(x_i)}. \end{eqnarray}$ By using hdIS method, one can easily compute the importance weight which is the ratio of target density over proposal density; $$f(x_i)/g(x_i)$$ from above equation.

Users can simply draw samples from the proposal distribution using PBsampler and plug in the result into hdIS with target distribution parameters in order to compute the importance weights.

# Target distribution parameter
PETarget <- rep(0, p)
sig2Target <- .5
lbdTarget <- .37
# Proposal distribution parameter
PEProp1 <- rep(1, p)
sig2Prop1 <- .5
lbdProp1 <- 1

# Draw samples from proposal distribution
PB <- PBsampler(X = X, PE_1 = PEProp1, sig2_1 = sig2Prop1,
lbd_1 = lbdProp1, weights = weights, group = group, niter = Niter,
type="grlasso", PEtype = "coeff")

# Compute importance weights
hdIS(PB, PETarget = PETarget, sig2Target = sig2Target, lbdTarget = lbdTarget,
log = TRUE)
## [1]  -96.45903 -126.87442  -69.81881

## Metropolis-Hastings Lasso Sampler

In this section, we introduce MHLS method, a Markov Chain Monte Carlo(MCMC) sampler for lasso estimator. Although bootstrapping is one of the most convenient sampling methods, it has a clear limitation which is that sampling from the conditional distribution is impossible. In contrast, MCMC sampler can easily draw samples from the conditional distribution. Here, we introduce MHLS function which draws lasso samples under the fixed active set, A.

weights <- rep(1,p)
lbd <- .37
LassoResult <- lassoFit(X = X,Y = Y,lbd = lbd, type = "lasso", weights = weights)
B0 <- LassoResult$B0 S0 <- LassoResult$S0
Result <- MHLS(X = X, PE = B0, sig2 = 1, lbd = 1,
weights = weights, B0 = B0, S0 = S0, niter = 100, burnin = 0,
PEtype = "coeff")
Result
## ===========================
## Number of iteration:  100
##
## Burn-in period:  0
##
## Plug-in PE:
##  [1] 0.7995661 0.0000000 0.0000000 0.9605262 0.0000000 0.0000000 0.0000000
##  [8] 0.8458788 0.0000000 0.6141082
## PEtype:
## [1] "coeff"
##
## Last 10 steps of beta's:
##            [,1] [,2] [,3]      [,4] [,5] [,6] [,7]      [,8] [,9]
##  [1,] 0.6461477    0    0 1.0846858    0    0    0 0.4806811    0
##  [2,] 0.3534608    0    0 1.0846858    0    0    0 0.3844643    0
##  [3,] 0.3534608    0    0 0.4931794    0    0    0 0.9696950    0
##  [4,] 0.3449473    0    0 0.4931794    0    0    0 0.9696950    0
##  [5,] 0.3449473    0    0 0.4931794    0    0    0 0.9696950    0
##  [6,] 0.3449473    0    0 0.4931794    0    0    0 0.9696950    0
##  [7,] 0.3449473    0    0 0.4931794    0    0    0 0.9696950    0
##  [8,] 0.3449473    0    0 0.4680849    0    0    0 0.9353218    0
##  [9,] 0.3449473    0    0 0.4680849    0    0    0 0.3042383    0
## [10,] 0.3449473    0    0 0.4680849    0    0    0 0.3042383    0
##           [,10]
##  [1,] 0.2336445
##  [2,] 0.2336445
##  [3,] 0.2336445
##  [4,] 0.2336445
##  [5,] 0.2336445
##  [6,] 0.2336445
##  [7,] 0.2336445
##  [8,] 0.2336445
##  [9,] 0.7305437
## [10,] 0.7305437
##
## last 10 steps of subgradients:
##           [,1]       [,2]      [,3]      [,4]        [,5]      [,6]
##  [1,] 1.000003 0.20751105 0.5348776 0.9999994 -0.13120222 0.8152837
##  [2,] 1.000003 0.19902287 0.5373069 0.9999994 -0.13266943 0.8230570
##  [3,] 1.000003 0.14644498 0.5523550 0.9999994 -0.14175768 0.8712067
##  [4,] 1.000003 0.14644498 0.5523550 0.9999994 -0.14175768 0.8712067
##  [5,] 1.000003 0.09852526 0.5660699 0.9999994 -0.15004075 0.9150905
##  [6,] 1.000003 0.03423921 0.5844690 0.9999994 -0.16115279 0.9739622
##  [7,] 1.000003 0.10255316 0.5649171 0.9999994 -0.14934451 0.9114018
##  [8,] 1.000003 0.10255316 0.5649171 0.9999994 -0.14934451 0.9114018
##  [9,] 1.000003 0.46924288 0.4599684 0.9999994 -0.08596107 0.5755955
## [10,] 1.000003 0.31439579 0.5042865 0.9999994 -0.11272687 0.7174010
##              [,7]     [,8]        [,9] [,10]
##  [1,] -0.02095016 1.000024 -0.11176210     1
##  [2,] -0.03408970 1.000024 -0.12631946     1
##  [3,] -0.11547926 1.000024 -0.21649131     1
##  [4,] -0.11547926 1.000024 -0.21649131     1
##  [5,] -0.18965806 1.000024 -0.29867431     1
##  [6,] -0.28917162 1.000024 -0.40892581     1
##  [7,] -0.18342296 1.000024 -0.29176642     1
##  [8,] -0.18342296 1.000024 -0.29176642     1
##  [9,]  0.38420566 1.000024  0.33711177     1
## [10,]  0.14450535 1.000024  0.07154677     1
##
## Acceptance rate:
## -----------------------------
## # Accepted    :   122     62
## # Moved       :   396     99
## Acceptance rate   :   0.308   0.626  

We provide summary and plot functions for MHLS results.

summary(Result)
## $beta ## mean median s.d 2.5% 97.5% ## [1,] 0.3987769 0.3449473 0.2007944 0.11304538 0.8853218 ## [2,] 0.0000000 0.0000000 0.0000000 0.00000000 0.0000000 ## [3,] 0.0000000 0.0000000 0.0000000 0.00000000 0.0000000 ## [4,] 0.5038473 0.4931794 0.3748674 0.04885877 1.0846858 ## [5,] 0.0000000 0.0000000 0.0000000 0.00000000 0.0000000 ## [6,] 0.0000000 0.0000000 0.0000000 0.00000000 0.0000000 ## [7,] 0.0000000 0.0000000 0.0000000 0.00000000 0.0000000 ## [8,] 0.5765659 0.6186559 0.3053770 0.04618138 1.1025196 ## [9,] 0.0000000 0.0000000 0.0000000 0.00000000 0.0000000 ## [10,] 0.3805857 0.3966923 0.2790838 0.03606252 0.8890657 ## ##$subgradient
##              mean     median        s.d        2.5%       97.5%
##  [1,]  1.00000314  1.0000031 0.00000000  1.00000314  1.00000314
##  [2,]  0.29514219  0.2500872 0.18012879  0.06101587  0.71133476
##  [3,]  0.50979702  0.5226920 0.05155390  0.39068031  0.57680532
##  [4,]  0.99999936  0.9999994 0.00000000  0.99999936  0.99999936
##  [5,] -0.11605491 -0.1238428 0.03113581 -0.15652436 -0.04411475
##  [6,]  0.73503303  0.7762934 0.16495796  0.35389307  0.94944074
##  [7,]  0.11470115  0.0449569 0.27883589 -0.24772187  0.75895921
##  [8,]  1.00002402  1.0000240 0.00000000  1.00002402  1.00002402
##  [9,]  0.03852656 -0.0387434 0.30892348 -0.36300346  0.75230284
## [10,]  1.00000000  1.0000000 0.00000000  1.00000000  1.00000000
##
## attr(,"class")
## [1] "summary.MHLS"
plot(Result, index=c(1,4,9))

## Hit <Return> to see the next plot:

## Hit <Return> to see the next plot:

## Hit <Return> to see the next plot:

## Post-selection Inference

postInference.MHLS is a method for post-selection inference. The inference is provided by MHLS results from multiple chains. In order to achieve the robust result, different PE values are used for each chain. After drawing samples of $$(\hat{\beta}, S)$$ with MH sampler, we then refit the estimator to remove the bias of the lasso estimator. The final output is the $$(1-a)$$ quantile of each active coefficient.

postInference.MHLS(X = X, Y = Y, lbd = lbd, sig2.hat = 1, alpha = .05,
method = "coeff", nChain = 5, niterPerChain = 20,
parallel = !(.Platform\$OS.type == "windows"))
##                 beta1       beta4      beta8     beta10
## LowerBound -0.2274856 -0.06173088 -0.8538049 -0.1946631
## UpperBound  2.5068937  2.09967452  0.9623414  2.4811133