BeSS: An R Package for Best Subset Selection and Best Subset Ridge Regression

Introduction

Advances in modern technology, including computing power and storage, are producing more and more high-dimensional data, in which the number of features can be much larger than the number of observations (Hastie et al. 2009). Examples include gene, microarray, and proteomics data, high-resolution images, high-frequency financial data, e-commerce data, warehouse data, resonance imaging, and signal processing, among many others (Fan et al. 2011).

A model with many predictors is hard to interpret: the relationship between the response and the variables becomes difficult to explain. Reducing the number of variables by subjective means, in turn, can be biased by the analyst's interests and hypotheses. Regression methods face at least three challenges in the high-dimensional setting:

Best subset selection is up to these challenges and offers the following advantages:

Building on best subset selection, best subset ridge regression adds shrinkage on the coefficients, which gives a more refined trade-off between model parsimony and prediction.
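To make the relationship between the two methods concrete, here is a sketch of how they are commonly written as optimization problems. The notation is an assumption for illustration and does not come from this README: $\ell(\beta)$ is a loss such as the negative log-likelihood, $s$ is the maximum subset size, and $\lambda \ge 0$ is the ridge penalty.

```latex
% Best subset selection: minimize the loss using at most s nonzero coefficients
\min_{\beta \in \mathbb{R}^p} \; \ell(\beta)
\quad \text{subject to} \quad \|\beta\|_0 \le s

% Best subset ridge regression: same sparsity constraint, plus an L2 shrinkage term
\min_{\beta \in \mathbb{R}^p} \; \ell(\beta) + \lambda \|\beta\|_2^2
\quad \text{subject to} \quad \|\beta\|_0 \le s
```

In this formulation, setting $\lambda = 0$ recovers plain best subset selection. A larger $\lambda$ shrinks the selected coefficients toward zero.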

Software

R package

To download and install BeSS from CRAN:

install.packages("BeSS")

Or try the development version on GitHub:

# install.packages("devtools")
devtools::install_github("Mamba413/bess/R")

The following table compares BeSS with other R packages for best subset selection on several criteria:

| | leaps | lmSubsets | bestglm | glmulti | BeSS |
| :--- | :---: | :---: | :---: | :---: | :---: |
| Solve linear regression models | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Solve logistic regression models | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Solve Poisson regression models | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Solve CoxPH regression models | :x: | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: |
| Group variable selection | :x: | :x: | :x: | :x: | :heavy_check_mark: |
| Feature screening | :x: | :x: | :x: | :x: | :heavy_check_mark: |
| Tuning parameter determination by information criterion | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Tuning parameter determination by cross-validation | :x: | :x: | :heavy_check_mark: | :x: | :heavy_check_mark: |
| Include specified variables | :x: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
| Options for coefficient shrinkage | :x: | :x: | :x: | :x: | :heavy_check_mark: |
| Computational efficiency | :walking::walking: | :walking::running: | :walking::walking: (infeasible for GLMs with more than 15 variables) | :walking::running: (infeasible for GLMs with more than 32 variables) | :running::running: |

See the following documents for more details about the BeSS package:

References