Bump Hunting by Patient Rule Induction Method for Survival, Regression and Classification
===============

### Description
PRIMsrc performs a unified treatment of Bump Hunting by Patient Rule Induction Method (PRIM) in Survival, Regression and Classification settings (SRC). The method generates decision rules delineating a region in the predictor space where the response is larger than its average over the entire space. The region is shaped as a hyperdimensional box or hyperrectangle that is not necessarily contiguous. The multivariate input variables can be discrete or continuous, and the univariate response variable can be discrete (Classification), continuous (Regression) or a time-to-event, possibly censored (Survival). The package is intended to handle low- and high-dimensional multivariate datasets, including the paradigm where the number of covariates (p) exceeds or dominates that of samples (n): p > n or p >> n.
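To make the idea of a box-shaped decision rule concrete, here is a minimal base-R sketch for a continuous response. It does not use the PRIMsrc API: the synthetic data, the hand-picked box bounds and the helper `in_box()` are all hypothetical and serve only to illustrate a region whose mean response exceeds the overall mean.

```r
# Synthetic data: two continuous predictors and a continuous response
set.seed(1)
n <- 500
X <- data.frame(x1 = runif(n), x2 = runif(n))

# Hypothetical "bump": the response is elevated when x1 > 0.6 and x2 < 0.4
in_bump <- X$x1 > 0.6 & X$x2 < 0.4
y <- rnorm(n, mean = ifelse(in_bump, 3, 0), sd = 1)

# A box is one interval per predictor (bounds chosen by hand for illustration)
box <- list(x1 = c(0.6, 1.0), x2 = c(0.0, 0.4))

# Decision rule: TRUE when an observation satisfies every interval of the box
in_box <- function(X, box) {
  inside <- rep(TRUE, nrow(X))
  for (v in names(box)) {
    inside <- inside & X[[v]] >= box[[v]][1] & X[[v]] <= box[[v]][2]
  }
  inside
}

member <- in_box(X, box)
box_mean <- mean(y[member])   # mean response inside the box
overall_mean <- mean(y)       # mean response over the whole predictor space
support <- mean(member)       # fraction of observations covered by the box
```

In a real analysis the bounds are not chosen by hand; they are learned by the peeling search.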
The current version is a development release that only implements the case of a survival response. At this point, this version is also restricted to a directed peeling search of the first box covered by the recursive coverage (outer) loop of our Patient Recursive Survival Peeling (PRSP) algorithm (Dazard et al., 2014, 2015, 2016). New features will be added as soon as they are available.
The package relies on an optional variable screening (pre-selection) procedure that is run before the PRSP algorithm and the final variable usage (selection) procedure. Four cross-validated variable screening (pre-selection) procedures are available, and the user chooses among them through the main end-user survival Bump Hunting function `sbh()`.
In this version, the Cross-Validation (CV) and Bump Hunting procedures, which control model size (#covariates) and model complexity (#peeling steps), respectively, are carried out internally to fit the Survival Bump Hunting model. They run as two consecutive tasks within a single main end-user survival Bump Hunting function `sbh()`. The returned S3-class `sbh` object contains cross-validated estimates of all the decision rules of used covariates and all other statistical quantities of interest at each iteration of the peeling sequence (inner loop of the PRSP algorithm). This enables the graphical display of profiling curves for model selection/tuning, peeling trajectories, covariate traces and survival distributions (see companion papers Dazard et al., 2014, 2015, 2016 for details).
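The notion of a peeling sequence can be sketched in a few lines of base R. The example below is a generic PRIM-style peeling loop for a continuous response, not the PRSP algorithm and not the PRIMsrc API: the helpers `peel_once()` and `peel_sequence()`, the peeling fraction `alpha` and the minimal support `beta_min` are hypothetical names introduced for illustration, and the peeling criterion here is simply the mean response rather than a survival statistic.

```r
set.seed(2)
n <- 400
X <- data.frame(x1 = runif(n), x2 = runif(n))
y <- rnorm(n, mean = ifelse(X$x1 > 0.7 & X$x2 > 0.7, 2, 0))

# One peeling step: try removing a fraction alpha of the in-box observations
# from the lower or upper end of each variable, and keep the peel that
# leaves the highest mean response in the remaining box.
peel_once <- function(X, y, inside, alpha = 0.1) {
  best <- list(mean = -Inf, inside = inside)
  for (v in names(X)) {
    xv <- X[[v]][inside]
    lo <- quantile(xv, alpha)
    hi <- quantile(xv, 1 - alpha)
    candidates <- list(inside & X[[v]] >= lo, inside & X[[v]] <= hi)
    for (cand in candidates) {
      shrinks <- sum(cand) > 0 && sum(cand) < sum(inside)
      if (shrinks && mean(y[cand]) > best$mean) {
        best <- list(mean = mean(y[cand]), inside = cand)
      }
    }
  }
  best
}

# Peeling sequence: repeat until the box support would drop below beta_min
peel_sequence <- function(X, y, alpha = 0.1, beta_min = 0.05) {
  inside <- rep(TRUE, nrow(X))
  trajectory <- data.frame(step = 0, support = 1, mean = mean(y))
  step <- 0
  repeat {
    nxt <- peel_once(X, y, inside, alpha)
    if (!is.finite(nxt$mean) || mean(nxt$inside) < beta_min) break
    inside <- nxt$inside
    step <- step + 1
    trajectory <- rbind(
      trajectory,
      data.frame(step = step, support = mean(inside), mean = nxt$mean)
    )
  }
  trajectory
}

# One row per peeling step: box support and in-box mean response
traj <- peel_sequence(X, y)
```

Plotting `traj$mean` against `traj$support` gives a simple peeling trajectory, in the spirit of the profiling and trajectory displays mentioned above.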
`PRIMsrc` offers a number of options: the number of replications of the fitting procedure to be performed (B); the type of K-fold cross-validation desired, (replicated) averaged or combined; the peeling and cross-validation criteria for model selection/tuning; and a few more parameters for the PRSP algorithm. The package takes advantage of the R package `snow`, which allows users to create a parallel backend within an R session, enabling access to a cluster of compute cores and/or nodes on local and/or remote machines. The package supports two types of communication mechanisms between master and worker processes: ‘Socket’ or ‘Message-Passing Interface’ (‘MPI’).
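The difference between averaged and combined replicated K-fold cross-validation can be illustrated with a toy base-R sketch. This is an assumption-laden illustration, not the package's implementation: the one-variable rule (`x` above the training median), the helpers `make_folds()` and `cv_replicate()`, and the use of the in-box mean response as the statistic are all hypothetical. In the averaged variant the statistic is computed in each test fold and then averaged; in the combined variant the test-fold box memberships are pooled first and the statistic is computed once.

```r
set.seed(3)
n <- 200
x <- runif(n)
y <- rnorm(n, mean = ifelse(x > 0.5, 1, 0))

# Random assignment of n observations to K folds of near-equal size
make_folds <- function(n, K) sample(rep(seq_len(K), length.out = n))

# One K-fold replicate: learn a one-variable rule on the training folds,
# then predict box membership on the held-out fold
cv_replicate <- function(x, y, K) {
  folds <- make_folds(length(y), K)
  in_box <- logical(length(y))
  fold_means <- numeric(K)
  for (k in seq_len(K)) {
    test <- folds == k
    cutoff <- median(x[!test])         # rule learned on training data only
    in_box[test] <- x[test] > cutoff   # membership predicted on test data
    fold_means[k] <- mean(y[test & in_box])
  }
  c(averaged = mean(fold_means, na.rm = TRUE),  # average of per-fold estimates
    combined = mean(y[in_box]))                 # single estimate, pooled memberships
}

B <- 10  # number of replications
K <- 5   # number of folds
estimates <- t(replicate(B, cv_replicate(x, y, K)))  # B rows, one per replicate
cv_summary <- colMeans(estimates)                    # average over replications
```

Each replicate is independent of the others, which is what makes the B replications a natural unit of work to distribute over a cluster.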

============

### Branches
This branch (master) is the default one. It hosts the current development release (version 0.7.5) of the survival bump hunting procedure, which implements the case of a survival response. Note that `PRIMsrc` is still a non-production release and that version 0.7.5 implements significant user-visible changes. Check details of new features, changes, and bug fixes in the “Usage” section below.
The second branch (unified) will host the future complete version of the code (version 1.0.0), including undirected peeling search derived from the Patient Rule Induction Method (PRIM), and unified treatment of bump hunting for every type of common response: Survival, Regression and Classification (SRC).

===========

### License
PRIMsrc is open source / free software, licensed under the GNU General Public License version 3 (GPLv3), published by the Free Software Foundation. To view a copy of this license, visit the GNU General Public License page.

=============

### Downloads
CRAN downloads, as tracked by the RStudio CRAN mirror, are counted since the initial release to CRAN (2015-07-28), over the last month, and over the last week.

================

### Requirements
PRIMsrc (>= 0.7.5) requires R version 3.0.2 (2013-09-25) or later. It was built and tested under R version 3.4.2 (2017-09-28) and on Travis CI.
Installation has been tested on Windows, Linux, OSX and Solaris platforms.
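As a quick sanity check before installing, the R version requirement can be tested from within an R session. This is a small sketch using base R only; the variable names `min_r` and `r_ok` are illustrative.

```r
# Minimum R version stated in the Requirements section
min_r <- "3.0.2"

# getRversion() returns the running R version as a comparable object
r_ok <- getRversion() >= min_r

if (!r_ok) {
  stop("R ", min_r, " or later is required; found ", as.character(getRversion()))
}
```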
Build results are reported on Travis CI, and check results on the CRAN checks page.

================

### Installation
To install `PRIMsrc` from the CRAN repository, download and install the current version (0.7.5) with `install.packages("PRIMsrc")`.

To install `PRIMsrc` from the GitHub repository, simply run the following using devtools:
```r
install.packages("devtools")
library("devtools")
devtools::install_github("jedazard/PRIMsrc")
```
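After either installation route, a short check can confirm that the package is visible to R. This sketch uses only base R and `utils`; the variable name `has_primsrc` is illustrative.

```r
# requireNamespace() returns FALSE, rather than raising an error,
# when the package is not installed
has_primsrc <- requireNamespace("PRIMsrc", quietly = TRUE)

if (has_primsrc) {
  message("PRIMsrc version: ", as.character(utils::packageVersion("PRIMsrc")))
} else {
  message("PRIMsrc is not installed yet")
}
```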

=========

### Usage

==================

### Website - Wiki

===================

### Acknowledgments
Authors:

+ Jean-Eudes Dazard, Ph.D. (firstname.lastname@example.org)
+ Michael Choe, M.D. (email@example.com)
+ Michael LeBlanc, Ph.D. (firstname.lastname@example.org)
+ Alberto Santana, MBA. (email@example.com)

Maintainers:

+ Jean-Eudes Dazard, Ph.D. (firstname.lastname@example.org)

Funding and resources:

+ This work made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University.
+ This project was partially funded by the National Institutes of Health (NIH), National Cancer Institute (R01-CA160593), to J-E. Dazard and J.S. Rao.

==============

### References
Dazard J-E. and Rao J.S. Variable Selection Strategies for High-Dimensional Survival Bump Hunting using Recursive Peeling Methods. [submitted (2017)].
Diaz D.A., Dazard J-E. and Rao J.S. Unsupervised Bump Hunting Using Principal Components. In: Ahmed SE, editor. Big and Complex Data Analysis: Methodologies and Applications. Contributions to Statistics, vol. Edited Refereed Volume. Springer International Publishing, Cham Switzerland (2017), 325-345.
Yi C. and Huang J. Semismooth Newton Coordinate Descent Algorithm for Elastic-Net Penalized Huber Loss Regression and Quantile Regression. J. Comp Graph. Statistics (2016), DOI: 10.1080/10618600.2016.1256816.
Dazard J-E., Choe M., LeBlanc M. and Rao J.S. Cross-validation and Peeling Strategies for Survival Bump Hunting using Recursive Peeling Methods. Statistical Analysis and Data Mining (2016), 9(1):12-42. (The American Statistical Association Data Science Journal)
Dazard J-E., Choe M., LeBlanc M. and Rao J.S. R package PRIMsrc: Bump Hunting by Patient Rule Induction Method for Survival, Regression and Classification. In JSM Proceedings, Statistical Programmers and Analysts Section. Seattle, WA, USA. American Statistical Association IMS - JSM, p. 650-664. JSM (2015).
Dazard J-E., Choe M., LeBlanc M. and Rao J.S. Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods. In JSM Proceedings, Survival Methods for Risk Estimation/Prediction Section. Boston, MA, USA. American Statistical Association IMS - JSM, p. 3366-3380. JSM (2014).
Dazard J-E. and Rao J.S. Local Sparse Bump Hunting. J. Comp Graph. Statistics (2010), 19(4):900-92.