Introduction to glmnetUtils

The glmnetUtils package provides a collection of tools to streamline the process of fitting elastic net models with glmnet. I wrote the package after a couple of projects where I found myself writing the same boilerplate code to convert a data frame into a predictor matrix and a response vector. In addition to a formula interface, it provides a function, cva.glmnet, that does cross-validation for both $$\alpha$$ and $$\lambda$$, as well as some utility functions.

The formula interface

The interface that glmnetUtils provides is much the same as that of most modelling functions in R: to fit a model, you provide a formula and a data frame. You can also pass any argument that glmnet accepts. Here are some simple examples for different types of data:

# least squares regression
(mtcarsMod <- glmnet(mpg ~ cyl + disp + hp, data=mtcars))
## Call:
## glmnet.formula(formula = mpg ~ cyl + disp + hp, data = mtcars)
##
## Model fitting options:
##     Sparse model matrix: FALSE
##     Use model.frame: FALSE
##     Alpha: 1
##     Lambda summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.03326 0.11690 0.41000 1.02800 1.44100 5.05500

# multinomial logistic regression with specified elastic net alpha parameter
(irisMod <- glmnet(Species ~ ., data=iris, family="multinomial", alpha=0.5))
## Call:
## glmnet.formula(formula = Species ~ ., data = iris, alpha = 0.5,
##     family = "multinomial")
##
## Model fitting options:
##     Sparse model matrix: FALSE
##     Use model.frame: FALSE
##     Alpha: 0.5
##     Lambda summary:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0000870 0.0008707 0.0087090 0.0979200 0.0870700 0.8700000

# Poisson regression with an offset
(InsMod <- glmnet(Claims ~ District + Group + Age, data=MASS::Insurance,
                  family="poisson", offset=log(Holders)))
## Call:
## glmnet.formula(formula = Claims ~ District + Group + Age, data = MASS::Insurance,
##     family = "poisson", offset = log(Holders))
##
## Model fitting options:
##     Sparse model matrix: FALSE
##     Use model.frame: FALSE
##     Alpha: 1
##     Lambda summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.02877 0.11610 0.46880 1.40500 1.89300 7.64100

Under the hood, glmnetUtils creates a model matrix and response vector, and passes them to the glmnet package to do the actual model fitting. A simple print method is also provided, to show the main model details at a glance. I’ll describe shortly what the “sparse model matrix” and “use model.frame” options do.
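To make that conversion step concrete, here is a rough base-R sketch of turning a formula and data frame into the predictor matrix and response vector that glmnet's matrix interface expects. It uses model.frame and model.matrix purely for illustration. This is an assumption about the general idea, not a copy of the package's internals, which may build the matrix differently (for example in how factors are encoded). The names f, mf, x, y and fit are made up for the example.

```r
# Illustrative sketch only: not the package's actual internals.
f <- mpg ~ cyl + disp + hp

# Evaluate the formula against the data frame
mf <- model.frame(f, data = mtcars)

# Predictor matrix; drop the "(Intercept)" column because glmnet
# fits its own intercept by default
x <- model.matrix(f, data = mf)[, -1, drop = FALSE]

# Response vector
y <- model.response(mf)

# Hand both to glmnet's matrix interface, if the package is installed
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x, y)
  print(fit$call)
}
```

This is the kind of boilerplate that the formula interface saves you from writing for every model.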

Predicting from a model works as you’d expect: just pass a data frame containing the new observations to the predict method. You can also specify any arguments that predict.glmnet accepts.

# least squares regression: get predictions for lambda=1
predict(mtcarsMod, newdata=mtcars, s=1)

# multinomial logistic regression: get predicted class
predict(irisMod, newdata=iris, type="class")

# Poisson regression: need to specify offset
predict(InsMod, newdata=MASS::Insurance, offset=log(Holders))

If you want, you can still use the original model matrix-plus-response syntax:

mtcarsX <- as.matrix(mtcars[c("cyl", "disp", "hp")])
mtcarsMod2 <- glmnet(mtcarsX, mtcars$mpg)
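As a quick sanity check, the self-contained sketch below fits the same least squares model through both interfaces and compares the predictions. It assumes glmnetUtils (and therefore glmnet) is installed; the object names fitFormula, fitMatrix and diffs are made up for the example. If both interfaces end up with the same predictor matrix, the differences should be zero or negligibly small.

```r
library(glmnetUtils)  # provides the formula method and passes matrices on to glmnet

# Predictor matrix and response vector for the matrix interface
mtcarsX <- as.matrix(mtcars[c("cyl", "disp", "hp")])
mtcarsY <- mtcars$mpg

# Same model, fitted two ways
fitFormula <- glmnet(mpg ~ cyl + disp + hp, data = mtcars)
fitMatrix  <- glmnet(mtcarsX, mtcarsY)

# Predictions over the whole lambda path: the formula fit takes a data frame
# via newdata, the matrix fit takes a matrix via newx
diffs <- predict(fitFormula, newdata = mtcars) - predict(fitMatrix, newx = mtcarsX)
summary(as.numeric(diffs))
```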

Conclusion

The glmnetUtils package is a way to improve quality of life for users of glmnet. As with many R packages, it’s always under development; you can get the latest version from my GitHub repo. If you find a bug, or if you want to suggest improvements to the package, please feel free to contact me at hongooi@microsoft.com.