mlr: Machine Learning in R

Bernd Bischl, Michel Lang, Jakob Richter, Jakob Bossek, Leonard Judt, Tobias Kuehn, Erich Studerus, Lars Kotthoff

2018-03-07


This vignette gives a brief introduction to the key features of mlr. A more detailed and continuously updated tutorial can be found on the project's GitHub page.

Purpose

The main goal of mlr is to provide a unified interface for machine learning tasks such as classification, regression, cluster analysis and survival analysis in R. Without a common interface, carrying out standard procedures such as cross-validation and hyperparameter tuning for different learners becomes a hassle, since every modeling package has its own conventions. mlr addresses this with a consistent, object-oriented interface to a large number of learners.
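The value of this unified interface can be sketched in a few lines: the same resampling and tuning calls work unchanged for any learner, and only the learner object differs. The sketch below uses standard mlr functions (`resample`, `tuneParams`, `makeParamSet`); the `cp` values chosen for the grid are arbitrary illustration, not a recommendation.

```r
library(mlr)

## The same task and resampling description work for every learner:
task = makeClassifTask(data = iris, target = "Species")
rdesc = makeResampleDesc("CV", iters = 5)
resample(makeLearner("classif.lda"), task, rdesc)
resample(makeLearner("classif.rpart"), task, rdesc)

## Hyperparameter tuning reuses the same building blocks; here a small,
## arbitrary grid over rpart's complexity parameter cp:
ps = makeParamSet(makeDiscreteParam("cp", values = c(0.01, 0.05, 0.1)))
ctrl = makeTuneControlGrid()
tuneParams("classif.rpart", task = task, resampling = rdesc,
           par.set = ps, control = ctrl)
```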

Quick Start

To highlight the main principles of mlr, we give a quick introduction to the package. We demonstrate how to perform a simple classification analysis using stratified cross-validation, illustrating some of the major building blocks of the mlr workflow, namely tasks and learners.

library(mlr)
## Loading required package: ParamHelpers
data(iris)

## Define the task:
task = makeClassifTask(id = "tutorial", data = iris, target = "Species")
print(task)
## Supervised task: tutorial
## Type: classif
## Target: Species
## Observations: 150
## Features:
##    numerics     factors     ordered functionals 
##           4           0           0           0 
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Has coordinates: FALSE
## Classes: 3
##     setosa versicolor  virginica 
##         50         50         50 
## Positive class: NA
## Define the learner:
lrn = makeLearner("classif.lda")
print(lrn)
## Learner classif.lda from package MASS
## Type: classif
## Name: Linear Discriminant Analysis; Short name: lda
## Class: classif.lda
## Properties: twoclass,multiclass,numerics,factors,prob
## Predict-Type: response
## Hyperparameters:
## Define the resampling strategy:
rdesc = makeResampleDesc(method = "CV", stratify = TRUE)

## Do the resampling:
r = resample(learner = lrn, task = task, resampling = rdesc)
## Resampling: cross-validation
## Measures:             mmce
## [Resample] iter 1:    0.0000000
## [Resample] iter 2:    0.0000000
## [Resample] iter 3:    0.0666667
## [Resample] iter 4:    0.0666667
## [Resample] iter 5:    0.0000000
## [Resample] iter 6:    0.0000000
## [Resample] iter 7:    0.0666667
## [Resample] iter 8:    0.0000000
## [Resample] iter 9:    0.0000000
## [Resample] iter 10:   0.0000000
## 
## Aggregated Result: mmce.test.mean=0.0200000
## 
print(r)
## Resample Result
## Task: tutorial
## Learner: classif.lda
## Aggr perf: mmce.test.mean=0.0200000
## Runtime: 0.0647471
## Get the mean misclassification error:
r$aggr
## mmce.test.mean 
##           0.02
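Beyond resampling, the same task and learner objects can be used for a plain train/predict workflow. The following is a minimal sketch using mlr's `train`, `predict` and `performance` functions; for brevity it predicts on the training data, whereas in practice you would predict on held-out observations.

```r
## Fit the learner on the full task:
model = train(lrn, task)

## Predict on the task (here: the training data itself, for illustration only):
pred = predict(model, task = task)

## Evaluate the mean misclassification error of these predictions:
performance(pred, measures = mmce)
```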

Detailed Tutorial

The previous example demonstrated only a tiny fraction of mlr's capabilities. More features are covered in the tutorial, which can be found online on the mlr project page. Among other topics, it covers benchmarking, preprocessing, imputation, feature selection, ROC analysis, how to implement your own learner, and the full list of supported learners. Reading it is highly recommended!
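As a taste of one of those topics, benchmarking several learners against each other follows the same pattern as the quick start above. The sketch below reuses the `task` and `rdesc` objects defined there and assumes the learners' underlying packages (MASS, rpart) are installed.

```r
## Compare two learners on the same task with the same resampling:
lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
bmr = benchmark(learners = lrns, tasks = task, resamplings = rdesc)

## Inspect the aggregated performance values per learner:
getBMRAggrPerformances(bmr)
```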

Thanks

We would like to thank the authors of all packages that mlr uses under the hood: