Model Evaluation Audit for Classification Problem

In this vignette we present plots for evaluating classification models.

Dataset

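We use the titanic dataset from the DALEX package. Rows with missing values are dropped, and the survived column is recoded as a numeric 0/1 variable.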

titanic <- na.omit(DALEX::titanic)
titanic$survived <- as.numeric(titanic$survived) - 1
head(titanic)
##   gender age class    embarked       country  fare sibsp parch survived
## 1   male  42   3rd Southampton United States  7.11     0     0        0
## 2   male  13   3rd Southampton United States 20.05     0     2        0
## 3   male  16   3rd Southampton United States 20.05     1     1        0
## 4 female  39   3rd Southampton       England 20.05     1     1        1
## 5 female  16   3rd Southampton        Norway  7.13     0     0        1
## 6   male  25   3rd Southampton United States  7.13     0     0        1
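The recoding step relies on how R stores factors. The sketch below uses a toy factor to show why subtracting 1 yields a 0/1 variable; the level names "no" and "yes" are an assumption made for illustration, not read from the dataset.

```r
# Toy factor mimicking a two-level outcome; the level names are assumed.
survived <- factor(c("no", "yes", "yes", "no"), levels = c("no", "yes"))

# as.numeric() on a factor returns the internal level codes (1, 2),
# not the labels themselves.
codes <- as.numeric(survived)

# Subtracting 1 shifts the codes to 0/1.
binary <- codes - 1
print(binary)
```

The result depends on the order of the factor levels, so it is worth checking levels() before applying this idiom to other data.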

Models

We fit two models: a logistic regression (glm) and a support vector machine (svm from the e1071 package).

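# Logistic regression on all remaining columns.
model_glm <- glm(survived ~ ., data = titanic, family = binomial)

library(e1071)
# survived is numeric here, so svm() fits a regression model by default.
model_svm <- svm(survived ~ ., data = titanic)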

Model Audit

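The first step is to create an explainer object with the DALEX package. An explainer wraps a model together with its metadata (data, target values, predict function) so that the model can be audited. Note that the glm explainer below is created without a label and therefore receives the default label lm, and that both calls pass data that still contains the survived column, which triggers the warning shown in the log.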

exp_glm <- DALEX::explain(model_glm, data = titanic, y = titanic$survived)
## Preparation of a new explainer is initiated
##   -> model label       :  lm  (default)
##   -> data              :  2099  rows  9  cols 
##   -> target variable   :  2099  values 
##   -> data              :  A column identical to the target variable `y` has been found in the `data`.  (WARNING)
##   -> data              :  It is highly recommended to pass `data` without the target variable column
##   -> predict function  :  yhat.glm  will be used (default)
##   -> predicted values  :  numerical, min =  9.814966e-09 , mean =  0.3244402 , max =  1  
##   -> residual function :  difference between y and yhat (default)
##   -> residuals         :  numerical, min =  -0.9614217 , mean =  -1.68201e-09 , max =  0.9666502  
## A new explainer has been created!
exp_svm <- DALEX::explain(model_svm, data = titanic, y = titanic$survived, label = "svm")
## Preparation of a new explainer is initiated
##   -> model label       :  svm 
##   -> data              :  2099  rows  9  cols 
##   -> target variable   :  2099  values 
##   -> data              :  A column identical to the target variable `y` has been found in the `data`.  (WARNING)
##   -> data              :  It is highly recommended to pass `data` without the target variable column
##   -> predict function  :  yhat.svm  will be used (default)
##   -> predicted values  :  numerical, min =  -0.05516344 , mean =  0.2523206 , max =  1.059265  
##   -> residual function :  difference between y and yhat (default)
##   -> residuals         :  numerical, min =  -1.035725 , mean =  0.0721196 , max =  1.015941  
## A new explainer has been created!
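The warning in the log above recommends passing `data` without the target column. A minimal sketch of one way to do that follows; the names `predictors` and `exp_glm_clean` are introduced here for illustration only, and the sketch assumes the DALEX package is installed.

```r
library(DALEX)

titanic <- na.omit(DALEX::titanic)
titanic$survived <- as.numeric(titanic$survived) - 1
model_glm <- glm(survived ~ ., data = titanic, family = binomial)

# Drop the target column so that `data` holds predictors only.
predictors <- titanic[, colnames(titanic) != "survived"]

exp_glm_clean <- DALEX::explain(
  model_glm,
  data = predictors,
  y = titanic$survived,
  label = "glm",    # explicit label instead of the default
  verbose = FALSE   # suppress the preparation log
)
```

Setting the label explicitly also keeps the two models easy to tell apart in plot legends.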

The second step is to create an auditor_model_evaluation object, which can then be used to validate a model.

library(auditor)
eva_glm <- model_evaluation(exp_glm)
eva_svm <- model_evaluation(exp_svm)

Receiver Operating Characteristic (ROC)

An auditor_model_evaluation object can be passed to plot() to draw evaluation charts. Passing several objects draws the models on the same chart for comparison.

plot(eva_glm, eva_svm, type = "roc")

[Figure: ROC curves for eva_glm and eva_svm]

# or
# plot_roc(eva_glm, eva_svm)
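To make the ROC chart easier to read, here is a minimal base R sketch of how the points of an ROC curve and the area under it can be computed. It uses toy labels and scores rather than the titanic models, assumes there are no tied scores, and is not meant to reproduce auditor's internal computation.

```r
# Toy labels (1 = positive) and model scores; the values are illustrative only.
y     <- c(1, 1, 0, 1, 0, 0)
score <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)

# Sort by decreasing score: each prefix of the sorted labels is the set
# of observations predicted positive at some cutoff.
ord <- order(score, decreasing = TRUE)
y_sorted <- y[ord]

# True and false positive rates after admitting each successive observation.
tpr <- c(0, cumsum(y_sorted) / sum(y_sorted))
fpr <- c(0, cumsum(1 - y_sorted) / sum(1 - y_sorted))

# Area under the curve by the trapezoidal rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

print(data.frame(fpr = fpr, tpr = tpr))
print(auc)
```

A curve closer to the top-left corner, and hence a larger area, indicates that the model ranks positive cases above negative ones more often.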

LIFT Chart

plot(eva_glm, eva_svm, type = "lift")

[Figure: LIFT chart for eva_glm and eva_svm]

# or
# plot_lift(eva_glm, eva_svm)
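The quantities behind a lift-style chart can be sketched in base R as follows. This follows one common definition (rate of positive prediction, cumulative true positives, and lift relative to the base rate) on toy data; the exact axes and scaling used by auditor may differ.

```r
# Toy labels (1 = positive) and model scores; the values are illustrative only.
y     <- c(1, 1, 0, 1, 0, 0)
score <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)

# Rank observations from the highest to the lowest score.
ord <- order(score, decreasing = TRUE)
y_sorted <- y[ord]
n <- length(y_sorted)

# Rate of positive prediction: share of observations flagged positive so far.
rpp <- seq_len(n) / n

# True positives captured among the flagged observations.
tp <- cumsum(y_sorted)

# Lift: precision among the flagged observations relative to the base rate.
lift <- (tp / seq_len(n)) / mean(y)

print(data.frame(rpp = rpp, tp = tp, lift = lift))
```

A model that ranks well captures most positives at a low rate of positive prediction, so its lift starts well above 1 and falls towards 1 as more observations are flagged.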

Other methods

Other methods and plots are described in vignettes: