Visualising FFTrees

You can visualize an FFTrees object x in one of two ways: First, you can visualize cue accuracies with plot(x, what = 'cues'). Second, you can visualize individual trees and performance statistics with plot(x).

Example: titanic

The titanic dataset contains survival statistics of passengers on the Titanic. For each passenger, we know their passenger class, age, and sex, and whether or not they survived.

Here is how the first few rows of the data frame look:

head(titanic)
##   class   age  sex survived
## 1 first adult male        1
## 2 first adult male        1
## 3 first adult male        1
## 4 first adult male        1
## 5 first adult male        1
## 6 first adult male        1

Our goal is to build FFTrees that predict whether or not a passenger will survive based on these cues.

First, let’s create an FFTrees object called titanic.fft from the titanic dataset.

# Load the FFTrees package (provides FFTrees() and the titanic data)
library(FFTrees)

# Create fast-and-frugal trees from the titanic data
titanic.fft <- FFTrees(formula = survived ~ .,
                       data = titanic)

Visualising cue accuracies

You can visualize individual cue accuracies (specifically their sensitivities and specificities) by passing the what = 'cues' argument to the plot() function. Let’s apply the function to the titanic.fft object to see how accurate each cue was on its own in predicting survival:

plot(titanic.fft,
     main = "Titanic cue accuracy",
     what = 'cues')

Wow. None of the cues did very well on their own. Well-performing cues should be in the top-left corner of the graph (i.e., a low false alarm rate and a high hit rate). It looks like the best cue was sex, followed by class. age was a pretty terrible cue.
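To make the two axes of the cue plot concrete, here is a minimal sketch of how a hit rate (sensitivity) and a specificity can be computed by hand for a single cue. It is an illustration under stated assumptions, not the calculation FFTrees performs internally: it uses base R's built-in Titanic table rather than the package's titanic data frame, and the rule "predict survival if the passenger is female" is a hypothetical threshold chosen only for this example.

```r
# Base R's built-in Titanic contingency table (datasets package),
# converted to a data frame with columns Class, Sex, Age, Survived, Freq.
# ASSUMPTION: used as a stand-in for FFTrees' titanic data frame.
tab <- as.data.frame(Titanic)

# Hypothetical single-cue rule: predict "survived" if the passenger is female
predicted <- tab$Sex == "Female"
actual    <- tab$Survived == "Yes"

# Tally the four cells of the confusion table, weighting by passenger counts
hits               <- sum(tab$Freq[predicted & actual])
misses             <- sum(tab$Freq[!predicted & actual])
false_alarms       <- sum(tab$Freq[predicted & !actual])
correct_rejections <- sum(tab$Freq[!predicted & !actual])

# Sensitivity (hit rate) and specificity (1 - false alarm rate)
sensitivity <- hits / (hits + misses)
specificity <- correct_rejections / (correct_rejections + false_alarms)

round(c(sensitivity = sensitivity, specificity = specificity), 2)
```

A cue in the top-left corner of the plot would have both values close to 1.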

Plotting a tree

To plot the tree from an FFTrees object, use plot(). You can add some stylistic arguments like main and decision.labels. Let’s plot one of the trees:

plot(titanic.fft, 
     main = "Titanic", 
     decision.labels = c("Died", "Survived"))

This plot contains a lot of information. Its main elements are the tree itself and its performance statistics.

Additional arguments

You can pass additional arguments to plot() to change what is displayed. For instance, stats = FALSE hides the performance statistics:

# Show the best training titanic fast-and-frugal tree without statistics
plot(titanic.fft,
     decision.labels = c("Died", "Survived"),
     stats = FALSE)

For example, let’s repeat the previous analysis, but this time create separate training and test datasets by including the train.p = .5 argument. This splits the data into a 50% training set and a 50% test set (note: you could also supply an explicit test dataset with the data.test argument).

set.seed(100) # For replicability of the training/test split
titanic.pred.fft <- FFTrees(formula = survived ~.,
                            data = titanic,
                            train.p = .5)
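For readers who prefer an explicit split, the following base-R sketch shows one way to create two halves by hand. It is only an illustration and makes several assumptions: it builds a stand-in data frame from base R's Titanic table instead of using the package's titanic data, the object names passengers.train and passengers.test are hypothetical, and the random rows it selects will not necessarily match those chosen by train.p.

```r
# Expand base R's Titanic table into one row per passenger.
# ASSUMPTION: a stand-in for FFTrees' titanic data frame, not the same object.
tab <- as.data.frame(Titanic)
passengers <- tab[rep(seq_len(nrow(tab)), tab$Freq),
                  c("Class", "Sex", "Age", "Survived")]
rownames(passengers) <- NULL

set.seed(100)  # for a reproducible split
n <- nrow(passengers)

# Randomly pick half of the row indices for training
train_rows <- sample(n, size = floor(0.5 * n))

# Hypothetical names for the two halves
passengers.train <- passengers[train_rows, ]
passengers.test  <- passengers[-train_rows, ]

c(train = nrow(passengers.train), test = nrow(passengers.test))
```

Data frames split this way could then be supplied through the data and data.test arguments mentioned above.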

Here is the best training tree applied to the training data:

plot(titanic.pred.fft,
     tree = "best.train", 
     main = "Titanic", 
     decision.labels = c("Died", "Survived"))

The best training tree (tree #3) had a high specificity of 92%, but a low hit rate of just 48%. However, as we can see in the ROC plot, logistic regression (LR) didn’t perform much better, and CART did even worse than tree #3.

Now let’s apply the same tree to the test dataset:

plot(titanic.pred.fft,
     tree = "best.train",
     data = "test", 
     main = "Titanic", 
     decision.labels = c("Died", "Survived"))

Performance has actually increased on the test data (e.g., the hit rate is up to 54%). However, both logistic regression and CART performed similarly.

Let’s visualise tree #4, the most liberal tree with the highest hit rate:

plot(titanic.pred.fft,
     tree = 4,
     data = "test", 
     main = "Titanic", 
     decision.labels = c("Died", "Survived"))

Tree #4 increased the test hit rate to 65%, but at the cost of a lower specificity of 70%.
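The trade-off between hit rate and specificity is general: a more liberal rule predicts survival more often, which tends to catch more true survivors while raising more false alarms. The sketch below illustrates this with two hypothetical hand-written rules applied to base R's Titanic table. These rules are assumptions made for illustration; they are not the trees stored in titanic.pred.fft, and the numbers they produce are not the ones reported above.

```r
# ASSUMPTION: base R's Titanic table stands in for FFTrees' titanic data
tab    <- as.data.frame(Titanic)
actual <- tab$Survived == "Yes"

# Sensitivity (hit rate) and specificity of a logical prediction vector,
# weighting each row of the table by its passenger count
rule_stats <- function(predicted) {
  c(sensitivity = sum(tab$Freq[predicted & actual]) / sum(tab$Freq[actual]),
    specificity = sum(tab$Freq[!predicted & !actual]) / sum(tab$Freq[!actual]))
}

# Hypothetical stricter rule: predict survival only for female passengers
strict <- rule_stats(tab$Sex == "Female")

# Hypothetical more liberal rule: also predict survival for first class
liberal <- rule_stats(tab$Sex == "Female" | tab$Class == "1st")

round(rbind(strict, liberal), 2)
```

Comparing the two rows shows the liberal rule gaining sensitivity and losing specificity relative to the strict one.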