Visualising FFTrees

You can visualize FFTrees objects in one of two ways. First, you can visualize cue accuracies, either with showcues() or, as in this guide, by passing what = 'cues' to plot(). Second, you can visualize individual trees and their performance statistics by applying the plot() function to an FFTrees object.

The titanic dataset

The titanic dataset contains survival statistics of passengers on the Titanic. For each passenger, we know their passenger class, age, sex, and whether or not they survived.

Here is how the first few rows of the dataframe look:

library(FFTrees) # Provides FFTrees() and the titanic dataset
head(titanic)
##   class   age  sex survived
## 1 first adult male        1
## 2 first adult male        1
## 3 first adult male        1
## 4 first adult male        1
## 5 first adult male        1
## 6 first adult male        1

Our goal is to build FFTrees that predict whether or not a passenger will survive based on these cues.

Creating an FFTrees object

First, let’s create an FFTrees object called titanic.fft from the titanic dataset.

titanic.fft <- FFTrees(formula = survived ~.,
                       data = titanic)

Visualising cue accuracies

You can visualize individual cue accuracies (specifically their hit rates and false alarm rates) by including the what = 'cues' argument within the plot() function. Let’s apply the function to the titanic.fft object to see how accurate each cue was on its own in predicting survival:

plot(titanic.fft,
     main = "Titanic cue accuracy",
     what = 'cues')

Wow. None of the cues did very well on their own. Well-performing cues should be in the top-left corner of the graph (i.e., a low false alarm rate and a high hit rate). It looks like the best cue was sex, followed by class. age was a pretty terrible cue.
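To make the two axes of that plot concrete, here is a minimal sketch of how a hit rate and a false alarm rate could be computed for a single binary cue in base R. The toy data frame and the helper cue_accuracy() are hypothetical and exist only for illustration; they are not part of FFTrees, and the package's own calculation may differ in its details.

```r
# Toy data (hypothetical, NOT the real titanic data)
toy <- data.frame(
  sex      = c("female", "female", "female", "male",
               "male", "male", "male", "female"),
  survived = c(1, 1, 0, 0, 0, 1, 0, 1)
)

# Hypothetical helper: `prediction` is TRUE where the cue predicts survival,
# `criterion` is the observed outcome (1 = survived, 0 = died)
cue_accuracy <- function(prediction, criterion) {
  hits         <- sum(prediction & (criterion == 1))
  misses       <- sum(!prediction & (criterion == 1))
  false_alarms <- sum(prediction & (criterion == 0))
  correct_rej  <- sum(!prediction & (criterion == 0))
  c(hit_rate         = hits / (hits + misses),
    false_alarm_rate = false_alarms / (false_alarms + correct_rej))
}

# Use "sex is female" as the cue for predicting survival
sex_acc <- cue_accuracy(toy$sex == "female", toy$survived)
print(sex_acc)
```

In this toy example the cue has a hit rate of 0.75 and a false alarm rate of 0.25, which would place it toward the top-left corner of a cue accuracy plot.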

Plotting a tree

To plot a tree from an FFTrees object, use plot(). You can add stylistic arguments such as main (the plot title) and decision.names (the labels for the two outcomes). Let’s plot one of the trees:

plot(titanic.fft, 
     main = "Titanic", 
     decision.names = c("Died", "Survived"))

This plot contains a lot of information: it shows the tree itself together with its classification performance statistics.

Additional arguments

You can specify additional arguments to the plot() command that change what is displayed, such as tree (which tree to plot) and data (which dataset to apply it to).

For example, let’s repeat the previous analysis, but now we’ll create separate training and test datasets by including the train.p = .5 argument. This will split the dataset into a 50% training set and a 50% test set. (Note: you could also define an explicit test dataset with the data.test argument.)

set.seed(100) # For replicability of the training/test split
titanic.pred.fft <- FFTrees(formula = survived ~.,
                            data = titanic,
                            train.p = .5)
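As a rough illustration of what a 50/50 split involves, the sketch below divides a data frame into training and test halves by hand in base R. It shows the general idea only and is not FFTrees' internal procedure; the toy data frame and the names train_idx, toy_train, and toy_test are hypothetical.

```r
set.seed(100)  # Make the random split reproducible

# Toy data (hypothetical stand-in for titanic)
toy <- data.frame(
  sex      = rep(c("female", "male"), times = 10),
  survived = rep(c(1, 0, 1, 1, 0), times = 4)
)

# Randomly pick half of the row indices for training
n         <- nrow(toy)
train_idx <- sample(n, size = floor(0.5 * n))

toy_train <- toy[train_idx, ]   # rows used to build the trees
toy_test  <- toy[-train_idx, ]  # held-out rows used to evaluate them

nrow(toy_train)  # 10
nrow(toy_test)   # 10
```

Two data frames built this way could then be supplied explicitly, with the training half as data and the held-out half as data.test.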

Here is the best training tree applied to the training data:

plot(titanic.pred.fft,
     tree = "best.train", 
     main = "Titanic", 
     decision.names = c("Died", "Survived"))

The best training tree (tree #3) had a high specificity of 92%, but a low hit rate of just 48%. However, as we can see in the ROC curve, logistic regression (LR) didn’t perform much better, and CART did even worse than tree #3.

Now let’s apply the same tree to the test dataset:

plot(titanic.pred.fft,
     tree = "best.train",
     data = "test", 
     main = "Titanic", 
     decision.names = c("Died", "Survived"))

Performance has actually increased on the test data (e.g., the hit rate is up to 54%). However, both logistic regression and CART performed similarly. Let’s see how tree #4, the most liberal tree, did:

plot(titanic.pred.fft,
     tree = 4,
     data = "test", 
     main = "Titanic", 
     decision.names = c("Died", "Survived"))

Tree #4 increased the testing hit rate to 65%, but at the cost of a lower specificity of 70%.
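The trade-off between a conservative and a liberal tree can be illustrated with two hand-written decision rules on toy data: a liberal rule predicts "Survived" more readily, which raises the hit rate but lowers specificity. Nothing below comes from the fitted trees; the data, the two rules, and the helper sens_spec() are hypothetical and meant only as a sketch.

```r
# Toy data (hypothetical, NOT the real titanic data)
toy <- data.frame(
  class    = c("first", "first", "second", "third",
               "first", "third", "second", "third"),
  sex      = c("female", "male", "female", "female",
               "male", "male", "male", "female"),
  survived = c(1, 1, 1, 0, 0, 0, 0, 1)
)

# Hypothetical helper: sensitivity (hit rate) and specificity of 0/1 predictions
sens_spec <- function(pred, truth) {
  c(sensitivity = sum(pred == 1 & truth == 1) / sum(truth == 1),
    specificity = sum(pred == 0 & truth == 0) / sum(truth == 0))
}

# Conservative rule: predict survival only for first-class women
conservative <- as.integer(toy$sex == "female" & toy$class == "first")

# Liberal rule: predict survival for any woman or any first-class passenger
liberal <- as.integer(toy$sex == "female" | toy$class == "first")

result <- rbind(conservative = sens_spec(conservative, toy$survived),
                liberal      = sens_spec(liberal, toy$survived))
print(result)
```

On this toy data the conservative rule reaches a specificity of 1 with a sensitivity of only 0.25, while the liberal rule reaches a sensitivity of 1 with a specificity of 0.5.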

Additional arguments