Here are some examples of FFTrees created from datasets in the `FFTrees`

package

The `mushrooms`

dataset contains data about mushrooms (see `?mushrooms`

for details). The goal of our model is to predict which mushrooms are poisonous based on 22 cues ranging from the mushroom’s odor, color, etc.

Let’s create some trees. We’ll start by creating a training and test dataset:

```
set.seed(100) # For replicability of the training / test data split
train.samples <- sample(nrow(mushrooms), size = 4000)
mushrooms.train <- mushrooms[train.samples, ]
mushrooms.test <- mushrooms[setdiff(1:nrow(mushrooms), train.samples), ]
mushrooms.fft <- FFTrees(formula = poisonous ~.,
data = mushrooms.train,
data.test = mushrooms.test)
```

```
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type =
## ifelse(type == : prediction from a rank-deficient fit may be misleading
```

Here’s basic information about the trees:

`mushrooms.fft`

```
## [1] "6 FFTs using up to 4 of 22 cues"
## [1] "FFT #3 uses 2 cues {odor,sporepc} with the following performance:"
## train test
## n 4000.00 4124.00
## pci 0.94 0.94
## mcu 1.47 1.46
## acc 0.93 0.93
## bacc 0.93 0.93
## sens 0.86 0.85
## spec 1.00 1.00
```

First, let’s look at the individual cue training accuracies with `plot()`

:

`plot(mushrooms.fft, main = "Mushrooms", what = "cues")`

It looks like the cues *oder* and *sporepc* are the best predictors. in fact, the single cue *odor* has a hit rate of 97% and a false alarm rate of 0%! Based on this, we should expect the final trees to use just these cues.

Now let’s plot the best tree applied to the test dataset

```
plot(mushrooms.fft,
data = "test",
description = "Mushrooms FFT",
decision.names = c("Safe", "Poisonous"))
```

Indeed, it looks like the best tree only uses the *odor* and *sporepc* cues. In our test dataset, the tree had a false alarm rate of 0% (1 - specificity), and a hit rate of 85%.

Now, let’s say that you talk to a mushroom expert who says that we are using the wrong cues. According to her, the best predictors for poisonous mushrooms are *ringtype* and *ringnum*. Let’s build a set of trees with these cues and see how they perform relative to our initial tree:

```
mushrooms.ring.fft <- FFTrees(formula = poisonous ~ ringtype + ringnum,
data = mushrooms.train,
data.test = mushrooms.test)
```

```
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type =
## ifelse(type == : prediction from a rank-deficient fit may be misleading
```

Here is the final tree:

```
plot(mushrooms.ring.fft,
data = "test",
description = "Mushrooms (Ring only) FFT",
decision.names = c("Safe", "Poisonous"))
```

As we can see, this tree did not perform nearly as well as our earlier one.

The `iris.v`

dataset contains data about 150 flowers (see `?iris.v`

). Our goal is to predict which flowers are of the class Virginica. In this example, we’ll create trees using the entire dataset (without an explicit test dataset)

```
iris.fft <- FFTrees(formula = virginica ~.,
data = iris.v)
```

First, let’s look at the individual cue training accuracies with `showcues`

:

`plot(iris.fft, what = "cues")`

It looks like the cues *pet.wid* and *pet.len* are the best predictors. Based on this, we should expect the final trees will likely use just one or both of these cues

Now let’s plot the best tree applied to the test dataset

```
plot(iris.fft,
description = "Iris FFT",
decision.names = c("Not V", "Virginica"))
```

Indeed, it looks like the best tree only uses the *pet.wid* and *pet.len* cues. In our test dataset, the tree had a false alarm rate of 6% (1 - specificity) and a hit rate of 100%.

Now, this tree did quite well, but what if someone wants a tree with the lowest possible false alarm rate. If we look at the ROC plot in the bottom left corner of the plot above, we can see that tree #2 has a false alarm rate close to 0%. Let’s look at that tree:

```
plot(iris.fft,
description = "Iris FFT",
decision.names = c("Not V", "Virginica"),
tree = 2) # Show tree #6
```

As you can see, this tree does indeed have a lower false alarm rate of 2%. However, it comes at a cost of a lower hit-rate.