The `mushrooms` dataset contains data about mushrooms (see `?mushrooms` for details). The goal of our model is to predict which mushrooms are poisonous based on 22 cues, ranging from the mushroom's odor to its color.

Here are the first few rows of the data:

`head(mushrooms)`

```
##   poisonous cshape csurface ccolor bruises odor gattach gspace gsize
## 1      TRUE      x        s      n       t    p       f      c     n
## 2     FALSE      x        s      y       t    a       f      c     b
## 3     FALSE      b        s      w       t    l       f      c     b
## 4      TRUE      x        y      w       t    p       f      c     n
## 5     FALSE      x        s      g       f    n       f      w     b
## 6     FALSE      x        y      y       t    a       f      c     b
##   gcolor sshape sroot ssaring ssbring scaring scbring vtype vcolor ringnum
## 1      k      e     e       s       s       w       w     p      w       o
## 2      k      e     c       s       s       w       w     p      w       o
## 3      n      e     c       s       s       w       w     p      w       o
## 4      n      e     e       s       s       w       w     p      w       o
## 5      k      t     e       s       s       w       w     p      w       o
## 6      n      e     c       s       s       w       w     p      w       o
##   ringtype sporepc population habitat
## 1        p       k          s       u
## 2        p       n          n       g
## 3        p       n          n       m
## 4        p       k          s       u
## 5        e       n          a       g
## 6        p       k          n       g
```

Let's create some trees using `FFTrees()`. We'll use the `train.p = .5` argument to split the original data into a 50% training set and a 50% testing set.

```
# Create FFTs from the mushrooms data
set.seed(100)  # For replicability of the training / test data split
mushrooms.fft <- FFTrees(formula = poisonous ~ .,
                         data = mushrooms,
                         train.p = .5,  # Split data into 50/50 training / test
                         main = "Mushrooms",
                         decision.labels = c("Safe", "Poison"))
```

Here's basic information about the best performing FFT:

```
# Print information about the best performing tree
mushrooms.fft
```

```
## Mushrooms
## FFT 1 (of 6) predicts poisonous using 2 cues: {odor, sporepc}
##
## [1] If odor != {f,s,y,p,c,m}, decide Safe.
## [2] If sporepc != {h,w,r}, decide Safe, otherwise, decide Poison.
##
##                      train    test
## cases        .n    4062.00 4062.00
## hits         .hi   1699.00 1649.00
## misses       .mi    279.00  289.00
## false al     .fa      0.00    0.00
## corr rej     .cr   2084.00 2124.00
## speed        .mcu     1.47    1.46
## frugality    .pci     0.94    0.94
## cost         .cost    0.07    0.07
## accuracy     .acc     0.93    0.93
## balanced     .bacc    0.93    0.93
## sensitivity  .sens    0.86    0.85
## specificity  .spec    1.00    1.00
##
## pars: algorithm = 'ifan', goal = 'wacc', goal.chase = 'wacc', sens.w = 0.5, max.levels = 4
```
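The accuracy statistics above follow directly from the four classification counts. As a quick sanity check, here is a base R sketch that recomputes the training-set sensitivity, specificity, accuracy, and balanced accuracy from the counts printed in the summary:

```r
# Classification counts for FFT #1 on the training data (from the table above)
hi <- 1699  # hits
mi <- 279   # misses
fa <- 0     # false alarms
cr <- 2084  # correct rejections

sens <- hi / (hi + mi)                   # p(decide Poison | poisonous)
spec <- cr / (cr + fa)                   # p(decide Safe | not poisonous)
acc  <- (hi + cr) / (hi + mi + fa + cr)  # overall accuracy
bacc <- (sens + spec) / 2                # balanced accuracy

round(c(sens = sens, spec = spec, acc = acc, bacc = bacc), 2)
#> sens spec  acc bacc
#> 0.86 1.00 0.93 0.93
```

These reproduce the `train` column of the table, which is a useful habit when interpreting unfamiliar output.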

Let's look at the individual cue training accuracies with `plot()`:

```
# Show mushrooms cue accuracies
plot(mushrooms.fft,
     what = "cues")
```

It looks like the cues `odor` and `sporepc` are the best predictors. In fact, the single cue *odor* has a hit rate of 97% and a false alarm rate of 0%! Based on this, we should expect the final trees to use just these cues.
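To make those single-cue statistics concrete, here is a small sketch (with hypothetical counts, not the actual mushrooms tallies) of how a cue's hit rate and false alarm rate are computed from its classification counts:

```r
# Hypothetical classification counts for a single cue (illustration only)
hi <- 97   # poisonous mushrooms the cue flags as poisonous
mi <- 3    # poisonous mushrooms the cue misses
fa <- 0    # safe mushrooms the cue wrongly flags as poisonous
cr <- 100  # safe mushrooms the cue correctly clears

hit.rate <- hi / (hi + mi)  # 0.97, like the odor cue above
far      <- fa / (fa + cr)  # 0.00
```

A cue with a high hit rate and a near-zero false alarm rate, like *odor* here, is an obvious candidate for the first node of an FFT.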

Now let's plot the best training tree applied to the test dataset:

```
# Plot the best FFT for the mushrooms data
plot(mushrooms.fft,
     data = "test")
```

Indeed, it looks like the best tree only uses the *odor* and *sporepc* cues. In our test dataset, the tree had a false alarm rate (1 - specificity) of 0% and a hit rate (sensitivity) of 85%.

Now, let's say that you talk to a mushroom expert who says that we are using the wrong cues. According to her, the best predictors for poisonous mushrooms are *ringtype* and *ringnum*. Let's build a set of trees with these cues and see how they perform relative to our initial tree:

```
# Create trees using only ringtype and ringnum
mushrooms.ring.fft <- FFTrees(formula = poisonous ~ ringtype + ringnum,
                              data = mushrooms,
                              train.p = .5,
                              main = "Mushrooms (Ring Only)",
                              decision.labels = c("Safe", "Poison"))
```

Here is the best training tree, applied to the test data:

```
plot(mushrooms.ring.fft,
     data = "test")
```

As we can see, this tree did not perform nearly as well as our earlier one.

The `iris.v` dataset contains data about 150 flowers (see `?iris.v`). Our goal is to predict which flowers are of the class Virginica. In this example, we'll create trees using the entire dataset (without an explicit test dataset).

```
iris.fft <- FFTrees(formula = virginica ~ .,
                    data = iris.v,
                    main = "Iris",
                    decision.labels = c("Not-V", "V"))
```

First, let's look at the individual cue training accuracies:

```
plot(iris.fft,
     what = "cues")
```

It looks like the cues *pet.wid* and *pet.len* are the best predictors. Based on this, we should expect the final trees to use one or both of these cues.

Now let's plot the best tree:

`plot(iris.fft)`

Indeed, it looks like the best tree only uses the *pet.wid* and *pet.len* cues. Because we fit the trees to the entire dataset (with no test split), these are training statistics: the tree had a sensitivity of 100% and a specificity of 95%.

Now, this tree did quite well, but what if someone wants a tree with the lowest possible false alarm rate? If we look at the ROC plot in the bottom-left corner of the plot above, we can see that tree #2 has a specificity close to 100%. Let's look at that tree:

```
plot(iris.fft,
     tree = 2)  # Show tree #2
```

As you can see, this tree does indeed have a higher specificity of 99%. However, this comes at the cost of a lower sensitivity of 82%.
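This sensitivity-specificity trade-off can be made explicit with the weighted accuracy measure shown in the `pars` line of the summary output, wacc = sens.w * sens + (1 - sens.w) * spec, with sens.w = 0.5 by default. A short sketch, plugging in the statistics of the two iris trees, shows how the preferred tree flips as specificity is weighted more heavily:

```r
# Weighted accuracy: wacc = sens.w * sens + (1 - sens.w) * spec
wacc <- function(sens, spec, sens.w = 0.5) {
  sens.w * sens + (1 - sens.w) * spec
}

# Tree #1: sens = 1.00, spec = 0.95; tree #2: sens = 0.82, spec = 0.99
wacc(1.00, 0.95)                # 0.975 -- tree #1 wins with equal weights
wacc(0.82, 0.99)                # 0.905
wacc(1.00, 0.95, sens.w = 0.1)  # 0.955
wacc(0.82, 0.99, sens.w = 0.1)  # 0.973 -- tree #2 wins when specificity dominates
```

So which tree is "best" depends on how costly misses are relative to false alarms; the ROC panel lets you pick accordingly.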