%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Classyfire Cheat Sheet}

General Package Handling

Install from CRAN


Load the classyfire package within R


Get the classyfire help overview
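The three steps above can be sketched as follows. The `requireNamespace()` guards and the explicit `repos` mirror are additions for illustration, so that the snippet does not stop with an error when the package cannot be installed. `help(package = ...)` is the generic base R way to list a package's help topics and may differ from the call originally intended here.

```r
# Install from CRAN only if the package is not already available
# (needs network access; the mirror URL is just one common choice)
if (!requireNamespace("classyfire", quietly = TRUE)) {
  install.packages("classyfire", repos = "https://cloud.r-project.org")
}

# Load the package and show its help index, if installation succeeded
if (requireNamespace("classyfire", quietly = TRUE)) {
  library(classyfire)
  help(package = "classyfire")
}
```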


Building a classification ensemble

Load some test data, for instance the iris dataset

data(iris)
irisClass <- iris[,5]
irisData  <- iris[,-5]
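A quick sanity check of the two inputs can catch shape problems before a long build. This sketch uses base R only. It assumes, from the argument names `inputData` and `inputClass`, that one class label is expected per row of the data.

```r
# iris ships with base R (datasets package), so no download is needed
irisClass <- iris[, 5]    # factor of species labels
irisData  <- iris[, -5]   # four numeric measurement columns

dim(irisData)        # 150 rows, 4 columns
levels(irisClass)    # "setosa" "versicolor" "virginica"
table(irisClass)     # 50 samples per class

# One label per row (assumed requirement, see above)
stopifnot(nrow(irisData) == length(irisClass))
```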

Construct a classification ensemble in parallel (using 4 CPUs in this instance). The ensemble consists of 10 independent classification models (classifiers), each optimised using 10 bootstrap iterations.

ens <- cfBuild(inputData = irisData, inputClass = irisClass, bootNum = 10, ensNum = 10,
               parallel = TRUE, cpus = 4, type = "SOCK")

Similarly, in sequence:

ens <- cfBuild(inputData = irisData, inputClass = irisClass, bootNum = 10, ensNum = 10,
               parallel = FALSE)

The list of attributes available for each classifier in the ensemble is provided by the function:
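The call itself is missing at this point. Base R's `attributes()` is the usual way to list what an object carries, so the sketch below assumes that this is the function meant. `mockEns` is a hypothetical stand-in, not a real `cfBuild` result, and its element names are made up for illustration.

```r
# Hypothetical stand-in for an ensemble object
mockEns <- structure(
  list(testAcc = c(96, 94), trainAcc = c(98, 97)),
  class = "mockEnsemble"
)

attrs <- attributes(mockEns)
names(attrs)   # the attribute names: "names" and "class"
attrs$names    # the element names stored in the list

# With a real ensemble in the session, the same call applies
if (exists("ens")) print(attributes(ens))
```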


Get the overall average test and train accuracy


Get the individual test and train accuracies in the ensemble
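To make the difference between the overall and the individual accuracies concrete, here is a base R sketch on made-up numbers. The vectors are hypothetical, and treating the overall figure as the plain mean of the per-classifier values is an assumption about how the package averages.

```r
# Hypothetical per-classifier accuracies (%), one value per ensemble member
testAcc  <- c(96, 94, 98, 92, 95)
trainAcc <- c(98, 97, 99, 96, 100)

# Individual accuracies are the vectors themselves;
# the overall accuracy is their average
avgTest  <- mean(testAcc)    # 95
avgTrain <- mean(trainAcc)   # 98
```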


# Alternatively


Testing new unknown data

Here we randomly generate test data, standing in for a new input dataset of unknown classes, and use the generated ensemble to predict their classes. The new dataset must have exactly the same number of columns as the inputData passed to cfBuild. In the following example, 400 values are drawn at random and arranged into 4 columns, which gives 100 samples (rows).

testMatr <- matrix(runif(400)*100, ncol = ncol(irisData))
predRes  <- cfPredict(ens, testMatr)
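The shape arithmetic can be checked without building an ensemble. This base R sketch recreates the test matrix and verifies the column count. `set.seed()` and the `nFeatures` helper are additions for reproducibility and readability.

```r
set.seed(42)                       # reproducible random draws
nFeatures <- ncol(iris[, -5])      # 4 columns, as in irisData
testMatr  <- matrix(runif(400) * 100, ncol = nFeatures)

dim(testMatr)                      # 100 rows, 4 columns
range(testMatr)                    # all values fall between 0 and 100

# Guard worth running before predicting on any new dataset
stopifnot(ncol(testMatr) == nFeatures)
```

Note that uniform values between 0 and 100 lie far outside the range of the iris measurements, so the predictions on such data only demonstrate the mechanics.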

Determining statistical significance by permutation testing

Execute five permutation rounds. In each round, an ensemble of 10 classifiers is constructed, each running 10 bootstrap iterations during the optimisation process. By default, ensNum, bootNum and permNum are all set to 100.

permObj <- cfPermute(irisData, irisClass, bootNum = 10, ensNum = 10, permNum = 5, 
                     parallel = TRUE, cpus = 4, type = "SOCK")
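Permutation testing in general works by shuffling the class labels, which breaks any real link between features and classes, and then rebuilding the model on the shuffled labels. The base R sketch below shows only the shuffling step. How cfPermute implements it internally is not shown in this document, so treat this as the generic technique.

```r
set.seed(1)
irisClass <- iris[, 5]
permClass <- sample(irisClass)   # random reordering of the labels

# Class counts are preserved by the shuffle ...
all(table(permClass) == table(irisClass))   # TRUE

# ... but the labels are no longer tied to their original rows
mean(permClass == irisClass)     # fraction still matching by chance
```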

Get the vector of averaged accuracies, one for each permutation (each permutation is an independent classification ensemble)
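One common way to turn such a vector into a significance statement is an empirical p-value. The sketch below uses made-up numbers. `obsAcc` and `permAcc` are hypothetical, and the formula is the standard empirical estimate, not necessarily what classyfire reports.

```r
obsAcc  <- 95                      # hypothetical accuracy of the real ensemble (%)
permAcc <- c(34, 31, 36, 29, 33)   # hypothetical averaged accuracies, one per permutation

# Proportion of permutations doing at least as well as the real ensemble;
# the +1 terms count the observed result itself and avoid a p-value of zero
pValue <- (sum(permAcc >= obsAcc) + 1) / (length(permAcc) + 1)
pValue   # 1/6, the smallest value five permutations can give
```

With only five permutations the p-value cannot drop below 1/6, which is why a larger permNum is needed for a meaningful test.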


Get the overall elapsed time for the permutation process, and the vector of individual execution times for each permutation respectively
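Elapsed times in R are commonly recorded with `system.time()`, whose result stores the wall-clock time in its third element. Whether cfPermute records its timings this way is an assumption. The sketch only shows how such a timing object is read.

```r
# Time a short placeholder task (three brief pauses)
timing <- system.time(for (i in 1:3) Sys.sleep(0.05))

class(timing)          # "proc_time"
timing[["elapsed"]]    # wall-clock seconds for the whole task
timing[3]              # the same value, accessed by position
```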


Access the first ensemble in the permutation list


Evaluating the classification ensemble

All the functions for descriptive statistics within classyfire start with the prefix “get”. For example:

Get the average test and/or train accuracy of the ensemble


Get the vectors of test and/or train accuracies of the classifiers in the ensemble


Get the confusion matrix summarising the performance of the ensemble
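For readers new to confusion matrices, here is how one is built and read in base R. The labels are made up, and the layout (true classes in rows, predictions in columns) is a choice made for this sketch that may differ from what the package returns.

```r
# Hypothetical true and predicted labels for six samples
truth <- factor(c("setosa", "setosa", "versicolor",
                  "versicolor", "virginica", "virginica"))
pred  <- factor(c("setosa", "setosa", "versicolor",
                  "virginica", "virginica", "virginica"),
                levels = levels(truth))

confMatr <- table(Actual = truth, Predicted = pred)
confMatr

# Row-wise percentages: share of each true class per predicted class
round(100 * prop.table(confMatr, margin = 1))

# Overall accuracy: correct predictions sit on the diagonal
sum(diag(confMatr)) / sum(confMatr)   # 5 of 6 correct
```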


Get the optimal SVM hyperparameters of the classification ensemble

optParam <- getOptParam(ens)

Return the “five number summary”, a descriptive statistic that consists of the minimum, first (lower) quartile, median, third (upper) quartile and maximum value of a given distribution. In this case, the function is applied directly on the output of permutation testing, generated by the cfPermute function.
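Base R's `fivenum()` computes the same kind of summary for any numeric vector, which is handy for checking what to expect. The accuracies below are hypothetical.

```r
permAcc <- c(34, 31, 36, 29, 33)   # hypothetical averaged accuracies (%)

# minimum, lower quartile, median, upper quartile, maximum
fivenum(permAcc)   # 29 31 33 34 36
```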


Plotting functions within classyfire

All the plotting functions within classyfire start with the prefix “gg”, since they rely on the ggplot2 library. For example:

The ggClassPred function generates a barplot of the per-class accuracies (%) for all the correctly classified and misclassified samples in the classification ensemble.

# Show the percentages of correctly classified samples in
# a barplot, with text labels

ggClassPred(ens, showText = TRUE)

# Show the percentages of correctly classified and misclassified samples
# together in one barplot, without and with text labels

ggClassPred(ens, displayAll = TRUE)
ggClassPred(ens, position = "stack", displayAll = TRUE)
ggClassPred(ens, position = "stack", displayAll = TRUE, showText = TRUE)

# Alternatively, using a dodge position
ggClassPred(ens, position = "dodge", displayAll = TRUE)
ggClassPred(ens, position = "dodge", displayAll = TRUE, showText = TRUE)

The ggEnsTrend function displays the average test accuracies for every new classifier added to the ensemble, as constructed by the cfBuild function.


# Plot with text
ggEnsTrend(ens, showText = TRUE)

# Plot with text; set different limits on the y axis
ggEnsTrend(ens, showText = TRUE, ylims = c(90, 100))

The ggEnsHist function generates a histogram of the ensemble results as generated by cfBuild.


# Density plot of the test accuracies in the ensemble
ggEnsHist(ens, density = TRUE)

# Density plot that highlights additional descriptive statistics
ggEnsHist(ens, density = TRUE, percentiles = TRUE)
ggEnsHist(ens, density = TRUE, percentiles = TRUE, mean = TRUE)
ggEnsHist(ens, density = TRUE, percentiles = TRUE, median = TRUE)

The ggPermHist function generates a histogram of the permutation results as generated by cfPermute.


# Density plot
ggPermHist(permObj, density = TRUE)

# Density plot that highlights additional descriptive statistics
ggPermHist(permObj, density = TRUE, percentiles = TRUE, mean = TRUE)
ggPermHist(permObj, density = TRUE, percentiles = TRUE, median = TRUE)

Finally, the ggFusedHist function generates a histogram for simultaneous visual comparison of the classification and permutation distributions.

# ens is the ensemble built with cfBuild above
ggFusedHist(ens, permObj)