A Quick Introduction to iNextPD via Examples
T. C. Hsieh
2017-03-11
iNextPD (iNterpolation and extrapolation for Phylogenetic Diversity) is an R package that provides a rarefaction and extrapolation framework for making fair comparisons of abundance-sensitive phylogenetic diversity among multiple assemblages (Hsieh and Chao, Systematic Biology, 2016). In this document, we provide a quick introduction demonstrating how to run iNextPD. Detailed information about iNextPD functions is provided in the iNextPD Manual, also available on CRAN. See Chao et al. (2015) and Hsieh and Chao (2016) for methodologies. An online version, PhDOnline (https://chao.shinyapps.io/PhDOnline/), is also available for users without an R background. A neutral theory of species diversity is included in Chao et al. (2014), and a brief description of the methods and the R package iNEXT is included in an application paper by Hsieh, Ma & Chao (2016).
iNextPD is an extension of iNEXT: it extends the traditional rarefaction and extrapolation framework for species diversity to abundance-sensitive phylogenetic diversity. iNextPD focuses on three measures from the family of phylogenetic Hill numbers of order q: Faith’s PD (q = 0), a simple transformation of phylogenetic entropy (q = 1), and a simple transformation of Rao’s quadratic entropy (q = 2). For each diversity measure, iNextPD uses the observed sample of abundance or incidence data (called the “reference sample”) to compute diversity estimates and the associated 95% confidence intervals for the following two types of rarefaction and extrapolation (R/E):
- Sample-size-based R/E sampling curves: iNextPD computes diversity estimates for rarefied and extrapolated samples up to an appropriate size. This type of sampling curve plots the diversity estimates with respect to sample size (type=1).
- Coverage-based R/E sampling curves: iNextPD computes diversity estimates for rarefied and extrapolated samples with sample completeness (as measured by sample coverage) up to an appropriate coverage. This type of sampling curve plots the diversity estimates with respect to sample coverage (type=3).

iNextPD also plots the above two types of sampling curves and a sample completeness curve (type=2). The sample completeness curve provides a bridge between these two types of curves.
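To make the three measures concrete, here is a minimal base-R sketch of phylogenetic diversity of order q for a single, fully observed assemblage. It assumes the tree is summarized as one branch length L_i per branch plus the relative abundance a_i of all species descending from that branch, and uses the phylogenetic Hill-number form qPD = Tbar * (sum_i (L_i / Tbar) * a_i^q)^(1/(1-q)) with Tbar = sum_i L_i * a_i (the tree depth when the tree is ultrametric). The helper phylo_hill() and the toy tree are hypothetical and not part of iNextPD; the package additionally deals with estimation from finite samples, which this sketch ignores.

```r
# Hypothetical helper (not an iNextPD function): phylogenetic diversity of
# order q for one assemblage, given one entry per branch of the tree.
#   L: branch lengths
#   a: relative abundance of all species descending from each branch
phylo_hill <- function(L, a, q) {
  keep <- a > 0                  # branches with no descendants present contribute nothing
  L <- L[keep]
  a <- a[keep]
  Tbar <- sum(L * a)             # abundance-weighted mean depth (= tree depth if ultrametric)
  w <- L / Tbar
  if (q == 1) {
    Dbar <- exp(-sum(w * a * log(a)))   # q -> 1 limit (phylogenetic entropy)
  } else {
    Dbar <- sum(w * a^q)^(1 / (1 - q))  # q = 0: Faith's PD / Tbar; q = 2: Rao's Q based
  }
  Tbar * Dbar                    # qPD, in units of branch length
}

# Toy ultrametric tree ((A:1,B:1):1,C:2) with relative abundances
# A = 0.25, B = 0.25, C = 0.5. Branches: A, B, the stem of (A,B), and C.
L <- c(1, 1, 1, 2)
a <- c(0.25, 0.25, 0.5, 0.5)
sapply(c(0, 1, 2), function(q) phylo_hill(L, a, q))
```

With q = 0 the sketch returns the total branch length (Faith's PD, 5 for this tree), and the values decrease (about 4.757 for q = 1 and 4.571 for q = 2) because higher orders give more weight to abundant lineages.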
SOFTWARE NEEDED TO RUN INEXTPD IN R
HOW TO RUN INEXTPD:
The iNextPD package is available on CRAN and can be downloaded with a standard R installation procedure using the following commands. For a first-time installation, the additional packages ade4, ggplot2, iNEXT and Rcpp are also required; ggplot2 (for plotting) and ade4 (for phylog tree objects) are loaded explicitly below.
## install the iNextPD package from CRAN
install.packages("iNextPD")
## install the latest version from github
install.packages('devtools')
library(devtools)
install_github('JohnsonHsieh/iNextPD')
## import packages
library(iNextPD)
library(ggplot2)
library(ade4)
Remark: In order to install the devtools package, you may need to update R to the latest version. Also, for install_github to work, you should install the httr package.
MAIN FUNCTION: iNextPD()
We first describe the main function iNextPD() with default arguments:
iNextPD(x, labels, phy, q=0, datatype="abundance", size=NULL, endpoint=NULL, knots=40, se=FALSE, conf=0.95, nboot=50)
The arguments of this function are briefly described below and will be explained in more detail by illustrative examples later in the text. This main function computes diversity estimates of order q = 0, 1, 2, the sample coverage estimates and related statistics for K (if knots=K) evenly spaced knots (sample sizes) between size 1 and the endpoint, where the endpoint is described below. Each knot represents a particular sample size for which diversity estimates will be calculated. By default, endpoint = double the reference sample size (total sample size for abundance data; total number of sampling units for incidence data). For example, if endpoint = 10 and knots = 4, diversity estimates will be computed for a sequence of samples with sizes (1, 4, 7, 10).
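The knot layout described here (and under the knots argument below) can be approximated in a few lines of base R. knot_sizes() is a hypothetical helper written for illustration, not iNextPD's internal code, so the package's exact rounding may differ. The first call reproduces the (1, 4, 7, 10) example, assuming a reference sample larger than the endpoint (here n = 20) so that only rarefaction is needed.

```r
# Hypothetical helper (not part of iNextPD): approximate knot layout for a
# reference sample of size n, following the rules described in the text.
knot_sizes <- function(n, endpoint = 2 * n, knots = 40) {
  if (endpoint <= n) {
    # Rarefaction only: about K evenly spaced sizes from 1 to endpoint
    return(unique(round(seq(1, endpoint, length.out = knots))))
  }
  half <- floor(knots / 2)
  # About K/2 rarefaction sizes from 1 up to and including n ...
  rare <- round(seq(1, n, length.out = half))
  # ... and about K/2 extrapolation sizes above n, ending at endpoint
  extra <- round(seq(n, endpoint, length.out = half + 1))[-1]
  unique(c(rare, extra))
}

knot_sizes(n = 20, endpoint = 10, knots = 4)   # 1 4 7 10
s <- knot_sizes(n = 100)                       # default endpoint = 200
c(length(s), sum(s <= 100), sum(s > 100))      # 40 20 20
```

The split keeps the reference sample size itself as a knot, so the observed sample sits exactly at the junction between the rarefaction and extrapolation parts of the curve.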
x

a matrix, data.frame, or list of species abundances or incidence data.

labels

species names for object x.

phy

a phylog object (ade4 class) for the input phylogenetic tree.

q

a number or vector specifying the diversity order(s) of Hill numbers.

datatype

data type of input data: individual-based abundance data (datatype = "abundance"), or a species-by-sampling-unit incidence matrix (datatype = "incidence_raw").

size

an integer vector of sample sizes for which diversity estimates will be computed. If NULL, then diversity estimates will be calculated for those sample sizes determined by the specified/default endpoint and knots.

endpoint

an integer specifying the sample size that is the endpoint for R/E calculation; if NULL, then endpoint = double the reference sample size.

knots

an integer specifying the number of equally spaced knots (say K, default is 40) between size 1 and the endpoint; each knot represents a particular sample size for which a diversity estimate will be calculated. If the endpoint is smaller than the reference sample size, then iNextPD() computes only the rarefaction estimates for approximately K evenly spaced knots. If the endpoint is larger than the reference sample size, then iNextPD() computes rarefaction estimates for approximately K/2 evenly spaced knots between sample size 1 and the reference sample size, and computes extrapolation estimates for approximately K/2 evenly spaced knots between the reference sample size and the endpoint.

se

a logical value indicating whether to calculate the bootstrap standard error and the conf-level confidence interval.

conf

a positive number < 1 specifying the level of the confidence interval; default is 0.95.

nboot

an integer specifying the number of bootstrap replications.

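For datatype = "incidence_raw", x is a species-by-sampling-unit matrix of 0/1 detections. The toy matrix below is hypothetical data; it shows how such a matrix summarizes into incidence frequencies, and why the reference sample size for incidence data is the number of sampling units (so the default endpoint is twice that number).

```r
# Hypothetical toy data: rows are species, columns are sampling units,
# 1 = detected in that unit, 0 = not detected (datatype = "incidence_raw").
inc <- matrix(c(1, 0, 1, 1,
                0, 0, 1, 0,
                1, 1, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("sp1", "sp2", "sp3"), paste0("unit", 1:4)))

sp_labels <- rownames(inc)   # species names (assumed to match the tree's tip labels)
n_units <- ncol(inc)         # reference sample size for incidence data
freq <- rowSums(inc)         # incidence frequency: number of units where each species occurs

n_units                      # 4
freq                         # sp1 = 3, sp2 = 1, sp3 = 4
2 * n_units                  # default endpoint: 8 sampling units
```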
This function returns an "iNextPD" object, which can then be used to make plots with the function ggiNEXT(), described below.
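When se=TRUE, the nboot bootstrap replicates yield a standard error, and conf sets the level of the interval. The sketch below assumes a normal-approximation interval (estimate plus or minus z times the bootstrap SE); boot_ci() is a hypothetical helper, not taken from the iNextPD source, and the replicate values are made up rather than produced by the package's bootstrap procedure.

```r
# Sketch only: a normal-approximation interval built from bootstrap replicates.
# boot_ci() is hypothetical and is not taken from the iNextPD source.
boot_ci <- function(estimate, boot_reps, conf = 0.95) {
  se <- sd(boot_reps)                   # bootstrap standard error
  z <- qnorm(1 - (1 - conf) / 2)        # about 1.96 for conf = 0.95
  c(estimate = estimate, se = se,
    LCL = estimate - z * se, UCL = estimate + z * se)
}

# Made-up replicate estimates standing in for nboot bootstrap results
reps <- c(96, 98, 100, 102, 104)
boot_ci(estimate = 100, boot_reps = reps, conf = 0.95)
```

A larger nboot makes the standard error less noisy, while a larger conf widens the interval because z grows.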
POINT ESTIMATION FUNCTION: estimatePD()
We also supply the function
estimatePD(x, labels, phy, datatype="abundance", base="size", level=NULL, conf=0.95, digits=4)
to compute diversity estimates with q = 0, 1, 2 for any particular level of sample size (base="size") or any specified level of sample coverage (base="coverage") for either abundance data (datatype="abundance") or incidence data (datatype="incidence_raw"). If level=NULL, this function computes the diversity estimates for the minimum sample size/coverage among all sites.
For example, the following command returns the phylogenetic diversity at a specified sample coverage of 97.5% for the bird abundance-based data. For some sites this coverage level falls in the rarefaction part, whereas for others it requires extrapolation, as indicated in the method column of the output.
estimatePD(bird$abun, bird.lab, bird.phy, "abundance",
base="coverage", level=0.975, conf=0.95)
site m method order SC qPD qPD.95.LCL qPD.95.UCL
1 North.site 227.0711 extrapolated 0 0.975 1248.1118 1128.6005 1367.6232
3 North.site 227.0711 extrapolated 1 0.975 439.4657 386.8856 492.0458
5 North.site 227.0711 extrapolated 2 0.975 212.5806 179.6321 245.5291
8 South.site 247.8890 interpolated 0 0.975 1367.1348 1288.1097 1446.1600
10 South.site 247.8890 interpolated 1 0.975 451.9783 412.1321 491.8246
12 South.site 247.8890 interpolated 2 0.975 205.6565 176.3483 234.9647
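Whether a requested coverage level is reached by rarefaction or by extrapolation depends on the estimated coverage of each site's reference sample: a target below it is interpolated, a target above it is extrapolated. The sketch below assumes abundance data and uses the singleton/doubleton coverage estimator of Chao and Jost (2012), which we believe underlies the iNEXT family of methods; coverage_hat() and coverage_method() are hypothetical helpers and the counts are toy data.

```r
# Hypothetical helpers (not iNextPD functions), assuming abundance data.
# Estimated coverage of the reference sample from singletons (f1) and doubletons (f2).
coverage_hat <- function(x) {
  x <- x[x > 0]
  n <- sum(x)
  if (n < 2) stop("need at least two individuals")
  f1 <- sum(x == 1)
  f2 <- sum(x == 2)
  if (f1 == 0) return(1)          # no singletons: sample treated as complete
  1 - (f1 / n) * ((n - 1) * f1 / ((n - 1) * f1 + 2 * f2))
}

# A target coverage at or below the reference coverage is reached by
# rarefaction; a higher target requires extrapolation.
coverage_method <- function(x, level) {
  ifelse(level <= coverage_hat(x), "interpolated", "extrapolated")
}

x <- c(5, 3, 2, 1, 1)              # toy abundances, n = 12, f1 = 2, f2 = 1
coverage_hat(x)                    # 61/72, about 0.847
coverage_method(x, c(0.5, 0.975))  # "interpolated" "extrapolated"
```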
GRAPHIC DISPLAYS: FUNCTION ggiNEXT()
The function ggiNEXT(), which extends ggplot2 to the "iNextPD" object, is shown below with its default arguments:
ggiNEXT(x, type=1, se=TRUE, facet.var="none", color.var="site", grey=FALSE)
Here x is an "iNextPD" object. Three types of curves are allowed:
(1) Sample-size-based R/E curve (type=1): this curve plots diversity estimates with confidence intervals (if se=TRUE) as a function of sample size, up to double the reference sample size by default, or up to a user-specified endpoint.
(2) Sample completeness curve (type=2) with confidence intervals (if se=TRUE): this curve plots the sample coverage with respect to sample size for the same range described in (1).
(3) Coverage-based R/E curve (type=3): this curve plots the diversity estimates with confidence intervals (if se=TRUE) as a function of sample coverage, up to the maximum coverage obtained from the maximum size described in (1).
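On its rarefaction side, the sample completeness curve (type=2) is the estimated coverage of a subsample of size m drawn from the reference sample. The sketch below implements the standard estimator 1 - sum_i (X_i/n) * choose(n - X_i, m) / choose(n - 1, m), valid for m < n and assuming abundance data; coverage_rare() is a hypothetical helper, not an iNextPD function, and the extrapolation side of the curve is not covered.

```r
# Hypothetical helper (not an iNextPD function): estimated coverage of a
# rarefied sample of size m < n, given abundance counts x.
coverage_rare <- function(x, m) {
  x <- x[x > 0]
  n <- sum(x)
  stopifnot(m >= 1, m < n)
  # Probability that a sample of m drawn from the other n - 1 individuals
  # contains no individual of species i (0 when n - x_i < m)
  miss <- ifelse(n - x >= m, exp(lchoose(n - x, m) - lchoose(n - 1, m)), 0)
  # Weight each species by x_i / n, its estimated share of the next individual
  1 - sum((x / n) * miss)
}

x <- c(5, 3, 2, 1, 1)                                   # toy abundances, n = 12
round(sapply(1:11, function(m) coverage_rare(x, m)), 3)
```

Working on the log scale with lchoose() avoids overflow in choose() for large samples. The curve rises from the probability that two individuals share a species (m = 1) toward 1 - f1/n at m = n - 1.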
The argument facet.var ("none", "order", "site" or "both") is used to create a separate plot for each value of the specified variable. For example, the following code displays a separate plot for each site (facet.var="site") or for each diversity order q (facet.var="order"). The user may also use the argument grey=TRUE to plot black-and-white figures. The usage of color.var is illustrated in the incidence data example described later in the text. The ggiNEXT() function is a wrapper around the ggplot2 package that creates an R/E curve with a single line of code. The resulting object is of class "ggplot", so it can be manipulated with ggplot2 tools.
out <- iNextPD(bird$abun, bird.lab, bird.phy,
               q=c(0, 1, 2), datatype="abundance", endpoint=400)
# Sample-size-based R/E curves, separating by "site"
ggiNEXT(out, type=1, facet.var="site")
## Not run:
# Sample‐size‐based R/E curves, separating by "order"
ggiNEXT(out, type=1, facet.var="order")
# display black-and-white theme
ggiNEXT(out, type=1, facet.var="order", grey=TRUE)
## End(Not run)
The argument facet.var="site" in the ggiNEXT function creates a separate plot for each site, as shown below:
# Sample-size-based R/E curves, separating by "site"
ggiNEXT(out, type=1, facet.var="site")
The arguments facet.var="order" and color.var="site" create a separate plot for each diversity order, and within each plot, different colors are used for the two sites.
ggiNEXT(out, type=1, facet.var="order", color.var="site")
The following command returns the sample completeness curve, in which different colors are used for the two sites:
ggiNEXT(out, type=2, facet.var="none", color.var="site")
The following commands return the coverage-based R/E sampling curves, either with a separate plot for each site (facet.var="site") or with a separate plot for each of the three orders (facet.var="order"), in which different colors are used for the two sites:
ggiNEXT(out, type=3, facet.var="site")
ggiNEXT(out, type=3, facet.var="order", color.var="site")