Gambin overview

Thomas J. Matthews and Colin Gillespie

Overview

The gambin distribution is a sample distribution based on a stochastic model of species abundances, and has been demonstrated to fit empirical data better than the most commonly used species-abundance distribution (SAD) models (see Matthews et al. (2014) and Ugland et al. (2007)). Gambin is a stochastic model which combines the gamma distribution with a binomial sampling method. To fit the gambin distribution, the abundance data is first binned into octaves using a simple log2 transform that doubles the number of abundance classes within each octave. Thus, octave 0 contains the number of species with 1 individual, octave 1 the number of species with 2 or 3 individuals, octave 2 the number of species with 4 to 7 individuals, and so forth (method 3 in J. S. Gray, Bjørgesæter, and Ugland (2006)).

The gambin distribution is flexible, meaning it can fit a variety of empirical SAD shapes (including lognormal and logseries-like shapes), and that the distribution shape (in the context of the unimodal gambin model) is adequately characterised by the model’s single parameter (α): low values of alpha indicate logserieslike SADs, and high alpha values indicate lognormal-like SADs. As such, the alpha parameter can be used as a metric to compare the shape of SADs from different ecological communities; for example, along an environmental gradient (e.g Arellano et al. (2017))

The expected abundance octave of a species is given by the number of successfull consecutive Bernoulli trials with a given parameter \(p\). The parameter \(p\) of species is assumed to distributed according to a gamma distribution. This approach can be viewed as linking the gamma distribution with the probability of success in a binomial process with \(x\) trials. Use the fit_abundances() function to fit the gambin model to a vector of species abundances, optionally using a subsample of the individuals. The package estimates the alpha (shape) parameter with associated confidence intervals. Methods are provided for plotting the results, and for calculating the likelihood of fits.

It has become increasingly apparent that many empirical SADs are in fact multimodal (Antão et al. (2017)). As such, recent work has focused on expanding the standard unimodal gambin model to allow it to fit distributions with multiple modes (Matthews et al. In prep). For example, the bimodal gambin model can be calculated as the integration of two gambin distributions. The corresponding likelihood function for the bimodal gambin model contain’s four parameters: the shape parameters for the first and second group, the max octave of the first group (as this is allowed to vary), and one splitting parameter (split) representing the fraction of objects in the first group.It is relatively straightforward to extend the above approach for fitting the bimodal gambin model by maximum likelihood, to fitting gambin models with g modes. For each additional mode, a further three parameters are needed: the additional alpha, max octave and split parameters (see Matthews et al. In prep). Use the fit_abundances() function in combination with the no_of_components of argument. The default is no_of_components = 1, which fits the standard unimodal gambin model. no_of_components = 2, fits the bimodal gambin model, and so on. As the optimisation procedure takes long with no_of_components > 1, it is possible to use the cores argument within fit_abundances to make use of parallel processing in the maximum likelihood optimisation.

Example

library("gambin")
data(moths, package="gambin")

##unimodal model
fit = fit_abundances(moths)
fit$alpha
## [1] 1.644694
barplot(fit)
points(fit)

AIC(fit)
## [1] 803.6409
##unimodal model (fit to a subsample of 1000 individuals)
fit2 = fit_abundances(moths, subsample = 1000)
fit2$alpha
## [1] 0.8722806
barplot(fit2)
points(fit2)

AIC(fit2)
## [1] 395.0488
##bimodal model (using 3 cores)

#simulate bimodal gambin distribution
x1 = rgambin(600, 5, 10)
x2 = rgambin(300, 1, 10)
x = table(c(x1,x2))
freq = as.vector(x)
values = as.numeric(as.character(names(x)))
abundances = data.frame(octave=values, species = freq)

#fit bimodal model to simulated data
fit3 = fit_abundances(abundances, no_of_components = 2, cores = 1)
## Using 1 core. Your machine has 12 available.
barplot(fit3)
points(fit3)

AIC(fit3)
## [1] 4026.448
#compare with AIC of unimodal model
AIC(fit_abundances(abundances))
## [1] 4037.638

References

Matthews, T.J. et al (In prep) Extension of the Gambin Distribution to Multimodal Species Abundance Distributions.

Antão, Laura H., Sean R. Connolly, Anne E. Magurran, Amadeu Soares, and Maria Dornelas. 2017. “Prevalence of Multimodal Species Abundance Distributions Is Linked to Spatial and Taxonomic Breadth.” Global Ecology and Biogeography 26 (2): 203–15. doi:10.1111/geb.12532.

Arellano, Gabriel, María N. Umaña, Manuel J. Macía, M. Isabel Loza, Alfredo Fuentes, Victoria Cala, and Peter M. Jørgensen. 2017. “The Role of Niche Overlap, Environmental Heterogeneity, Landscape Roughness and Productivity in Shaping Species Abundance Distributions Along the Amazon–Andes Gradient.” Global Ecology and Biogeography 26 (2): 191–202.

Gray, John S., Anders Bjørgesæter, and Karl. I. Ugland. 2006. “On Plotting Species Abundance Distributions.” Journal of Animal Ecology 75 (3). [Wiley, British Ecological Society]: 752–56.

Matthews, Thomas J, Michael K Borregaard, Karl I Ugland, Paulo AV Borges, François Rigal, Pedro Cardoso, and Robert J Whittaker. 2014. “The Gambin Model Provides a Superior Fit to Species Abundance Distributions with a Single Free Parameter: Evidence, Implementation and Interpretation.” Ecography 37 (10). Wiley Online Library: 1002–11.

Ugland, Karl I, P John D Lambshead, Brian McGill, John S Gray, Niall O’Dea, Richard J Ladle, and Robert J Whittaker. 2007. “Modelling Dimensionality in Species Abundance Distributions: Description and Evaluation of the Gambin Model.” Evolutionary Ecology Research 9 (2). Evolutionary Ecology, Ltd.: 313–24.