Introduction to co-correspondence analysis

Gavin L. Simpson

Among the ordination methods available to ecologists are those that relate a species abundance or occurrence matrix to a matrix of explanatory variables. Known as constrained or canonical ordination methods, redundancy analysis (RDA) and canonical correspondence analysis (CCA) are the most commonly encountered forms. A restriction of these methods is that the ordination is only constrained if the number of explanatory variables is smaller than the number of observations or the number of species, whichever is lower, minus 1.
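
As a quick check of that restriction, the following sketch computes the bound for a hypothetical data set. The sizes are invented for illustration, and the strict inequality is one reading of the rule stated above.

```r
# Hypothetical sizes, chosen only for illustration
n_obs <- 30          # number of observations (sites)
n_species <- 100     # number of species in the response matrix
n_explanatory <- 120 # number of explanatory variables

# Bound from the rule in the text: the smaller of the two counts, minus 1
limit <- min(n_obs, n_species) - 1

# TRUE when the ordination would still be constrained under that rule
is_constrained <- n_explanatory < limit

limit
is_constrained
```

With more explanatory variables than the bound allows, as happens when a second species matrix plays the explanatory role, the check fails.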

Relating two species matrices is not possible using RDA or CCA unless the number of species in the data set playing the explanatory role is much smaller than the number of observations. Co-inertia analysis was invented as a solution to problems of this sort, but a deficiency is that it has an underlying linear response model like RDA.

Co-correspondence analysis (Co-CA) combines the ideas of co-inertia analysis with the unimodal response model familiar from correspondence analysis (CA) or CCA methods. The aim is to relate two species abundance or occurrence matrices such that the resulting axes are those combinations that best explain the covariation between species and observations in the two matrices.

There are two forms of Co-CA:

  1. Symmetric Co-CA, and
  2. Predictive Co-CA.

In symmetric Co-CA, neither of the two abundance or occurrence matrices plays the predictive or explanatory role. This method is best thought of as identifying the common patterns between the two assemblages. In contrast, in predictive Co-CA a more direct regression model is fitted where one matrix plays the response role and the other the predictor role. In this way, one set of species data is used to predict the other.

The key requirement for Co-CA is that the two assemblages have been collected at the same locations, just as you would if you wanted to explain species abundance as a function of environmental factors.
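
A minimal sketch of how that requirement might be checked before fitting, assuming both tables carry site names as row names. The toy tables and their site and species names are hypothetical.

```r
# Toy data: two assemblages recorded at the same three sites
sites <- c("site1", "site2", "site3")
assemblage_a <- data.frame(spA = c(2, 0, 5), spB = c(1, 3, 0), row.names = sites)
assemblage_b <- data.frame(spX = c(4, 1, 0), spY = c(0, 2, 2), row.names = sites)

# Same sites, in the same order?
same_sites <- identical(rownames(assemblage_a), rownames(assemblage_b))
same_sites

# If the row order differs, align the second table to the first by site name
shuffled <- assemblage_b[c(3, 1, 2), ]
aligned <- shuffled[match(rownames(assemblage_a), rownames(shuffled)), ]
rownames(aligned)
```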

Symmetric Co-CA

As an illustration of symmetric Co-CA, we look at common patterns in a data set of beetles and plants. The data are provided with the cocorresp package.

library("cocorresp")
data(beetles, plants)
## log transform the beetle data
beetles <- log1p(beetles)

The data are observations of beetle and vascular plant species abundance at 30 roadside verges in the Netherlands. There are 30 beetle taxa and 30 vascular plant species. The abundances of the vascular plants are recorded on the 1–9 van der Maarel scale. To make the distributions of beetle species abundances more symmetric and to stabilise variances, the counts are log transformed.
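
The transformation used above is log1p(), which computes log(1 + x) and so leaves zero counts at zero. A small sketch on hypothetical counts shows the effect:

```r
# Hypothetical counts spanning several orders of magnitude
counts <- c(0, 1, 9, 99, 999)

# log1p(x) is log(1 + x): zeros stay at zero, large counts are pulled in
transformed <- log1p(counts)
transformed

# Same result as the explicit formula
all.equal(transformed, log(1 + counts))
```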

Both forms of Co-CA are fitted using the coca() function. The call comprises

  1. a formula, where the left-hand side is a community data frame or matrix, and the right-hand side is typically a .,
  2. a data argument, supplied with a suitable data frame or matrix. This is the object used to form the terms on the right-hand side of the formula indicated by the . placeholder,
  3. the type of Co-CA model to fit, indicated by the method argument; options are "symmetric" and "predictive".

A symmetric Co-CA is fitted to the beetle and plant data sets as follows

bp.sym <- coca(beetles ~ ., data = plants, method = "symmetric")
## some species contain no data and were removed from data matrix y
## some species contain no data and were removed from data matrix x

Notice that, for a symmetric Co-CA, it should not make any difference which of the matrices is specified on the left- or right-hand side of the formula. The messages printed during fitting are for information; some species contained no data and hence were removed prior to fitting. This data processing step could have been performed ahead of time via

beetles <- beetles[, colSums(beetles) > 0]
plants <- plants[, colSums(plants) > 0]
bp.sym <- coca(beetles ~ ., data = plants, method = "symmetric")
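
To see what the colSums() filter does, here is a small sketch on a toy table; the species names and counts are hypothetical. Adding drop = FALSE is a defensive choice that keeps the result a data frame even if only one species survives.

```r
# Toy community table in which sp2 was never recorded
toy <- data.frame(sp1 = c(3, 0, 1),
                  sp2 = c(0, 0, 0),
                  sp3 = c(0, 2, 4))

# Keep only the species with at least one non-zero abundance
keep <- colSums(toy) > 0
toy_clean <- toy[, keep, drop = FALSE]

names(toy_clean)
```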

Printing the resulting object provides a relatively compact summary of the Co-CA model fitted

## Symmetric Co-Correspondence Analysis
## Call: symcoca(y = y, x = x, n.axes = n.axes, R0 = weights,
## symmetric = symmetric, nam.dat = nam.dat)
## Eigenvalues:
##  COCA 1    COCA 2    COCA 3    COCA 4    COCA 5    COCA 6    COCA 7   
##  0.2534    0.1289    0.0811    0.0741    0.0585    0.0474    0.0373   
##  COCA 8    COCA 9   COCA 10   COCA 11   COCA 12   COCA 13   COCA 14   
##  0.0320    0.0308    0.0233    0.0207    0.0184    0.0172    0.0161   
## COCA 15   COCA 16   COCA 17   COCA 18   COCA 19   COCA 20   COCA 21   
##  0.0144    0.0118    0.0106    0.0100    0.0087    0.0085    0.0066   
## COCA 22   COCA 23   COCA 24   COCA 25   COCA 26   COCA 27   COCA 28   
##  0.0063    0.0050    0.0044    0.0043    0.0034    0.0022    0.0010   
## COCA 29   
##  0.0006   
## Inertia:
##            beetles plants
## Total:     3.98833  5.757
## Explained: 3.97079  5.740
## Residual:  0.01754  0.018
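
The inertia table can be turned into percentages by hand. The sketch below retypes the printed values rather than extracting them from the fitted object, since the object's internal structure is not shown here.

```r
# Values copied from the printed inertia table above
total     <- c(beetles = 3.98833, plants = 5.757)
explained <- c(beetles = 3.97079, plants = 5.740)

# Percentage of each assemblage's total inertia that is explained
pct_explained <- 100 * explained / total
round(pct_explained, 1)
```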

A screeplot provides a compact summary of the dimensionality of the covariance between the two matrices.
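
The screeplot can presumably be drawn by calling screeplot() on the fitted object. That call is an assumption, as it is not shown in this document. The block refits the model so that it runs on its own.

```r
library("cocorresp")

# Load and prepare the data as described earlier
data(beetles, plants)
beetles <- log1p(beetles)

# Fit the symmetric Co-CA; messages about empty species are expected
bp.sym <- coca(beetles ~ ., data = plants, method = "symmetric")

# Assumed call: screeplot method for the fitted symmetric Co-CA
screeplot(bp.sym)
```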


[Figure: screeplot of the symmetric Co-CA eigenvalues]

From the screeplot, we see that the strongest signal in the covariance is carried by the first two or three axes, after which the eigenvalues tail off gradually.
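
As a rough numerical companion to the screeplot, the share of the summed eigenvalues carried by the leading axes can be computed from the eigenvalues printed above. The values are retyped by hand here, and treating the eigenvalue sum as the total signal is a simplifying assumption.

```r
# Eigenvalues copied from the printed model summary (COCA 1 to COCA 29)
eig <- c(0.2534, 0.1289, 0.0811, 0.0741, 0.0585, 0.0474, 0.0373,
         0.0320, 0.0308, 0.0233, 0.0207, 0.0184, 0.0172, 0.0161,
         0.0144, 0.0118, 0.0106, 0.0100, 0.0087, 0.0085, 0.0066,
         0.0063, 0.0050, 0.0044, 0.0043, 0.0034, 0.0022, 0.0010,
         0.0006)

# Cumulative proportion of the eigenvalue sum, axis by axis
cum_prop <- cumsum(eig) / sum(eig)
round(cum_prop[1:5], 3)
```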

The resulting symmetric co-correspondence analysis can be plotted in the form of a biplot, except that now we have two sets of species (variable) scores and two sets of site (observation or sample) scores. The plot method can be used to draw the Co-CA biplots. The which argument selects which of the two assemblages is drawn:

layout(matrix(1:2, ncol = 2))
plot(bp.sym, which = "response", main = "Beetles")
plot(bp.sym, which = "predictor", main = "Plants")

[Figure: symmetric Co-CA biplots for beetles (left) and plants (right)]

Some additional control is afforded by the plot() method, but good plots of the fitted Co-CA will often require the use of lower-level functions such as the points() and scores() methods.