The package ‘SSDM’ is a platform implemented in R that provides a range of methodological approaches and parameterization options at each step of building an SSDM. This vignette presents a typical workflow using R commands. An additional vignette presents the same workflow using the graphical user interface launched with the gui function (see GUI vignette).

The workflow of the package ‘SSDM’ is based on three modelling levels:

  1. the individual Species Distribution Model (SDM) fitting the occurrences of a single species on environmental predictor variables with a single modelling algorithm,
  2. the ensemble SDM (ESDM) combining the outputs of several SDMs, each SDM using a different modelling algorithm,
  3. the stack SDM combining several SDM or ESDM outputs to model species assemblages and compute species diversity and species richness (Fig. 1).
Figure 1. Flow chart of the package ‘SSDM’

Data inputs

Environmental variables

To build species distribution models you will need environmental variables. Currently ‘SSDM’ uses all raster formats supported by the R package ‘rgdal’. The package ‘SSDM’ supports both continuous (e.g., climate maps, digital elevation models, bathymetric maps) and categorical (e.g., land cover maps, soil type maps) environmental variables as inputs. The package can also normalize environmental variables, which may improve the fit of certain algorithms (such as artificial neural networks).

Rasters of environmental data need to share the same coordinate reference system, while the spatial extent and resolution of the environmental layers can differ. During processing, the package handles between-variable discrepancies in spatial extent and resolution by reducing all environmental rasters to their smallest common spatial extent and then upscaling them to the coarsest resolution.
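The harmonization rule described above can be illustrated with plain R, without any raster package. This is only a sketch: the layer extents and cell sizes below are hypothetical, and "smallest common spatial extent" is assumed here to mean the intersection of the layer extents.

```r
# Hypothetical layer descriptions (not the package data):
# extent is c(xmin, xmax, ymin, ymax), res is the cell size in degrees.
layers <- list(
  rainfall    = list(extent = c(163.5, 165.0, -21.0, -19.5), res = 0.008333333),
  temperature = list(extent = c(164.0, 165.5, -21.5, -20.0), res = 0.008333333),
  substrate   = list(extent = c(164.0, 165.0, -21.0, -20.0), res = 0.016666667)
)

# One row per layer, one column per extent bound.
extents <- do.call(rbind, lapply(layers, `[[`, "extent"))
colnames(extents) <- c("xmin", "xmax", "ymin", "ymax")

# Assumption: the common extent is the intersection of all extents.
common_extent <- c(
  xmin = max(extents[, "xmin"]),
  xmax = min(extents[, "xmax"]),
  ymin = max(extents[, "ymin"]),
  ymax = min(extents[, "ymax"])
)

# The coarsest resolution is the largest cell size among the layers.
coarsest_res <- max(vapply(layers, function(l) l$res, numeric(1)))

common_extent
coarsest_res
```

With these hypothetical inputs, every layer would be reduced to the 164 to 165 by -21 to -20 window and brought to the 0.016666667 degree grid of the coarsest layer.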

‘SSDM’ includes the load_var function to read the raster files containing your environmental variables. We will work with three 30 arcsec-resolution rasters covering the northern part of ‘Grande Terre’, the main island of New Caledonia. Climatic variables (RAINFALL and TEMPERATURE) are from the WorldClim database, and the SUBSTRATE map is from the IRD Atlas of New Caledonia (2012) (see ?Env).

library(SSDM)
## Registered S3 method overwritten by 'R.oo':
##   method        from       
##   throw.default R.methodsS3
## Welcome to the SSDM package, you can launch the graphical user interface by typing gui() in the console.
Env <- load_var(system.file('extdata',  package = 'SSDM'), categorical = 'SUBSTRATE', verbose = FALSE)
Env
## class      : RasterStack 
## dimensions : 120, 120, 14400, 3  (nrow, ncol, ncell, nlayers)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : 164, 165, -21, -20  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## min values : 0.4593978, 0.0000000,   0.6610169 
## max values :         1,         2,           1

Note that:

  • Use the categorical parameter to specify which environmental variables are categorical.
  • Normalization is activated by default; see the Norm option to change this.
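To give an idea of what normalization does to a continuous variable, here is a minimal base R sketch. The printed layer summary above (a maximum of 1 and a non-zero minimum for the continuous layers) is consistent with each layer being divided by its maximum, but that scheme is an assumption here, not a statement of how load_var is implemented. The rainfall values and the helper normalize_by_max are hypothetical.

```r
# Hypothetical helper: scale a variable by its maximum, ignoring missing cells.
normalize_by_max <- function(x) {
  x / max(x, na.rm = TRUE)
}

# Hypothetical annual rainfall values (mm), with one missing cell.
rainfall <- c(1200, 1850, 2600, NA, 3100)

rainfall_norm <- normalize_by_max(rainfall)
rainfall_norm
```

After scaling, the largest value is 1, missing cells stay missing, and the relative differences between cells are preserved.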

Natural history records

Species distribution models are built on natural history records.

‘SSDM’ includes the load_occ function to read the raw csv or txt files containing your natural history records. We will work with natural history records from five Cryptocarya species native to New Caledonia (see ?Occurrences).

Occ <- load_occ(path = system.file('extdata',  package = 'SSDM'), Env,
         Xcol = 'LONGITUDE', Ycol = 'LATITUDE',
         file = 'Occurrences.csv', sep = ',', verbose = FALSE)
head(Occ)
##      SPECIES LONGITUDE  LATITUDE
## 1  elliptica  164.1833 -20.28333
## 2  elliptica  164.1833 -20.46666
## 6  elliptica  164.7333 -20.59999
## 8  elliptica  164.7666 -20.74999
## 9  elliptica  164.7833 -20.61666
## 10 elliptica  164.7833 -20.63333

Note that:

  • Occurrences are checked against the environmental data, so the environmental data need to be loaded before the occurrences.
  • Use the GeoRes option to thin occurrences. Thinning removes unnecessary records, reducing the effect of sampling bias while retaining the greatest amount of information.
  • If you have trouble opening a file, look at the additional options of the read.csv function, which is used to open your raw data.
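The idea behind thinning can be sketched in base R. The example below uses a simple grid-based rule (keep one record per species per grid cell), which is a swapped-in illustration and not necessarily the method applied by the GeoRes option. The coordinates and the cell size are hypothetical.

```r
# Hypothetical occurrence records; the first two fall in the same grid cell.
occ <- data.frame(
  SPECIES   = c("elliptica", "elliptica", "elliptica", "elliptica"),
  LONGITUDE = c(164.1840, 164.1845, 164.7333, 164.7666),
  LATITUDE  = c(-20.2850, -20.2852, -20.59999, -20.74999)
)

# Assumed grid cell size in degrees (30 arcsec, as in the example rasters).
cell_size <- 0.008333333

# Index of the grid cell containing each record.
cell_x <- floor(occ$LONGITUDE / cell_size)
cell_y <- floor(occ$LATITUDE / cell_size)

# Keep only the first record per species and grid cell.
cell_id <- paste(occ$SPECIES, cell_x, cell_y, sep = "_")
thinned <- occ[!duplicated(cell_id), ]

thinned
```

Records that crowd into the same cell add little environmental information, so dropping the extras reduces the weight of heavily sampled locations.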

Model algorithms

Individual species distribution models (SDMs)

In the example below we build a distribution model for Cryptocarya elliptica, using the subset of occurrences for that species and specifying a single algorithm, here generalized linear models. The package ‘SSDM’ includes the main algorithms used to model species distributions: generalized additive models (GAM), generalized linear models (GLM), multivariate adaptive regression splines (MARS), classification tree analysis (CTA), generalized boosted models (GBM), maximum entropy (Maxent), artificial neural networks (ANN), random forests (RF), and support vector machines (SVM). The default parameters of the underlying R package for each algorithm were kept, but most of them remain settable.

SDM <- modelling('GLM', subset(Occurrences, Occurrences$SPECIES == 'elliptica'), 
                 Env, Xcol = 'LONGITUDE', Ycol = 'LATITUDE', verbose = FALSE)
plot(SDM@projection, main = 'SDM\nfor Cryptocarya elliptica\nwith GLM algorithm')
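For readers unfamiliar with GLM-based distribution models, the sketch below shows the general idea in base R: a logistic regression of presence/absence on environmental predictors, whose predictions are read as habitat suitability. It does not reproduce the internals of modelling, and all data are simulated for illustration. Only the predictor names are borrowed from the example rasters.

```r
set.seed(42)

# Simulated environmental values at 200 hypothetical sites (already scaled to 0-1).
n <- 200
sites <- data.frame(
  RAINFALL    = runif(n),
  TEMPERATURE = runif(n)
)

# Simulated presence/absence: presence is more likely at wetter, cooler sites.
p_true <- plogis(-2 + 4 * sites$RAINFALL - 1 * sites$TEMPERATURE)
sites$presence <- rbinom(n, size = 1, prob = p_true)

# Logistic regression (binomial GLM with a logit link).
fit <- glm(presence ~ RAINFALL + TEMPERATURE,
           family = binomial(link = "logit"), data = sites)

# Predicted probability of presence, interpreted as habitat suitability.
suitability <- predict(fit, newdata = sites, type = "response")

summary(suitability)
```

In a real workflow the predictions would be made for every raster cell rather than for the training sites, which yields a suitability map like the one plotted from SDM@projection.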