Introduction

This vignette walks through the spatial thinning example presented in “spThin: An R package for spatial thinning of species occurrence records for use in ecological niche models”. We demonstrate how spThin can be used to spatially thin species occurrence records. We then test how many repetitions of the thinning algorithm are necessary to achieve the optimal number of thinned records for a dataset previously thinned “by hand”. Finally, we examine whether there is a notable increase in efficiency if an occurrence dataset is thinned as multiple smaller groups of occurrences rather than as a single large set of occurrences.

Load the spThin R package

Here we install the R package from source code and then load it. This source code will soon be submitted to CRAN, so that the package can be installed using standard package management methods.

## Install package from source, then load package into workspace
install.packages( type = "source", pkgs = "spThin_0.1.0.tar.gz", repos = NULL )
## Installing package into '/private/var/folders/3m/mc89wxfx6kb1pw7tc_50tmpw0000gn/T/Rtmp7F6luk/Rinst48b228038da7'
## (as 'lib' is unspecified)
## Warning: installation of package 'spThin_0.1.0.tar.gz' had non-zero exit
## status
library( spThin )
## Loading required package: spam
## Loading required package: grid
## Spam version 1.0-1 (2014-09-09) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction 
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
## 
## Attaching package: 'spam'
## 
## The following objects are masked from 'package:base':
## 
##     backsolve, forwardsolve
## 
## Loading required package: fields
## Loading required package: maps
## Loading required package: knitr

Example dataset

To demonstrate the use of spThin we use a set of 201 verified, georeferenced occurrence records for the Caribbean spiny pocket mouse Heteromys anomalus. These occurrences are from Colombia, Venezuela, and three Caribbean islands: Trinidad, Tobago, and Margarita. This dataset is included as part of the spThin package.

Load H. anomalus dataset

data( Heteromys_anomalus_South_America )
head( Heteromys_anomalus_South_America )
##       SPEC    LAT   LONG   REGION
## 1 anomalus  7.883 -75.20 mainland
## 2 anomalus  8.000 -76.73 mainland
## 3 anomalus 10.617 -75.03 mainland
## 4 anomalus  8.633 -74.07 mainland
## 5 anomalus  9.967 -75.07 mainland
## 6 anomalus 10.217 -73.38 mainland

Here we load and examine the dataset. The name assigned to this dataset is Heteromys_anomalus_South_America. Note that this dataset includes a REGION column indicating where each occurrence was collected: either the mainland or one of the three islands. We can see that many more occurrences were collected on the mainland than on the three islands. Note that Trinidad has been shortened to 'trin' and Margarita has been shortened to 'mar'.

table( Heteromys_anomalus_South_America$REGION )
## 
## mainland      mar   tobago     trin 
##      174        2        4       21
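As a quick consistency check, the regional counts in the table above can be summed and compared with the 201 records stated for the dataset. The sketch below uses only the counts printed above rather than the dataset itself; the variable names (region_counts, region, rows_by_region) are ours and are not part of spThin.

```r
# Regional record counts, copied from the table() output above.
region_counts <- c(mainland = 174, mar = 2, tobago = 4, trin = 21)

# Total number of records; the text states the dataset has 201.
total_records <- sum(region_counts)
print(total_records)

# Percentage of records contributed by each region.
print(round(100 * region_counts / total_records, 1))

# Rebuild a REGION-like vector from the counts and group row indices by
# region. With the real data frame, base R's split( df, df$REGION ) gives
# one data frame per region in the same way.
region <- rep(names(region_counts), times = region_counts)
rows_by_region <- split(seq_along(region), region)
print(lengths(rows_by_region))
```

The total is 201, in agreement with the dataset description, and the mainland accounts for the large majority of the records.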

Run spThin::thin on the full dataset

thin has many arguments, which gives the user considerable flexibility in how a dataset is spatially thinned. Many of these arguments have default values; see ?thin for further information.
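To give a sense of what a single repetition of distance-based thinning involves, the following is a minimal base R sketch of one randomized strategy: repeatedly drop one of the records with the most neighbours closer than a minimum distance, breaking ties at random, until no two retained records are too close. This is an illustration only and not spThin's own code. The function names (haversine_km, thin_once), the use of the haversine formula, and the three toy coordinates are all assumptions made for the example, as is reading thin.par as a distance in kilometers (check ?thin).

```r
# Great-circle distance in km between points given in decimal degrees
# (haversine formula, mean Earth radius of 6371 km).
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# One randomized thinning pass (illustrative, NOT spThin's implementation).
# Returns the indices of the records that are kept.
thin_once <- function(lat, lon, min_km) {
  n <- length(lat)
  if (n == 0) return(integer(0))
  # Pairwise distance matrix between all records.
  d <- outer(seq_len(n), seq_len(n),
             function(i, j) haversine_km(lat[i], lon[i], lat[j], lon[j]))
  too_close <- d < min_km
  diag(too_close) <- FALSE
  keep <- rep(TRUE, n)
  repeat {
    # For each kept record, count the kept neighbours closer than min_km.
    n_close <- rowSums(too_close[, keep, drop = FALSE]) * keep
    if (max(n_close) == 0) break
    # Drop one of the most crowded records, chosen at random among ties.
    worst <- which(n_close == max(n_close))
    keep[worst[sample.int(length(worst), 1)]] <- FALSE
  }
  which(keep)
}

# Hypothetical toy coordinates: records 1 and 2 are about 1 km apart,
# record 3 is about 110 km away from both.
toy_lat <- c(10.00, 10.01, 11.00)
toy_lon <- c(-75.00, -75.00, -75.00)
kept <- thin_once(toy_lat, toy_lon, min_km = 10)
print(kept)
```

Because ties are broken at random, either record 1 or record 2 is dropped, while record 3 is always kept. This randomness is why a thinning routine of this kind is typically repeated several times. The call below runs the package's own thin function on the full dataset.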

thinned_dataset_full <-
  thin( loc.data = Heteromys_anomalus_South_America, 
        lat.col = "LAT", long.col = "LONG", 
        spec.col = "SPEC", 
        thin.par = 10, reps = 100, 
        locs.thinned.list.return = TRUE, 
        write.files = TRUE, 
        max.files = 5, 
        out.dir = "hanomalus_thinned_full/", out.base = "hanomalus_thinned", 
        write.log.file = TRUE,
        log.file = "hanomalus_thinned_full_log_file.txt" )
## ********************************************** 
##  Beginning Spatial Thinning.
##  Script Started at: Sun Nov 16 21:16:58 2014
## lat.long.thin.count
## 122 123 124 
##  14  45  41 
## [1] "Maximum number of records after thinning: 124"
## [1] "Number of data.frames with max records: 41"
## [1] "Writing new *.csv files"
## Warning: Created new output directory: hanomalus_thinned_full/
## [1] "Writing file: hanomalus_thinned_full/hanomalus_thinned_thin1.csv"
## [1] "Writing file: hanomalus_thinned_full/hanomalus_thinned_thin2.csv"
## [1] "Writing file: hanomalus_thinned_full/hanomalus_thinned_thin3.csv"
## [1] "Writing file: hanomalus_thinned_full/hanomalus_thinned_thin4.csv"
## [1] "Writing file: hanomalus_thinned_full/hanomalus_thinned_thin5.csv"

In the case above, we found that 100 repetitions were sufficient to return spatially thinned datasets with the optimal number of occurrence records (124); 41 of the 100 thinned datasets retained this many records. Because this is a random process, it is possible that a similarly repeated run would not return any datasets with the optimal number of occurrence records. To visually assess whether we are using enough reps to approach the optimal number, we use the function plotThin. This function produces three plots: 1) the cumulative number of records retained versus the number of repetitions, 2) the log cumulative number of records retained versus the log number of repetitions, and 3) a histogram of the maximum number of records retained for each thinned dataset.

plotThin( thinned_dataset_full )


Looking at the plot of cumulative maximum records retained versus number of repetitions, we see that in this run the value is constant throughout the dataset creation process, indicating that a single repetition would have sufficed to reach 124. This will not always be the case, but the plot can be examined to assess whether a given number of repetitions is sufficient to reach a plateau (sensu species accumulation curves in ecology).
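The plateau check described above can also be worked through by hand from the number of records retained in each repetition. The sketch below rebuilds those counts from the tabulated output of the full run (14, 45, and 41 repetitions retaining 122, 123, and 124 records, respectively). That output does not record the order in which the counts occurred, so the vector is shuffled with an arbitrary seed and the resulting curve is illustrative only. The sketch also estimates how likely a run of n repetitions is to include at least one dataset with 124 records, under the assumptions that repetitions are independent and that 41/100 is a fair estimate of the per-repetition rate. The names n_retained, running_max, p_max, and p_at_least_one are ours.

```r
# Records retained per repetition, rebuilt from the tabulated output of the
# full run. The true order is unknown, so we shuffle (arbitrary seed).
set.seed(1)
n_retained <- sample(rep(c(122, 123, 124), times = c(14, 45, 41)))

# Running maximum: the quantity that should plateau as repetitions accumulate.
running_max <- cummax(n_retained)

# Empirical share of repetitions that reached the maximum of 124 records.
p_max <- mean(n_retained == max(n_retained))

# Probability that at least one of n independent repetitions reaches the
# maximum, assuming p_max is the true per-repetition rate (an assumption).
p_at_least_one <- function(n, p = p_max) 1 - (1 - p)^n

print(p_max)
print(round(p_at_least_one(c(1, 5, 10)), 3))
```

Under these assumptions, a run of 10 repetitions would include a 124-record dataset with a probability of about 0.995, so a short run will usually, though not always, reach the maximum for this dataset.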

Run spThin::thin on datasets separated by region

Coastal mainland

thinned_dataset_mainland <-
  thin( loc.data = Heteromys_anomalus_South_America[ which( Heteromys_anomalus_South_America$REGION == "mainland" ) , ], 
        lat.col = "LAT", long.col = "LONG", 
        spec.col = "SPEC", 
        thin.par = 10, reps = 100, 
        locs.thinned.list.return = TRUE, 
        write.files = TRUE, 
        max.files = 5, 
        out.dir = "hanomalus_thinned_mainland/", out.base = "hanomalus_thinned", 
        write.log.file = TRUE,
        log.file = "hanomalus_thinned_mainland_log_file.txt" )
## ********************************************** 
##  Beginning Spatial Thinning.
##  Script Started at: Sun Nov 16 21:17:07 2014
## lat.long.thin.count
## 109 110 
##  34  66 
## [1] "Maximum number of records after thinning: 110"
## [1] "Number of data.frames with max records: 66"
## [1] "Writing new *.csv files"
## Warning: Created new output directory: hanomalus_thinned_mainland/
## [1] "Writing file: hanomalus_thinned_mainland/hanomalus_thinned_thin1.csv"
## [1] "Writing file: hanomalus_thinned_mainland/hanomalus_thinned_thin2.csv"
## [1] "Writing file: hanomalus_thinned_mainland/hanomalus_thinned_thin3.csv"
## [1] "Writing file: hanomalus_thinned_mainland/hanomalus_thinned_thin4.csv"
## [1] "Writing file: hanomalus_thinned_mainland/hanomalus_thinned_thin5.csv"

Trinidad

thinned_dataset_trin <-
  thin( loc.data = Heteromys_anomalus_South_America[ which( Heteromys_anomalus_South_America$REGION == "trin" ) , ], 
        lat.col = "LAT", long.col = "LONG", 
        spec.col = "SPEC", 
        thin.par = 10, reps = 10, 
        locs.thinned.list.return = TRUE, 
        write.files = TRUE, 
        max.files = 5, 
        out.dir = "hanomalus_thinned_trin/", out.base = "hanomalus_thinned", 
        write.log.file = TRUE,
        log.file = "hanomalus_thinned_trin_log_file.txt" )
## ********************************************** 
##  Beginning Spatial Thinning.
##  Script Started at: Sun Nov 16 21:17:13 2014
## lat.long.thin.count
## 11 12 
##  2  8 
## [1] "Maximum number of records after thinning: 12"
## [1] "Number of data.frames with max records: 8"
## [1] "Writing new *.csv files"
## Warning: Created new output directory: hanomalus_thinned_trin/
## [1] "Writing file: hanomalus_thinned_trin/hanomalus_thinned_thin1.csv"
## [1] "Writing file: hanomalus_thinned_trin/hanomalus_thinned_thin2.csv"
## [1] "Writing file: hanomalus_thinned_trin/hanomalus_thinned_thin3.csv"
## [1] "Writing file: hanomalus_thinned_trin/hanomalus_thinned_thin4.csv"
## [1] "Writing file: hanomalus_thinned_trin/hanomalus_thinned_thin5.csv"

Margarita

thinned_dataset_mar <-
  thin( loc.data = Heteromys_anomalus_South_America[ which( Heteromys_anomalus_South_America$REGION == "mar" ) , ], 
        lat.col = "LAT", long.col = "LONG", 
        spec.col = "SPEC", 
        thin.par = 10, reps = 10, 
        locs.thinned.list.return = TRUE, 
        write.files = TRUE, 
        max.files = 5, 
        out.dir = "hanomalus_thinned_mar/", out.base = "hanomalus_thinned", 
        write.log.file = TRUE,
        log.file = "hanomalus_thinned_mar_log_file.txt" )
## ********************************************** 
##  Beginning Spatial Thinning.
##  Script Started at: Sun Nov 16 21:17:13 2014
## lat.long.thin.count
##  1 
## 10 
## [1] "Maximum number of records after thinning: 1"
## [1] "Number of data.frames with max records: 10"
## [1] "Writing new *.csv files"
## Warning: Created new output directory: hanomalus_thinned_mar/
## [1] "Writing file: hanomalus_thinned_mar/hanomalus_thinned_thin1.csv"
## [1] "Writing file: hanomalus_thinned_mar/hanomalus_thinned_thin2.csv"
## [1] "Writing file: hanomalus_thinned_mar/hanomalus_thinned_thin3.csv"
## [1] "Writing file: hanomalus_thinned_mar/hanomalus_thinned_thin4.csv"
## [1] "Writing file: hanomalus_thinned_mar/hanomalus_thinned_thin5.csv"

Tobago

thinned_dataset_tobago <-
  thin( loc.data = Heteromys_anomalus_South_America[ which( Heteromys_anomalus_South_America$REGION == "tobago" ) , ], 
        lat.col = "LAT", long.col = "LONG", 
        spec.col = "SPEC", 
        thin.par = 10, reps = 10, 
        locs.thinned.list.return = TRUE, 
        write.files = TRUE, 
        max.files = 5, 
        out.dir = "hanomalus_thinned_tobago/", out.base = "hanomalus_thinned", 
        write.log.file = TRUE,
        log.file = "hanomalus_thinned_tobago_log_file.txt" )
## ********************************************** 
##  Beginning Spatial Thinning.
##  Script Started at: Sun Nov 16 21:17:13 2014
## lat.long.thin.count
##  1 
## 10 
## [1] "Maximum number of records after thinning: 1"
## [1] "Number of data.frames with max records: 10"
## [1] "Writing new *.csv files"
## Warning: Created new output directory: hanomalus_thinned_tobago/
## [1] "Writing file: hanomalus_thinned_tobago/hanomalus_thinned_thin1.csv"
## [1] "Writing file: hanomalus_thinned_tobago/hanomalus_thinned_thin2.csv"
## [1] "Writing file: hanomalus_thinned_tobago/hanomalus_thinned_thin3.csv"
## [1] "Writing file: hanomalus_thinned_tobago/hanomalus_thinned_thin4.csv"
## [1] "Writing file: hanomalus_thinned_tobago/hanomalus_thinned_thin5.csv"
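As a simple check on the regional runs above, the maximum number of records retained in each region can be summed and compared with the maximum from the full dataset. The values below are copied from the printed output; the variable names are ours.

```r
# Maximum records retained per region, copied from the output above.
regional_max <- c(mainland = 110, trin = 12, mar = 1, tobago = 1)

# Maximum records retained when the full dataset was thinned at once.
full_max <- 124

# Total across regions, and whether it matches the full-dataset maximum.
total_regional <- sum(regional_max)
print(total_regional)
print(total_regional == full_max)
```

The regional maxima sum to 124, the same maximum obtained from the full dataset. Note that combining regional results in this way implicitly assumes that no two retained records from different regions lie within the thinning distance of one another, which this check does not verify.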