Intro to spind

Sam Levin, Gudrun Carl, Ingolf Kuehn

2018-04-08

spind is a package dedicated to removing the spectre of spatial autocorrelation in your spatial models. It contains many of the tools you need to make predictions, assess model performance, and conduct multimodel inference for 2-D gridded data sets using methods that are robust to spatial autocorrelation.

The theory underlying the use of GEEs, WRMs, and many of the other tools in this package is covered elsewhere in the literature, and for the purposes of this vignette we assume that you have already read those papers. If you haven't, citations are included in the footnotes of this vignette as well as in the documentation of each function. We also assume a working knowledge of R. This vignette demonstrates how to use this package to create a robust model from spatially referenced data and then assess its accuracy. Along the way, we use a couple of different data sets to examine how these functions work and how one might use them to build a spatially robust model. This demonstration focuses on species distribution models (hereafter SDMs), but the same framework applies to any spatially structured data set (e.g. economic or sociological data).

Generalized Estimating Equations (GEEs) for species distribution modeling [1]

This package uses the GEE functions from the packages gee [2] and geepack [3] and adapts them for easy use in the context of an SDM. Let's start with a fairly simple GEE using the simulated musdata data set included in the package. Before we start, note that GEE requires the predictor variables to be continuous.

library(spind)
library(ggplot2)  # provides scale_color_manual(), used in customize_plot below

data(musdata)
data(carlinadata)

# Examine the structure to familiarize yourself with the data
?musdata
head(musdata)

?carlinadata
head(carlinadata)

# Next, fit a simple GEE and view the output
# (columns 4 and 5 of musdata hold the grid coordinates)
coords <- musdata[ ,4:5]

mgee <- GEE(musculus ~ pollution + exposure, family = "poisson", data = musdata,
            coord = coords, corstr = "fixed", plot = TRUE, scale.fix = FALSE,
            customize_plot = scale_color_manual("Custom Legend", values = c('green', 'black')))
#> Scale for 'colour' is already present. Adding another scale for
#> 'colour', which will replace the existing scale.


summary(mgee, printAutoCorPars = TRUE)
#> 
#>  Call: 
#> GEE(formula = musculus ~ pollution + exposure, family = "poisson", 
#>     data = musdata, coord = coords, corstr = "fixed", plot = TRUE, 
#>     scale.fix = FALSE, customize_plot = scale_color_manual("Custom Legend", 
#>         values = c("green", "black")))
#> --- 
#>  Coefficients: 
#>             Estimate  Std.Err z value  Pr(>|z|)    
#> (Intercept) -1.90475  1.31091 -1.4530 0.1462252    
#> pollution    3.36216  0.91416  3.6779 0.0002352 ***
#> exposure    -1.46348  0.88010 -1.6629 0.0963410 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --- 
#>  QIC:  1139.159 
#> --- 
#> Autocorrelation of GLM residuals 
#>  [1]  0.685338504  0.509680590  0.363021118  0.247398654  0.144726020
#>  [6]  0.084220961  0.050228656  0.022369044 -0.001985639 -0.027296083
#> 
#>  Autocorrelation of GEE residuals 
#>  [1] -0.001277974 -0.004261554  0.045280260  0.022738750  0.005821352
#>  [6]  0.004289166  0.008311357  0.003437398  0.001030847 -0.010359040
#> --- 
#>  Autocorrelation parameters from  fixed  model 
#> [1] "a=alpha^(d^v)  , alpha=0.685 , v=1.093"

predictions <- predict(mgee, newdata = musdata)

As you can see, this package includes methods for summary and predict. These are useful for evaluating model fit and for comparing the autocorrelation of the residuals with that of a non-spatial model (in this case, a GLM with the same family as the GEE). The plot argument of GEE can also be used to visually inspect the autocorrelation of the residuals from each regression. Version 2.1.0 and up also has built-in ggplot2-style graphics, which you can customize by passing ggplot2 components to the customize_plot argument. Note that a QIC (quasi-likelihood information criterion) score is reported instead of AIC. It is calculated following the method described in Hardin & Hilbe [4,5] and is implemented in the function qic.calc.
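For reference, QIC as it is commonly defined for GEEs (see e.g. Hardin & Hilbe [4]) is shown below. We have not checked qic.calc line by line, so read this as the textbook form the calculation is based on rather than a description of its exact code:

$$
\mathrm{QIC} = -2\,Q\!\left(\hat{\beta}_R;\, I\right) + 2\,\mathrm{tr}\!\left(\hat{\Omega}_I\,\hat{V}_R\right)
$$

Here $Q$ is the quasi-likelihood evaluated at the GEE estimates $\hat{\beta}_R$ under an independence working model $I$, $\hat{\Omega}_I$ is the inverse of the model-based variance estimate under independence, and $\hat{V}_R$ is the robust (sandwich) variance estimate under the working correlation structure $R$. As with AIC, smaller values are better, and the trace term plays the role of the complexity penalty.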

Note that fitting GEEs with corstr = "fixed" to large data sets will fail once the variance-covariance matrix becomes too large for R to handle. An n × n matrix reaches .Machine$integer.max elements when n is roughly sqrt(.Machine$integer.max), about 46,000 observations, and you may well run into problems well before that point depending on how much RAM is available. This is where clustered models come in handy, because they work with smaller, more manageable matrices. They can be specified by setting corstr to either "quadratic" or "exchangeable".
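To make the scaling concrete, here is a rough sketch in base R. It assumes that the fixed structure is the distance-decay function a = alpha^(d^v) printed by summary(mgee) above and that it is evaluated for every pair of observations; spind's internal implementation may differ (for example, by truncating at some distance). fixed_cor() and dense_gib() are hypothetical helpers written only for this illustration.

```r
# Hypothetical helpers for illustration only; not spind functions.
# fixed_cor(): dense correlation matrix from a = alpha^(d^v) for all pairs.
fixed_cor <- function(coords, alpha, v) {
  d <- as.matrix(dist(coords))  # Euclidean distances between all cells
  alpha^(d^v)                   # correlation decays with distance; d = 0 gives 1
}

# 400 cells (the number of observations in musdata) laid out on a 20 x 20 grid
grid <- expand.grid(x = 1:20, y = 1:20)
cmat <- fixed_cor(grid, alpha = 0.685, v = 1.093)
dim(cmat)    # 400 400: 160,000 entries for only 400 observations
cmat[1, 2]   # neighbours one cell apart: 0.685

# dense_gib(): memory for one dense n x n matrix of doubles (8 bytes each)
dense_gib <- function(n) n^2 * 8 / 1024^3
n_max <- floor(sqrt(.Machine$integer.max))  # 46340
n_max^2 <= .Machine$integer.max             # TRUE: just under 2^31 - 1 entries
dense_gib(n_max)                            # about 16 GiB for a single matrix
```

The point is the quadratic growth: doubling the number of grid cells quadruples the size of the matrix, which is why the clustered structures become attractive for large grids.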

Wavelet Revised Models (WRMs) [6]

Next, we'll examine the other main model introduced in this package: the Wavelet Revised Model. WRMs are implemented using wavelet transforms from the waveslim package [7]. Let's start with a fairly simple WRM using the same musdata data set as above. As with GEE, WRM requires the predictor variables to be continuous.



mwrm <- WRM(musculus ~ pollution + exposure, family = "poisson",
            data = musdata, coord = coords, level = 1, plot = TRUE)


summary(mwrm)
#> 
#>  Call: 
#> WRM(formula = musculus ~ pollution + exposure, family = "poisson", 
#>     data = musdata, coord = coords, level = 1, plot = TRUE)
#> 
#>  Pearson Residuals: 
#>       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
#> -1.6135878 -0.3057118  0.0047507 -0.0003873  0.3038946  3.0620363 
#> --- 
#>  Coefficients: 
#>             Estimate Std.Err z value Pr(>|z|)   
#> (Intercept)  -1.9360  1.9177 -1.0095 0.312717   
#> pollution     3.1841  1.2251  2.5991 0.009348 **
#> exposure     -1.2286  1.5063 -0.8156 0.414723   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --- 
#>  Number of observations n:  400 ,  n.eff:  300 ,  AIC:  1110.845 
#> 
#>  Number of iterations:  7 
#> --- 
#> Autocorrelation of glm.residuals 
#>  [1]  0.685338504  0.509680590  0.363021118  0.247398654  0.144726020
#>  [6]  0.084220961  0.050228656  0.022369044 -0.001985639 -0.027296083
#> Autocorrelation of wavelet.residuals 
#>  [1]  0.024855393 -0.086311686  0.007820356  0.024501828 -0.016578686
#>  [6]  0.002798656 -0.002977017 -0.004611334  0.018150352 -0.008727321

predictions <- predict(mwrm, newdata = musdata)

WRM shares many features with GEE. Setting plot = TRUE lets you visually compare the autocorrelation of the WRM residuals with that of the residuals from a GLM of the same error family. customize_plot works the same way as in GEE: pass additional ggplot2 components (such as scale_color_manual()) to modify the figure to meet your needs. Methods for predict and summary let you examine the model output with the same code you would use for a GLM. Note, however, that summary reports an AIC score rather than the QIC score reported for GEEs.

Other features specific to WRMs

WRM has a number of other model-specific functions that may help you diagnose model fit and understand your results. For example, you might want to plot the variance and/or covariance of each of your variables as a function of level. The covar.plot function lets you visually examine these wavelet relationships; the plot argument selects the variance ("var") or covariance ("covar") plot. From here on, we switch to the carlinadata data set.


coords <- carlinadata[ ,4:5]

covar.plot(carlina.horrida ~ aridity + land.use - 1,
           data = carlinadata, coord = coords, wavelet = "d4",
           wtrafo = 'modwt', plot = 'covar')

#> $result
#>                            [,1]   [,2]   [,3]   [,4]   [,5]
#> carlina.horrida-aridity  0.0368 0.0450 0.0623 0.0780 0.0466
#> carlina.horrida-land.use 0.4782 0.1191 0.0332 0.0126 0.0055

covar.plot(carlina.horrida ~ aridity + land.use - 1,
           data = carlinadata, coord = coords, wavelet = "d4",
           wtrafo = 'modwt', plot = 'var')

#> $result
#>                   [,1]   [,2]   [,3]   [,4]   [,5]
#> carlina.horrida 0.7235 0.1792 0.0628 0.0242 0.0093
#> aridity         0.0691 0.1025 0.2028 0.3588 0.2657
#> land.use        0.7556 0.1851 0.0420 0.0119 0.0044

You may also want to view the smooth components of your wavelets at different scales. For this, we offer the upscale function, which lets you visually examine your data at several levels of scale; the level controls the resolution of the grid cells in your data set [8]. It also lets you adjust the padding settings so you can see how they influence your smooth components. The default padding value is the mean of your input vector, but it can easily be changed using the pad argument, which works the same way as in the other WRM functions. Finally, the color.maps argument chooses between gray scale and colorized maps. Here is a quick example using the carlinadata data set:


upscale(carlinadata$land.use, coord = coords,
        pad = mean(carlinadata$land.use), color.maps = FALSE)
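The idea of looking at a variable at coarser levels can be sketched with the simplest wavelet, the Haar wavelet: its level-j smooth on a 2-D grid replaces each 2^j × 2^j block of cells by the block mean. The helper below is hypothetical and is not spind's upscale, which relies on waveslim transforms and may use other filters and boundary handling. It assumes integer, unit-spaced grid coordinates and, like upscale's default, pads the grid to a dyadic square with the mean of the input.

```r
# Hypothetical helper, not spind's upscale(): Haar-style smooth of a gridded variable.
# Assumes integer, unit-spaced grid coordinates (column 1 = x, column 2 = y).
haar_smooth <- function(values, coords, level, pad = mean(values)) {
  col_i <- coords[, 1] - min(coords[, 1]) + 1
  row_i <- coords[, 2] - min(coords[, 2]) + 1
  # Pad to a square whose side is a power of two, filling empty cells with `pad`
  side <- 2^ceiling(log2(max(col_i, row_i, 2^level)))
  m <- matrix(pad, nrow = side, ncol = side)
  m[cbind(row_i, col_i)] <- values
  # Index of the 2^level x 2^level block that each row and column belongs to
  block <- (seq_len(side) - 1) %/% 2^level + 1
  rb <- block[row(m)]
  cb <- block[col(m)]
  # Haar smooth at this level: every cell takes the mean of its block
  means <- tapply(m, list(rb, cb), mean)
  matrix(means[cbind(rb, cb)], nrow = side, ncol = side)
}

# Toy 4 x 4 grid holding the values 1..16, smoothed at levels 1 and 2
toy_xy <- expand.grid(x = 1:4, y = 1:4)
haar_smooth(1:16, toy_xy, level = 1)  # four 2 x 2 blocks of means
haar_smooth(1:16, toy_xy, level = 2)  # the overall mean, 8.5, everywhere
```

Each extra level halves the resolution, which mirrors how upscale lets you step through coarser grids; with mean padding, the overall mean of the padded grid equals the mean of the data.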

Multi-model inference with GEEs and WRMs

spind provides a couple of frameworks for multi-model inference, along with some helper functions for examining the results. The first one we'll examine is step.spind, which implements stepwise model selection. The process is loosely based on MASS::stepAIC and stats::step, but is specific to the GEE and WRM classes. For GEEs, step.spind removes, at each step, the term whose deletion gives the lowest QIC. For WRMs, the criterion is either AIC or AICc (AIC corrected for small sample sizes), chosen with the logical AICc argument.

Currently, the function only supports backward selection: you start with all of the variables in your model formula and remove them one at a time. We hope to add forward selection methods soon. Additionally, step.spind always respects the hierarchy of variables in the model, and the user currently cannot override this. For example, step.spind will not remove race while retaining I(race^2); if the best deletion would violate the hierarchy, that term is added back and the next-best term is removed instead, as the output below shows. We may change this behavior in the future, but it will stay as is at least until the next major release. step.spind recognizes polynomial variables by matching variable names inside I(var^some_power), and interaction terms by searching for var1:var2 in the model terms. If you want to use a higher-order polynomial variable and are not worried about the variable hierarchy, you can create a separate variable (e.g. race_2) and use that in the model.
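The matching rule described above can be sketched in base R. violates_hierarchy() is a hypothetical helper written for this vignette, not spind's internal code; it only follows the two patterns named in the text (I(var^power) and var1:var2) and reads the term labels from stats::terms().

```r
# Hypothetical helper, not spind's code: would deleting `term` leave a
# higher-order term that depends on it (I(term^k) or an interaction)?
violates_hierarchy <- function(term, formula) {
  labels <- attr(terms(formula), "term.labels")
  others <- setdiff(labels, term)
  # Polynomial terms written as I(term^power)
  poly <- startsWith(others, paste0("I(", term, "^"))
  # Interaction terms written as a:b (or a:b:c, ...)
  inter <- vapply(strsplit(others, ":", fixed = TRUE),
                  function(parts) length(parts) > 1 && term %in% parts,
                  logical(1))
  any(poly | inter)
}

f <- low ~ age + lwt + race + smoke + ftv + bwt + I(race^2)
violates_hierarchy("race", f)       # TRUE: I(race^2) is still in the model
violates_hierarchy("I(race^2)", f)  # FALSE
violates_hierarchy("age", f)        # FALSE

# The workaround from the text: a separately created race_2 is not matched
violates_hierarchy("race", low ~ age + race + race_2)  # FALSE
```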

We'll go through an example of step.spind using a GEE on the birthwt data set from the MASS package. The birthwt data are unrelated to SDMs and are not spatially structured, but using them shows that the function can work with many types of data sets.


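# For demonstration only. We are artificially imposing a grid structure
# on data that are not actually spatial: a 14 x 14 grid has 196 cells,
# and dropping the last 7 leaves one cell for each of the 189 rows of birthwt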

library(MASS)
#> Warning: package 'MASS' was built under R version 3.4.4
data(birthwt)


x <- rep(1:14, 14)
y <- as.integer(gl(14, 14))
coords <- cbind(x[-(190:196)], y[-(190:196)])

formula <- formula(low ~ age + lwt + race + smoke + ftv +  bwt + I(race^2))

mgee <- GEE(formula, family = "gaussian", data = birthwt,
            coord = coords, corstr = "fixed", scale.fix = TRUE)

mwrm <- WRM(formula, family = "gaussian", data = birthwt,
            coord = coords, level = 1)

ssgee <- step.spind(mgee, birthwt)
#> Iteration:  1 
#>  Single term deletions
#>  Deleted Term:  age 
#>  -------------------- 
#>   Deleted.Vars  Quasi.Lik      QIC
#> 1       <none>  -52.68206 112.4177
#> 2          age  -52.67027 111.9314
#> 3          lwt  -52.74725 112.3100
#> 4         race  -52.68782 112.1267
#> 5        smoke  -52.68652 112.1349
#> 6          ftv  -52.70359 112.1632
#> 7          bwt -121.62394 299.1973
#> 8    I(race^2)  -52.69176 112.1329
#> 
#> Iteration:  2 
#>  Single term deletions
#>  Deleted Term:  race 
#>  -------------------- 
#>   Deleted.Vars  Quasi.Lik      QIC
#> 1       <none>  -52.67027 111.9314
#> 2          lwt  -52.72965 111.8146
#> 3         race  -52.67038 111.6474
#> 4        smoke  -52.67100 111.6566
#> 5          ftv  -52.72503 111.7364
#> 6          bwt -121.56174 299.0924
#> 7    I(race^2)  -52.67607 111.6567
#> 
#> -----
#> Model hierarchy violated by last removal
#>  New Deleted Term:  smoke 
#> Previously deleted term added back into model
#> -----
#> Iteration:  3 
#>  Single term deletions
#>  Deleted Term:  race 
#>  -------------------- 
#>   Deleted.Vars  Quasi.Lik      QIC
#> 1       <none>  -52.67100 111.6566
#> 2          lwt  -52.72809 111.5311
#> 3         race  -52.67366 111.3802
#> 4          ftv  -52.72790 111.4711
#> 5          bwt -123.16568 300.0954
#> 6    I(race^2)  -52.67890 111.3877
#> 
#> -----
#> Model hierarchy violated by last removal
#>  New Deleted Term:  I(race^2) 
#> Previously deleted term added back into model
#> -----
#> Iteration:  4 
#>  Single term deletions
#>  Deleted Term:  ftv 
#>  -------------------- 
#>   Deleted.Vars  Quasi.Lik      QIC
#> 1       <none>  -52.67890 111.3877
#> 2          lwt  -52.72805 111.2494
#> 3         race  -52.76311 111.3122
#> 4          ftv  -52.73630 111.2088
#> 5          bwt -123.33017 298.6147
#> 
#> Iteration:  5 
#>  Single term deletions
#>  Deleted Term:  lwt 
#>  -------------------- 
#>   Deleted.Vars  Quasi.Lik      QIC
#> 1       <none>  -52.73630 111.2088
#> 2          lwt  -52.78793 111.0717
#> 3         race  -52.82335 111.1415
#> 4          bwt -123.35038 298.6351
#> 
#> Iteration:  6 
#>  Single term deletions
#>  Deleted Term:  race 
#>  -------------------- 
#>   Deleted.Vars  Quasi.Lik      QIC
#> 1       <none>  -52.78793 111.0717
#> 2         race  -52.86072 110.9656
#> 3          bwt -123.12477 295.8817
#> 
#> Iteration:  7 
#>  Single term deletions
#>  Deleted Term:  <none> 
#>  -------------------- 
#>   Deleted.Vars  Quasi.Lik      QIC
#> 1       <none>  -52.86072 110.9656
#> 2          bwt -123.98743 296.2879
#> 
#> 
#> ---------------
#> Best model found:
#> low ~ bwt
sswrm <- step.spind(mwrm, birthwt, AICc = TRUE)
#> Iteration:  1 
#>  Single term deletions
#>  Deleted Term:  race 
#>  -------------------- 
#>   Deleted.Vars    LogLik       AIC      AICc
#> 1       <none> -36.27411  90.54822  91.55381
#> 2          age -36.27965  88.55930  89.35930
#> 3          lwt -36.30071  88.60143  89.40143
#> 4         race -36.03376  88.06752  88.86752
#> 5        smoke -36.13543  88.27087  89.07087
#> 6          ftv -36.25766  88.51532  89.31532
#> 7          bwt -88.97789 193.95579 194.75579
#> 8    I(race^2) -36.03879  88.07758  88.87758
#> 
#> -----
#> Model hierarchy violated by last removal
#>  New Deleted Term:  I(race^2) 
#> Previously deleted term added back into model
#> -----
#> Iteration:  2 
#>  Single term deletions
#>  Deleted Term:  smoke 
#>  -------------------- 
#>   Deleted.Vars    LogLik       AIC      AICc
#> 1       <none> -36.03879  88.07758  88.87758
#> 2          age -36.04158  86.08317  86.70195
#> 3          lwt -36.09072  86.18144  86.80022
#> 4         race -36.11776  86.23552  86.85430
#> 5        smoke -35.90087  85.80174  86.42052
#> 6          ftv -36.02003  86.04007  86.65885
#> 7          bwt -88.71971 191.43941 192.05820
#> 
#> Iteration:  3 
#>  Single term deletions
#>  Deleted Term:  ftv 
#>  -------------------- 
#>   Deleted.Vars    LogLik       AIC      AICc
#> 1       <none> -35.90087  85.80174  86.42052
#> 2          age -35.90797  83.81593  84.27747
#> 3          lwt -35.95683  83.91366  84.37520
#> 4         race -35.97398  83.94797  84.40950
#> 5          ftv -35.88997  83.77995  84.24149
#> 6          bwt -90.79731 193.59462 194.05616
#> 
#> Iteration:  4 
#>  Single term deletions
#>  Deleted Term:  age 
#>  -------------------- 
#>   Deleted.Vars    LogLik       AIC      AICc
#> 1       <none> -35.88997  83.77995  84.24149
#> 2          age -35.89084  81.78169  82.10956
#> 3          lwt -35.94335  81.88670  82.21457
#> 4         race -35.96301  81.92602  82.25389
#> 5          bwt -90.81649 191.63298 191.96085
#> 
#> Iteration:  5 
#>  Single term deletions
#>  Deleted Term:  lwt 
#>  -------------------- 
#>   Deleted.Vars    LogLik       AIC      AICc
#> 1       <none> -35.89084  81.78169  82.10956
#> 2          lwt -35.94877  79.89754  80.11494
#> 3         race -35.95486  79.90972  80.12711
#> 4          bwt -91.14883 190.29766 190.51505
#> 
#> Iteration:  6 
#>  Single term deletions
#>  Deleted Term:  race 
#>  -------------------- 
#>   Deleted.Vars    LogLik       AIC      AICc
#> 1       <none> -35.94877  79.89754  80.11494
#> 2         race -36.00167  78.00334  78.13307
#> 3          bwt -91.88479 189.76958 189.89931
#> 
#> Iteration:  7 
#>  Single term deletions
#>  Deleted Term:  <none> 
#>  -------------------- 
#>   Deleted.Vars    LogLik       AIC      AICc
#> 1       <none> -36.00167  78.00334  78.13307
#> 2          bwt -92.26134 188.52268 188.58719
#> 
#> 
#> ---------------
#> Best model found:
#> low ~ bwt
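The AIC and AICc columns in the output above can be reproduced by hand from the LogLik column. The sketch below assumes that k counts the regression coefficients plus one Gaussian dispersion parameter and that n is the full sample size (189) rather than n.eff; these are our inferences from the printed numbers, which they reproduce, not documented behavior. aic() and aicc() are hypothetical helpers.

```r
# Hypothetical helpers; AICc = AIC + 2k(k + 1) / (n - k - 1)
aic  <- function(loglik, k) -2 * loglik + 2 * k
aicc <- function(loglik, k, n) aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)

n <- 189  # number of rows in birthwt (n in the WRM summary)

# Full starting model: intercept + 7 terms + dispersion, so k = 9
aic(-36.27411, k = 9)          # 90.54822
aicc(-36.27411, k = 9, n = n)  # 91.55381
# Final model low ~ bwt: intercept + bwt + dispersion, so k = 3
aicc(-36.00167, k = 3, n = n)  # 78.13307
```

Within one iteration every single-term deletion has the same k, so the AICc correction is identical across candidates and cannot change which term is deleted; it only matters when the current model (the <none> row) is compared with the smaller candidates.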

best.mgee <- GEE(ssgee$model, family = "gaussian", data = birthwt,
                 coord = coords, corstr = "fixed", scale.fix = TRUE)

best.wrm <- WRM(sswrm$model, family = "gaussian", data = birthwt,
                coord = coords, level = 1)

summary(best.mgee, printAutoCorPars = FALSE)
#> 
#>  Call: 
#> GEE(formula = ssgee$model, family = "gaussian", data = birthwt, 
#>     coord = coords, corstr = "fixed", scale.fix = TRUE)
#> --- 
#>  Coefficients: 
#>                Estimate     Std.Err t value Pr(>|t|)    
#> (Intercept)  1.2492e+00  4.9121e-01  2.5430  0.01099 *  
#> bwt         -3.0919e-04  6.5913e-05 -4.6909 2.72e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --- 
#>  QIC:  110.9656 
#> --- 
#> Autocorrelation of GLM residuals 
#>  [1]  0.837748633  0.724407532  0.602588671  0.500754270  0.387294592
#>  [6]  0.275433941  0.147728669  0.008716423 -0.130798183 -0.268641655
#> 
#>  Autocorrelation of GEE residuals 
#>  [1]  0.43453709  0.35186795  0.27457621  0.21231229  0.10255028
#>  [6]  0.08028419  0.07174312  0.04070057  0.02919975 -0.06904364
summary(best.wrm)
#> 
#>  Call: 
#> WRM(formula = sswrm$model, family = "gaussian", data = birthwt, 
#>     coord = coords, level = 1)
#> 
#>  Pearson Residuals: 
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#> -0.32375 -0.03533 -0.01311  0.00000  0.03328  0.42378 
#> --- 
#>  Coefficients: 
#>                Estimate     Std.Err     t value  Pr(>|t|)    
#> (Intercept)  1.2809e+00  2.3089e-09  5.5477e+08 < 2.2e-16 ***
#> bwt         -3.4305e-04  7.4898e-06 -4.5802e+01 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --- 
#>  Number of observations n:  189 ,  n.eff:  142 ,  AIC:  78.00334 
#> 
#>  Number of iterations:  2 
#> --- 
#> Autocorrelation of glm.residuals 
#>  [1]  0.837748633  0.724407532  0.602588671  0.500754270  0.387294592
#>  [6]  0.275433941  0.147728669  0.008716423 -0.130798183 -0.268641655
#> Autocorrelation of wavelet.residuals 
#>  [1] 0.427865614 0.201246818 0.139896894 0.103437367 0.046485859
#>  [6] 0.045779123 0.025277466 0.034175107 0.005430915 0.055360212

Additionally, we offer multimodel inference tools for GEEs and WRMs that are loosely based on the MuMIn package, implemented in mmiWMRR and mmiGEE. They let you examine how variable selection (and, for WRMs, the grid resolution) affects the resulting regressions, so that you can select an appropriate model for subsequent analyses. Note that mmiWMRR requires two more arguments than mmiGEE (scale and detail in the examples below).


# Example for WRMs
data(carlinadata)
coords <- carlinadata[ ,4:5]

wrm <- WRM(carlina.horrida ~ aridity + land.use, family = "poisson",
           data = carlinadata, coord = coords, level = 1, wavelet = "d4")

# scaleWMRR() fits the wavelet regression at a single, chosen scale
ms1 <- scaleWMRR(carlina.horrida ~ aridity + land.use, family = "poisson",
                 data = carlinadata, coord = coords, scale = 1,
                 wavelet = 'd4', trace = FALSE)

mmi <- mmiWMRR(wrm, data = carlinadata, scale = 1, detail = TRUE)

# Example for GEEs
library(MASS)
data(birthwt)

# impose an artificial (not fully appropriate) grid structure
x <- rep(1:14, 14)
y <- as.integer(gl(14, 14))
coords <- cbind(x[-(190:196)], y[-(190:196)])

formula <- formula(low ~ race + smoke +  bwt)

mgee <- GEE(formula, family = "gaussian", data = birthwt,
            coord = coords, corstr = "fixed", scale.fix = TRUE)

mmi <- mmiGEE(mgee, birthwt)

Finally, we offer one further model selection procedure specific to WRMs. rvi.plot uses mmiWMRR and creates a plot of the relative importance of each explanatory variable as a function of the resolution of the grid (in other words, as a function of the scale argument in mmiWMRR). It will also print the resulting model selection tables to the console.

data(carlinadata)
coords <- carlinadata[ ,4:5]

rvi.plot(carlina.horrida ~ aridity + land.use, family = "poisson",
         data = carlinadata, coord = coords, maxlevel = 4, 
         detail = TRUE, wavelet = "d4")
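The bookkeeping behind multimodel inference and relative variable importance can be sketched with ordinary, non-spatial GLMs: fit every subset of the predictors, convert the AIC differences to Akaike weights, and sum the weights of the models that contain each variable. This is a swapped-in illustration, not the code of mmiGEE, mmiWMRR or rvi.plot, which work from spatial models (QIC or AIC) and may differ in detail. It mirrors the mmiGEE example (birthwt, race + smoke + bwt, gaussian family).

```r
# Akaike weights and relative variable importance (RVI) on non-spatial GLMs.
library(MASS)  # for the birthwt data
data(birthwt)

vars <- c("race", "smoke", "bwt")
# One row per candidate model: every combination of the three predictors
inc <- expand.grid(rep(list(c(FALSE, TRUE)), length(vars)))
names(inc) <- vars

fits <- lapply(seq_len(nrow(inc)), function(i) {
  sel <- vars[unlist(inc[i, ])]
  rhs <- if (length(sel) > 0) sel else "1"  # intercept-only model when empty
  glm(reformulate(rhs, response = "low"), family = gaussian, data = birthwt)
})

aics <- vapply(fits, AIC, numeric(1))
delta <- aics - min(aics)
w <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike weights (sum to 1)

# RVI: summed weight of every model that contains the variable
rvi <- vapply(vars, function(v) sum(w[inc[[v]]]), numeric(1))
round(rvi, 3)
```

An RVI close to 1 means the variable appears in essentially every well-supported model; rvi.plot shows how such importance values change with the scale argument.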

Goodness of fit and model performance

You may also find that a model not implemented in this package works best for your data. For that case, we have implemented spatially corrected accuracy measures that you can use to assess goodness of fit. They are grouped by whether or not their outputs depend on the chosen threshold, and first appeared in spind v1.0 [9]: th.dep (threshold dependent) and th.indep (threshold independent). Both work with any type of model; all you need is a set of observed values, predictions, and their associated coordinates. We'll use the hook data set to see how they work.

data(hook)

# Familiarize yourself with the data
?hook
head(hook)

# Columns 1-2: observed values and predictions; columns 3-4: coordinates
df <- hook[ ,1:2]
coords <- hook[ ,3:4]

# Threshold dependent metrics
th.dep.indices <- th.dep(data = df, coord = coords, spatial = TRUE)

# Confusion Matrix
th.dep.indices$cm
#>      [,1] [,2] [,3] [,4]
#> [1,]    5    2    0    0
#> [2,]    3    1    1    3
#> [3,]    2    0    0    8
#> [4,]    2    3    0   70

# Kappa statistic
th.dep.indices$kappa
#> [1] 0.628529

# Threshold independent metrics
th.indep.indices <- th.indep(data = df, coord = coords, 
                             spatial = TRUE, plot.ROC = TRUE)


# AUC
th.indep.indices$AUC
#> [1] 0.9424119

# TSS
th.indep.indices$TSS
#> [1] 0.7425474
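As a point of reference, here are the standard, non-spatial versions of kappa, TSS and AUC for binary observations and continuous predictions. th.dep and th.indep apply a spatial correction [9], so this sketch is not expected to reproduce the numbers above. accuracy_sketch() is a hypothetical helper; it computes TSS at one fixed threshold, and we have not checked how th.indep chooses its threshold.

```r
# Standard (non-spatial) accuracy measures; a reference point only,
# not th.dep/th.indep.
accuracy_sketch <- function(obs, pred, threshold = 0.5) {
  cls <- as.integer(pred >= threshold)          # binarise at the threshold
  tp <- sum(cls == 1 & obs == 1); tn <- sum(cls == 0 & obs == 0)
  fp <- sum(cls == 1 & obs == 0); fn <- sum(cls == 0 & obs == 1)
  n <- length(obs)
  po <- (tp + tn) / n                                          # observed agreement
  pe <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2  # chance agreement
  kap <- (po - pe) / (1 - pe)                                  # Cohen's kappa
  tss <- tp / (tp + fn) + tn / (tn + fp) - 1  # sensitivity + specificity - 1
  # AUC from the Mann-Whitney rank statistic (mid-ranks handle ties)
  r <- rank(pred)
  n1 <- sum(obs == 1); n0 <- sum(obs == 0)
  auc <- (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  c(kappa = kap, TSS = tss, AUC = auc)
}

accuracy_sketch(obs = c(0, 1, 0, 1), pred = c(0.2, 0.4, 0.6, 0.8))
# kappa = 0, TSS = 0, AUC = 0.75
```

The example shows why threshold-independent measures are useful: at a threshold of 0.5 the predictions are no better than chance (kappa and TSS are 0), yet they still rank a presence above an absence in three of the four possible pairs (AUC = 0.75).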

Finally, many analyses require a calculation of spatial autocorrelation. For this, we include the function acfft (AutoCorrelation Fast Fourier Transform), which calculates spatial autocorrelation using Moran's I statistic. Many other packages offer this analysis, but acfft uses fast Fourier transforms to reduce the time needed to compute the statistic. A quick example using a GLM fitted to the musdata data set follows; the resulting values are identical to the "Autocorrelation of GLM residuals" reported in the first GEE summary above.


coords <- musdata[ ,4:5]
mglm <- glm(musculus ~ pollution + exposure, family = "poisson",
            data = musdata)

ac <- acfft(coords, resid(mglm, type = "pearson"),
            lim1 = 0, lim2 = 1, dmax = 10)
ac
#>  [1]  0.685338504  0.509680590  0.363021118  0.247398654  0.144726020
#>  [6]  0.084220961  0.050228656  0.022369044 -0.001985639 -0.027296083

You can change the number of distance bins that acfft examines with the dmax argument (default 10), and the size of the distance bins with lim1 and lim2.
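For intuition about what a Moran's I correlogram measures, here is a direct O(n^2) computation in base R with binary weights for one distance band at a time. It is a hypothetical illustration, not acfft's FFT algorithm; the unit-width bands (0,1], (1,2], ... are our own choice and are only assumed to resemble the bins set by lim1, lim2 and dmax.

```r
# Hypothetical, direct Moran's I for one distance band (lower, upper],
# with binary weights; for illustration only, not acfft's FFT algorithm.
morans_i_band <- function(coords, x, lower = 0, upper = 1) {
  d <- as.matrix(dist(coords))
  w <- (d > lower & d <= upper) * 1  # 1 for pairs inside the band, else 0
  z <- x - mean(x)
  n <- length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Correlogram over consecutive unit-width bands, assuming unit grid spacing
correlogram <- function(coords, x, nbins = 10) {
  vapply(seq_len(nbins),
         function(k) morans_i_band(coords, x, lower = k - 1, upper = k),
         numeric(1))
}

# Checkerboard check: rook neighbours always differ (I = -1), while cells
# at distance sqrt(2) or 2 always match (I = +1)
cb_xy <- expand.grid(x = 1:4, y = 1:4)
cb <- ifelse((cb_xy$x + cb_xy$y) %% 2 == 0, 1, -1)
correlogram(cb_xy, cb, nbins = 2)  # -1  1
```

With spind loaded and the mglm fit above, correlogram(coords, resid(mglm, type = "pearson")) can be set beside ac; we have not confirmed that acfft's bins and normalization match this sketch, so small differences would not be surprising. The direct version builds an n × n weight matrix, which is exactly the cost that acfft's FFT approach is meant to avoid.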

Wrapping up

Hopefully, you are now ready to use GEEs and WRMs to conquer the world of spatial modeling. If this vignette has not served its purpose and you still have questions about how to use these tools, please let us know. Of course, no package is complete without bugs, and we are always trying to improve our code. If you find any bugs that need squashing, or have suggestions for additional functionality or improvements to existing functionality (or this vignette), please don't hesitate to contact us [10].


  1. Carl, G. & Kuehn, I. (2007). Analyzing spatial autocorrelation in species distributions using Gaussian and logit models. Ecological Modelling 207: 159-170.

  2. Carey, V. J. (2006). gee: Generalized Estimation Equation solver. Ported to R by Thomas Lumley (versions 3.13 and 4.4) and Brian Ripley (version 4.13). R package version 4.13-11.

  3. Yan, J. (2004). geepack: Generalized Estimating Equation Package. R package version 0.2.10.

  4. Hardin, J. W. & Hilbe, J. M. (2003). Generalized Estimating Equations. Chapman and Hall, New York.

  5. Barnett, A. G. et al. (2010). Methods in Ecology and Evolution 1: 15-24.

  6. Carl, G. & Kuehn, I. (2010). A wavelet-based extension of generalized linear models to remove the effect of spatial autocorrelation. Geographical Analysis 42(3): 323-337.

  7. Whitcher, B. (2005). waveslim: Basic wavelet routines for one-, two- and three-dimensional signal processing. R package version 1.5.

  8. Carl, G., Doktor, D., Schweiger, O. & Kuehn, I. (2016). Assessing relative variable importance across different spatial scales: a two-dimensional wavelet analysis. Journal of Biogeography 43: 2502-2512.

  9. Carl, G. & Kuehn, I. (2017). Spind: a package for computing spatially corrected accuracy measures. Ecography 40: 675-682. doi: 10.1111/ecog.02593

  10. Contact: levisc8@gmail.com, or create an issue on the GitHub repository at http://github.com/levisc8/spind/issues.