02 Building modules in SpaDES

Alex M. Chubaty

January 26 2018

1 Introduction

1.1 Module overview

Recall that SpaDES simulations are event-driven, meaning that different actions are performed on data objects based on the order of scheduled events. The central design of SpaDES promotes modularity, such that collections of related simulation actions can be grouped together as ‘modules’ and easily reused among multiple simulations. Strict modularity requires that modules can act independently, without needing to know about other modules. Thus each SpaDES module must explicitly state its input dependencies (data, package, and parameterization requirements) and its data outputs, and must provide other useful metadata and documentation for the user. Upon initialization of a simulation via simInit, the dependencies of every module used are examined and evaluated. If dependency incompatibilities exist, the initialization fails and the user is notified. Also during this initialization, module code is sourced into the simulation environment, making all module objects and functions available during the simulation.
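The idea of checking declared inputs against declared outputs can be illustrated with a few lines of base R. This is only a sketch of the set logic, not the check that simInit actually performs; the toyModules list, its inputs/outputs fields, and the object names burnMap, habitatMap, and caribou are all hypothetical.

```r
# Hypothetical metadata for three toy modules: each lists the objects it
# expects as inputs and the objects it creates as outputs.
toyModules <- list(
  randomLandscapes = list(inputs = character(0), outputs = "landscape"),
  fireSpread       = list(inputs = "landscape", outputs = "burnMap"),
  caribouMovement  = list(inputs = c("landscape", "habitatMap"), outputs = "caribou")
)

# Objects supplied directly by the user (none in this example).
userObjects <- character(0)

# Everything that either the user or some module can provide.
available <- union(
  userObjects,
  unlist(lapply(toyModules, `[[`, "outputs"), use.names = FALSE)
)

# For each module, find the declared inputs that nothing provides.
missingInputs <- lapply(toyModules, function(m) setdiff(m$inputs, available))
unresolved <- Filter(length, missingInputs)

# Report the problem instead of silently continuing.
if (length(unresolved) > 0) {
  message("Modules with unresolved inputs: ",
          paste(names(unresolved), collapse = ", "))
}
```

In this toy setup only caribouMovement is flagged, because no module and no user object supplies habitatMap.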

Each SpaDES module describes the processes or activities that drive simulation state changes via changes to objects stored in the simulation environment. Each activity consists of a collection of events which are scheduled depending on the rules of the simulation. Each event may evaluate or modify a simulation data object (e.g., update the values on a raster map), or perform other operations such as saving and loading data objects, plotting, or scheduling other events.

1.1.1 Simulation event list

The event queue is stored in a slot in a simList simulation object. Each event is represented by a data.table row consisting of the time the event is to occur (eventTime), the name of the module from which the event is taken (moduleName), and a character string for the programmer-defined event type (eventType). This list is kept sorted by eventTime, and events are processed in sequence beginning at the top of the list. Completed events are removed from the queue.
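A minimal sketch of such a queue, assuming a plain data.frame in place of the data.table that SpaDES keeps inside the simList. The module and event names below are chosen for illustration only ("move" in particular is hypothetical).

```r
# Toy event queue with the three columns described above.
queue <- data.frame(
  eventTime  = c(5, 0, 2),
  moduleName = c("fireSpread", "randomLandscapes", "caribouMovement"),
  eventType  = c("burn", "init", "move"),
  stringsAsFactors = FALSE
)

# Keep the queue sorted by eventTime so the next event is always row 1.
queue <- queue[order(queue$eventTime), ]

processed <- character(0)
while (nrow(queue) > 0) {
  current <- queue[1, ]    # the event at the top of the list
  processed <- c(processed,
                 paste(current$moduleName, current$eventType, sep = ":"))
  queue <- queue[-1, ]     # completed events are removed from the queue
}

processed
```

The events come out in time order (init, then move, then burn), regardless of the order in which they were added.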

1.1.2 Module events

When an event reaches the top of the event list, it is processed by the module specified by moduleName. The module code then determines the event type and executes the code for that event. For each event type within a module: 1) the instructions for what happens for this event get executed; and 2) there is an optional call to scheduleEvent, which schedules a future event. A module can schedule other event types from within the same module, but should not call other modules, because this introduces module dependencies, which break the drop-in/replace modularity of your simulation model.
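The two steps (do the work, then optionally schedule a future event) can be sketched in base R. The helpers toySchedule and toyFireEvent are hypothetical stand-ins written for this example; they are not the SpaDES scheduleEvent or doEvent functions, and the interval of 10 time units is an arbitrary choice.

```r
# Hypothetical scheduling helper: appends an event and re-sorts the queue.
toySchedule <- function(queue, eventTime, moduleName, eventType) {
  newEvent <- data.frame(eventTime = eventTime, moduleName = moduleName,
                         eventType = eventType, stringsAsFactors = FALSE)
  queue <- rbind(queue, newEvent)
  queue[order(queue$eventTime), ]
}

# Hypothetical module handler for a toy 'fire' module.
toyFireEvent <- function(queue, now, eventType, log) {
  if (eventType == "burn") {
    log <- c(log, now)                                     # 1) do the work
    queue <- toySchedule(queue, now + 10, "fire", "burn")  # 2) schedule next
  }
  list(queue = queue, log = log)
}

queue <- data.frame(eventTime = 0, moduleName = "fire", eventType = "burn",
                    stringsAsFactors = FALSE)
burnTimes <- numeric(0)
endTime <- 20

# Process events in order until the queue is empty or the end time is passed.
while (nrow(queue) > 0 && queue$eventTime[1] <= endTime) {
  current <- queue[1, ]
  queue <- queue[-1, ]
  result <- toyFireEvent(queue, current$eventTime, current$eventType, burnTimes)
  queue <- result$queue
  burnTimes <- result$log
}

burnTimes
```

Note that the handler only ever schedules events for its own module, which is the pattern recommended above.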

1.1.3 Module event dependencies

Each module schedules its own events (e.g., a ‘fire’ module may schedule ‘burn’ events) and only uses its own data objects (or shared simulation objects). Modules that behave in this way are independent of one another, which is the preferred way to design and implement modules. Maintaining strict modularity allows the removal, addition, and replacement of modules without having to rewrite your code.

Module event dependencies complicate the construction of simulation models, and hinder the ability to develop and deploy models with modularity. If two modules actually depend on each other’s events, then you should consider whether they really are separate modules or should be merged into a single module.

2 Module structure

2.1 Module directory structure (moduleName/)

A module consists of a collection of files and folders that …

  |_ moduleName/
      |_ R/                     # contains additional module R scripts
      |_ data/                  # directory for all included data
          |_ CHECKSUMS.txt      # contains checksums for data files
      |_ tests/                 # contains unit tests for module code
      |_ citation.bib           # bibtex citation for the module
      |_ LICENSE.txt            # describes module's legal usage
      |_ README.txt             # provide overview of key aspects
      |_ moduleName.R           # module code file (incl. metadata)
      |_ moduleName.Rmd         # documentation, usage info, etc.
      |_ moduleName_x.y.z.zip   # zip archive of previous versions

2.2 Module code file (moduleName.R)

A SpaDES module consists of a single .R source file, whose name matches the name of the module. This file consists of three parts containing the code for:

  1. the metadata describing the module and its data dependencies;
  2. the definitions of the event types used in the module;
  3. the functions describing what happens during the processing of each event type.

2.2.1 Metadata

In order to interact correctly with one another in a simulation, SpaDES modules are designed to be aware of their own dependencies and to share this information with the simulation (and with the user). During simulation initialization, the .R file corresponding to each module used is parsed, and the module’s metadata is stored in the simList object inside the envir. As part of this initialization step, the dependencies of each module are extracted from the metadata and are checked against the other modules used in the simulation to ensure that all dependencies can be resolved.

Defining module metadata: defineModule()

Every module requires complete metadata, structured as a named list, and passed as an argument to the defineModule function.

Element name    Description
name            Name of the module as a character string.
description     Description of the module as a character string.
keywords        Character vector containing a module’s keywords.
childModules    Character vector containing the names of the child modules that are part of this module.
authors         The author(s) of the module as a person object.
version         The module version as a character, numeric, or numeric_version. Semantic versioning is assumed.
spatialExtent   Specifies the module’s spatial extent as an Extent object.
timeframe       Specifies the valid timeframe for which the module was designed to simulate. Must be a POSIXt object of length 2, specifying the start and end times.
timeunit        Describes the unit of time corresponding to 1.0 simulation time units.
citation        A list of citations for the module, as character strings. Alternatively, the name of a BibTeX or similar file.
documentation   List of filenames referring to module documentation sources.
reqdPkgs        Character vector of R package names to be loaded.
parameters      A data.frame constructed using rbind with defineParameter, specifying module parameters, with columns paramName, paramClass, default, min, max, and paramDesc. Default values may be overridden by the user by passing a list of parameters to simInit.
inputObjects    A data.frame constructed using bind_rows with expectsInput, specifying the object dependencies of the module, with columns objectName, objectClass, desc, sourceURL and other specifications. For objects that are used within the module as both an input and an output, add an input object by using expectsInput.
outputObjects   A data.frame constructed using bind_rows with createsOutput, specifying the objects output by the module, with columns objectName, objectClass, desc and other specifications. Add an output object by using createsOutput.

## sample module metadata for the default `randomLandscapes` module
## NOTE: long lines have been truncated
defineModule(sim, list(
  name = "randomLandscapes",
  description = "Generate RasterStack of random maps representative of a forest landsc...",
  keywords = c("random map", "random landscape"),
  authors = c(person(c("Alex", "M"), "Chubaty",
                     email = "alexander.chubaty@canada.ca",
                     role = c("aut", "cre")),
              person(c("Eliot", "J", "B"), "McIntire",
                     email = "eliot.mcintire@canada.ca",
                     role = c("aut", "cre"))),
  version = numeric_version("0.2.0"),
  spatialExtent = raster::extent(rep(NA_real_, 4)),
  timeframe = as.POSIXlt(c(NA, NA)),
  timeunit = NA_real_,
  citation = list(),
  reqdPkgs = list("raster", "RColorBrewer", "SpaDES.tools"),
  parameters = rbind(
    defineParameter("stackName", "character", "randomLandscape", NA, NA, "..."),
    defineParameter("nx", "numeric", 100L, NA, NA, "size of map (number ..."),
    defineParameter("ny", "numeric", 100L, NA, NA, "size of map (number ..."),
    defineParameter("inRAM", "logical", FALSE, NA, NA, "should the raster ..."),
    defineParameter(".plotInitialTime", "numeric", 0, NA, NA, "time to ..."),
    defineParameter(".plotInterval", "numeric", 1, NA, NA, "time interval ..."),
    defineParameter(".saveInitialTime", "numeric", NA_real_, NA, NA, "time ..."),
    defineParameter(".saveInterval", "numeric", NA_real_, NA, NA, "time ...")
  ),
  inputObjects = bind_rows(
    expectsInput(objectName = NA_character_, objectClass = NA_character_,
                 desc = NA_character_, sourceURL = NA_character_, other = NA_character_)
  ),
  outputObjects = bind_rows(
    createsOutput(objectName = globals(sim)$stackName, objectClass = "RasterStack",
                  desc = NA_character_, other = NA_character_)
  )
))

Defining module parameters: defineParameter()

Parameters here differ from input data objects in that the former are intended to vary across simulation runs, whereas the latter remain constant. Parameters are often module-specific, i.e., used only within the module in which they are defined, although it may be useful to define globally those parameters that are intended to be used by multiple modules. Module-specific parameters are specified using defineParameter (with rbind) within defineModule to build a data.frame of input parameters. Global parameters are defined at the simulation level as part of the simInit call.

The parameter list in the simList object (accessed via params) may be used to pass named parameter values to modules. The general structure of this parameter list is parameters$moduleName$moduleParameter. This nested list structure allows passing as many parameters as needed for your simulation. We suggest passing a list of all the parameters needed for a single module together.

A module’s metadata defines default values for module-specific parameters. These defaults are used unless the user overrides them by passing values in the parameter list to simInit.


outputDir <- file.path(tempdir(), "simOutputs")
times <- list(start = 0.0, end = 20.0)
parameters <- list(
  .globals = list(stackName = "landscape", burnStats = "nPixelsBurned"),
  .progress = list(NA),
  randomLandscapes = list(nx = 100L, ny = 100L, inRAM = TRUE),
  fireSpread = list(
    nFires = 10L, spreadprob = 0.225, its = 1e6, persistprob = 0,
    returnInterval = 10, startTime = 0,
    .plotInitialTime = 0, .plotInterval = 10
  ),
  caribouMovement = list(
    N = 100L, moveInterval = 1, torus = TRUE,
    .plotInitialTime = 1, .plotInterval = 1
  )
)
modules <- list("randomLandscapes", "fireSpread", "caribouMovement")
objects <- list()
paths <- list(modulePath = system.file("sampleModules", package = "SpaDES.core"),
              outputPath = outputDir)

mySim <- simInit(times = times, params = parameters, modules = modules,
                 objects = objects, paths = paths)
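
The way user-supplied values take precedence over module defaults can be mimicked with nested lists in base R. The sketch below uses utils::modifyList purely as an illustration of the override behaviour; it makes no claim about how simInit merges parameters internally, and the defaults and userParams lists are made up for the example.

```r
# Defaults as a module's metadata might declare them (illustrative values).
defaults <- list(
  randomLandscapes = list(nx = 100L, ny = 100L, inRAM = FALSE)
)

# Values supplied by the user; only inRAM is overridden here.
userParams <- list(
  randomLandscapes = list(inRAM = TRUE)
)

# modifyList() merges nested lists recursively: user values replace the
# matching defaults, and untouched defaults are kept.
merged <- utils::modifyList(defaults, userParams)

merged$randomLandscapes$inRAM  # TRUE (overridden by the user)
merged$randomLandscapes$nx     # 100 (default retained)
```

The nested moduleName/parameter structure is what makes it possible to override one parameter of one module without restating the rest.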

2.2.2 Event types

Each module may contain an arbitrary number of event types. Each of these event types is defined within the doEvent.moduleName function, and is wrapped in a simple if/else stanza that matches the called event type (NOTE: when several event types are defined, switch/case can be faster than if/else). To keep the doEvent.moduleName code block as clear and readable as possible, keep the definitions of each event type minimal, using functions (defined outside of the block) for the details of what is happening for each event.

## sample event type definitions from the default `randomLandscapes` module
doEvent.randomLandscapes <- function(sim, eventTime, eventType, debug = FALSE) {
  if (eventType == "init") {
    # do stuff for this event
    sim <- randomLandscapesInit(sim)

    # schedule the next events
    sim <- scheduleEvent(sim, params(sim)$randomLandscapes$.plotInitialTime,
                         "randomLandscapes", "plot")
    sim <- scheduleEvent(sim, params(sim)$randomLandscapes$.saveInitialTime,
                         "randomLandscapes", "save")

  } else if (eventType == "plot") {
    # do stuff for this event

    # schedule the next event
    sim <- scheduleEvent(sim, time(sim) +
                         params(sim)$randomLandscapes$.plotInterval,
                         "randomLandscapes", "plot")
  } else if (eventType == "save") {
    # do stuff for this event

    # schedule the next event
    sim <- scheduleEvent(sim, time(sim) +
                         params(sim)$randomLandscapes$.saveInterval,
                         "randomLandscapes", "save")

  } else {
    warning(paste("Undefined event type: \'",
                  events(sim)[1, "eventType", with = FALSE],
                  "\' in module \'",
                  events(sim)[1, "moduleName", with = FALSE],
                  "\'", sep = ""))
  }
  # return the updated simulation object
  return(invisible(sim))
}

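The note above mentions switch as an alternative to an if/else chain. A minimal sketch of that dispatch style follows, using a plain list as a stand-in for the simulation object. The toyDoEvent function and the state list are hypothetical and exist only for this example.

```r
# Hypothetical event handler that dispatches with switch() rather than if/else.
toyDoEvent <- function(state, eventType) {
  switch(eventType,
    init = {
      state$log <- c(state$log, "initialised")
    },
    plot = {
      state$log <- c(state$log, "plotted")
    },
    save = {
      state$log <- c(state$log, "saved")
    },
    # the unnamed final argument is the default branch
    warning("Undefined event type: '", eventType, "'")
  )
  state
}

state <- list(log = character(0))
for (type in c("init", "plot", "save")) {
  state <- toyDoEvent(state, type)
}

state$log
```

As with the if/else version, each branch stays short; anything substantial belongs in a separate function called from the branch.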
2.2.3 Event functions

Event functions should be defined below the doEvent.moduleName code block and follow the naming convention modulenameEventtype(). Keep these function definitions as short and clean as possible (you can further modularize your functions by calling additional subroutines).

Functions should get and return objects in the simulation environment (envir), rather than pass them as function arguments. This mostly allows for function definitions to be simpler, i.e., they just take the one sim argument if parameters are passed within the simInit call. Accessing objects in the envir is similar to accessing items in a list, i.e., sim[["object"]] or sim$object can be used, in addition to get("object", envir=envir(sim)). Likewise, simulation functions (i.e., those defined in modules) are also accessed using the $ accessor (e.g., sim$myFunction()).
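The three access styles can be tried out on an ordinary R environment. This is only an analogy: simEnv below is a bare environment created with new.env(), not a simList, and toyBurn and double are hypothetical functions.

```r
# A bare environment standing in for the simulation environment.
simEnv <- new.env()
simEnv$landscape <- matrix(0, nrow = 2, ncol = 2)

# Three equivalent ways to read the same object.
viaDollar   <- simEnv$landscape
viaBrackets <- simEnv[["landscape"]]
viaGet      <- get("landscape", envir = simEnv)

# A function that modifies an object in the environment in place,
# instead of receiving the object itself as an argument.
toyBurn <- function(env) {
  env$landscape[1, 1] <- 1
  invisible(env)
}
toyBurn(simEnv)

# Functions stored in the environment are called with the $ accessor too.
simEnv$double <- function(x) 2 * x
simEnv$double(3)
```

Because environments are modified by reference, the change made inside toyBurn is visible in simEnv afterwards, while the copies read earlier (viaDollar and friends) keep the old values.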

Note that every module requires an "init" event type, which defines the initialization of the module; however, this init event need not do a whole lot (i.e., it can be a stub). As such, the modulenameInit() function is required for initialization. Modules may also include "save" and "plot" events, though these are optional.

## sample event functions from the default `randomLandscapes` module

randomLandscapesInit <- function(sim) {
  if (is.null(params(sim)$randomLandscapes$inRAM)) {
    inMemory <- FALSE
  } else {
    inMemory <- params(sim)$randomLandscapes$inRAM
  }
  # Give dimensions of dummy raster
  nx <- params(sim)$randomLandscapes$nx
  ny <- params(sim)$randomLandscapes$ny
  template <- raster(nrows = ny, ncols = nx, xmn = -nx/2, xmx = nx/2,
                     ymn = -ny/2, ymx = ny/2)
  speedup <- max(1, nx/5e2)
  # Make dummy maps for testing of models
  DEM <- gaussMap(template, scale = 300, var = 0.03,
                  speedup = speedup, inMemory = inMemory)
  DEM[] <- round(getValues(DEM), 1) * 1000
  forestAge <- gaussMap(template, scale = 10, var = 0.1,
                        speedup = speedup, inMemory = inMemory)
  forestAge[] <- round(getValues(forestAge), 1) * 20
  percentPine <- gaussMap(template, scale = 50, var = 1,
                          speedup = speedup, inMemory = inMemory)
  percentPine[] <- round(getValues(percentPine), 1)

  # Scale them as needed
  forestAge <- forestAge / maxValue(forestAge) * 100
  percentPine <- percentPine / maxValue(percentPine) * 100

  # Make layers that are derived from other layers
  habitatQuality <- (DEM + 10 + (forestAge + 2.5) * 10) / 100
  habitatQuality <- habitatQuality / maxValue(habitatQuality)

  # Stack them into a single stack and assign to sim envir
  mapStack <- stack(DEM, forestAge, habitatQuality, percentPine)
  names(mapStack) <- c("DEM", "forestAge", "habitatQuality", "percentPine")
  setColors(mapStack) <- list(DEM = brewer.pal(9, "YlOrBr"),
                              forestAge = brewer.pal(9, "BuGn"),
                              habitatQuality = brewer.pal(8, "Spectral"),
                              percentPine = brewer.pal(9, "Greens"))
  sim[[globals(sim)$stackName]] <- mapStack

  # return the updated simulation object
  return(invisible(sim))
}

2.2.4 Event diagram

To better understand how events are scheduled within a simulation, a visual representation called an eventDiagram illustrates the sequences of events within a simulation.

Simulation time is presented on the x-axis, starting at date startDate. Each module appears in a color-coded row, within which each event for that module is displayed corresponding to the sequence of events for that module. Note that only the start time of the event is meaningful in these figures: the width of the bar associated with a particular module’s event corresponds to the module’s timestep unit, not the event’s “duration”.

mySim <- spades(mySim) # runs the simulation

# overview of the events in the simulation
eventDiagram(mySim, "0000-06-01", n = 200, width = 720)