This vignette describes how to write formulas using ipmr
’s notation for hierarchical models. It is meant to mirror the standard mathematical notation used to represent the models. The examples here will primarily use lme4 to illustrate how to go from vital rate models to functional forms that you can use in ipmr
. You are of course free to use whatever model fitting software you like, but the number of different possibilities are far too great to cover in this vignette.
The ipmr
notation is meant to make implementing models more concise, as it allows you to avoid retyping/copy+pasting the same kernel formats and just substituting, for example, plot name or transition year. It works by appending suffixes to kernels, vital rates, and parameters that can take multiple values (e.g. P
becomes P_yr
). We then add a named list where the names correspond to the suffix and the entries correspond to the values it can take. We pass pass the list to levels_hier_effs
(i.e. levels_hier_effs = list(yr = 1:5)
), and set has_hier_effs = TRUE
.
Some quick translations between mathematical notation, R model syntax, and ipmr
style notation. In the ipmr
column, suffixes that are appended are highlighted in bold.
Suffix names are not restricted to the examples below, and can be anything you want them to be.
The mathematical notation in this vignette will take the following form:
Coefficients/parameters: \(\alpha\)  intercept, \(\beta\)  slope, \(\mu\)  mean, \(\sigma\) standard deviation.
Vital rates: subscripts on the coefficients/parameters. For example, \(\alpha_g\) for the intercept of a growth model.
Hierarchical effects: superscripts on coefficients. For example, \(\beta_g^{site}\) for a site specific growth slope.
Probability density functions: these are denoted with \(f_p\), where \(_p\) is replaced by the vital rate. For example, \(f_g(\mu_g, \sigma_g)\) would denote a probability density function for growth.
Math Formula  R Formula  ipmr  Model Number 

\(\mu_g^{yr} = \alpha_g + \alpha_g^{yr} * \beta_g * z\) 
size_2 ~ size_1 + (1  year), family = gaussian()

mu_g_yr = g_int + g_int_yr + g_slope * z  1 
\(g(z',z) = f_g(\mu_g^{yr}, \sigma_g)\)  g = dnorm(z_2, mu_g_yr, sd_g)  1  
\(Logit(\mu_{s}^{plot,yr}) = \alpha_s + \alpha_{s}^{plot,yr} + \beta * z\) 
surv ~ size_1 + (1  year) + (1  plot), family = binomial()

s_pl_yr = s_int + s_int_pl_yr + s_slope * z  2 
surv ~ size_1 + (1  year) + (1  plot), family = binomial()

s_pl_yr = s_int + s_int_pl + s_int_yr + s_slope * z  2  
s = inv_logit(s_pl_yr)  2  
\(log(\mu_{r}^{yr}) = \alpha_{r} + \alpha_r^{yr} + (\beta_r + \beta_r^{yr}) * z\) 
fec ~ size_1 + (size_1  year), family = poisson()

r = exp(r_int + r_int_yr + (r_slope + r_slope_yr) * z)  3 
Internally, ipmr
generates the various levels using expand.grid
. This means that every combination of the values in levels_hier_effs
must exist in the data_list
, or the model will fail with an error along the lines of (Error in eval_tidy: object 'x_yz' not found
). In order to exclude levels that do not exist in your data, you can add a vector to the list in levels_hier_effs
called drop_levels
. This should contain the values you wish to exclude as a character vector. For example, if data are collected for sites = c("a", "b", "c")
, and years = c(2005:2008)
, but there is no data from site "a"
in 2007, we can use levels_hier_effs = list(site = c("a", "b", "c"), year = c(2005:2008), drop_levels = c("a_2007"))
.
When picking values to appear in levels_hier_effs
, it is best to avoid longer words or phrases where words are separated by underscores (e.g. treatment = c("h_removal", "c_addition")
), as this may cause problems when values are substituted into expressions. CamelCase is generally safer in these cases (e.g. use treatment = c("hRemoval", "cAddition")
instead of treatment = c("h_removal", "c_addition")
).
These models are pretty common in the IPM literature. They encompass cases where a single site has been sampled over many years, or sampled many sites across one transition.
This example will use a hypothetical study that spans 5 transitions and has 6 consecutive censuses. The temporal variation will be denoted with a suffix/superscript appended to each term that it modifies (\(x^{yr}\)/x_yr
). To limit complexity, let’s pretend our exploratory analysis and model selection indicated the best overall models included random yearspecifc intercepts for growth (\(g^{yr}\)) and survival (\(s^{yr}\)), but not probability of reproduction (\(f_p\)), recruit production (\(f_r\)), or the recruit size distribution (\(f_d\)). We will work with a simple model to start (i.e. 1 continuous state variable, no discrete states). This yields the following IPM (\(z,z'\) represent size/height/weight at time \(t\) and \(t+1\), respectively):
\(n(z', t + 1) = \int_L^U [P^{yr}(z',z) + F(z',z)]n(z,t)dz\)
\(P^{yr}(z', z) = s^{yr}(z) * g^{yr}(z',z)\)
\(F(z',z) = r_p(z) + r_r(z) + r_d(z')\)
Note that only the \(P\) kernel get the \(yr\) subscript appended to them  this is the only one that our model selection process decided had substantial year to year variation. Thus, we can actually write the \(F\) kernel the exact same way as we would in a simple IPM when we implement the model in ipmr
.
The vital rate models could be written on paper as follows:
Growth
\(\mu_g = (\alpha_g + \alpha_g^{yr})+ \beta_g * z\)
\(g(z',z) = f_g(\mu_g, \sigma_g)\)
Survival
Probability of reproducing
Recruit production
Recruit size distribution
and modeled in R as follows:
library(lme4)
library(ipmr)
< lmer(size_2 ~ size_1 + (1  year), data = grow_data)
grow_mod
< glmer(surv ~ size_1 + (1  year), data = surv_data, family = binomial)
surv_mod
< glm(repr ~ size_1, data = repr_data, family = binomial)
repr_mod
< glm(recr ~ size_1, data = recr_data, family = poisson)
recr_mod
< mean(rcsz_data$size_2)
rcsz_mu < sd(rcsz_data$size_2) rcsz_sd
We could then extract the fixed coefficients with the fixef
method and the conditional modes of the random effects via the ranef
method:
< fixef(grow_mod)
beta_gs
< ranef(grow_mod)
alpha_g_yrs
< fixef(surv_mod)
beta_ss
< ranef(surv_mod) alpha_s_yrs
The nonmixed models can use the coef
method to extract a vector of coefficients:
< coef(repr_mod)
repr_coef
< coef(recr_mod) recr_coef
The next step is to transform these values into something ipmr
can use. This must always be a named list where the names of the list components are the names used in the vital expressions or kernel formula. Transforming the \(\beta\)s is easier  the output from fixef
is always a named vector, so we just use as.list()
and change the names to whatever we want to use. We then bundle it into one big list with all the fixed coefficients.
< as.list(beta_gs)
g_list names(g_list) < c("alpha_g", "beta_g")
< as.list(beta_ss)
s_list names(s_list) < c("alpha_s", "beta_s")
< as.list(recr_coef)
r_p_list names(f_p_list ) < c("alpha_r_p", "beta_r_p")
< as.list(repr_coef)
r_r_list names(r_r_list) < c("alpha_r_r", "beta_r_r")
< list(mu_r_d = rcsz_mu, sigma_r_d = rcsz_sd)
r_d_list
< c(g_list, s_list, r_p_list, r_r_list, r_d_list) all_fixed_params
For the random effects, we probably need to do some a bit more processing, though this really depends on which modeling framework you use and the format of the outputs. For lme4
, it looks like this:
< as.list(unlist(alpha_g_yrs))
g_alpha_list names(g_alpha_list) < paste("alpha_g_", 2001:2006, sep = "")
< as.list(unlist(alpha_s_yrs))
s_alpha_list names(s_alpha_list) < paste("alpha_s_", 2001:2006, sep = "")
First, we strip away all of lme4
related attributes with unlist()
, then convert it to the flat list format that we need. Next, we set the names. Now, we can combine it with our all_fixed_params
list and we have all of the parameters we need for the kernel functions:
< c(all_fixed_params, g_alpha_list, s_alpha_list) all_params
We are now ready to begin implementing the kernels. This model will be a simple, density independent, stochastic kernelresampled model. The parts where the hierarchical syntax is used are highlighted by comments in the code block:
<init_ipm(sim_gen = "simple",
ex_ipm di_dd = "di",
det_stoch = "stoch",
kern_param = "kern") %>%
define_kernel(
# _yr is appended as a suffix with an underscore. Notice how the formula
# from 2 is translated into the formula argument and the kernel name
name = "P_yr",
family = "CC",
formula = s_yr * g_yr,
# We also modify each parameter name with a suffix as well.
# Here, I've split out the linear predictor and the inverse logit
# transformation into separate steps to avoid over cluttering
s_lin_p = alpha_s + alpha_s_yr + beta_s * z_1,
s_yr = 1 / (1 + exp( s_lin_p)),
# We do the same with growth as we did with survival and the P_yr formula
g_yr = dnorm(z_2, mu_g_yr, sigma_g),
mu_g_yr = alpha_g + alpha_g_yr + beta_g * z_1,
data_list = all_params,
states = list(c("z")),
# We signal that the kernel has hierarchical effects
has_hier_effs = TRUE,
# And provide the values that each suffix can take as a named list.
# The name(s) in this list MUST match the suffix used in the expressions.
levels_hier_effs = list(yr = 2001:2006),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g_yr")
%>%
)
# This kernel doesn't get anything special because there are no
# timevarying parameter values.
define_kernel(
name = "F",
family = "CC",
formula = r_p * r_r * r_d,
r_p_lin_p = alpha_r_p + beta_r_p * z_1,
r_p = 1 / (1 + exp( r_p_lin_p)),
r_r = exp(alpha_r_r + beta_r_r * z_1),
r_d = dnorm(z_2, mu_r_d, sigma_r_d),
data_list = all_params,
states = list(c("z")),
has_hier_effs = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
%>%
) define_impl(
make_impl_args_list(
kernel_names = c("P_yr", "F"),
int_rule = rep("midpoint", 2),
state_start = rep("z", 2),
state_end = rep("z", 2)
)%>%
) define_domains(
z = c(0.1, 10, 50)
%>%
) define_pop_state(
n_z = runif(50)
%>%
) make_ipm(
iterate = TRUE,
iterations = 100,
kernel_seq = sample(2001:2006, size = 100, replace = TRUE),
normalize_pop_size = TRUE
)
Sometimes, models will have discretely and continuously varying components as functions of different variables. For example, a study could consider multiple sites (discrete component) and spatial and/or temporal fluctuations in the environment at each site (continuous component). We’ll use a hypothetical study that models demography as a function of environment across multiple sites to illustrate this.
We’ll implement a general IPM of a perennial plant species. Our data will come from 6 sites, "A"
 "F"
, and the vital rate intercepts for the plants will vary across each one. Its seeds can either enter a seed bank or recruit immediately, producing recruits at the next time step. We’ll pretend that data from the recruit size distribution was pooled across sites, and so does not vary with space. The seed bank will be age structured and its parameters will be space and timeinvariant. Seeds in the first year can either survive (\(s_{{sb}_1}\)/s_sb1
) and recruit (\(r_{{sb}_1}\)), or survive and become two year old seeds. Two year old seeds can either survive (\(s_{{sb}_2}\)/s_sb2
) and recruit (\(r_{{sb}_2}\)/r_sb2
) or die.
The vital rate information fed into our hypothetical regression models are (\(s^{site}\)/s_site
), growth (\(g^{site}\)/g_site
), probability of reproduction (\(c_{p}^{site}\)/c_p_site
), and seed production (\(c_{r}^{site}\)/c_r_site
). They are all functions of size (z
) and environmental covariates (\(\theta^{t}\) and \(\theta^{p}\), denoted together in the kernel formulas as just \(\theta\)).
The full IPM looks like this:
\[ n(z, t + 1) = \int_L^U [P^{site}(z', z,\theta) + r_i * F^{site}(z', z, \theta)]n(z, t)dz +\\ s_{{sb}_1} * r_{{sb}_1} * sb_1(t) * \int_L^U c_d(z')dz' + \\s_{{sb}_2} * r_{{sb}_2} * sb_2(t) *\int_L^U c_d(z')dz' \]
\(sb_1(t + 1) = (1r_i) * \int_L^U [c_p^{site}(z, \theta) * c_r^{site}(z,\theta)] n(z, t)dz\)
\(sb_2(t + 1) = s_{{sb}_1} * (1r_{{sb}_1}) * sb_1(t)\)
The sub kernels are:
\(P^{site}(z',z, \theta) = s^{site}(z, \theta) * g^{site}(z',z, \theta)\)
\(F^{site}(z',z, \theta) = c_{{p}}^{site}(z, \theta) * c_{r}^{site}(z, \theta) * c_d(z')\)
\(\theta^{t} \sim Norm(\mu^{t}, \sigma^{t})\)
\(\theta^{p} \sim Gamma(\alpha^{p}, \beta^{p})\)
And the size/climate dependent vital rate functions and example R code are:
\(Logit(s^{site}(z, \theta)) = (\alpha_s + \alpha_{s}^{site}) + \beta_s^{z} * z + \beta_s^{t} * \theta^{t} + \beta_s^{p} * \theta^{p}\)
glmer(survival ~ size + temp + precip + (1  site), data = my_surv_data, family = binomial())
\(g^{site}(z',z,\theta) = f_g(\mu_g^{site}(z, \theta), \sigma_g)\)
\(\mu_g^{site}(z, \theta) = (\alpha_g + \alpha_g^{site}) + \beta_g^{z} * z + \beta_g^{t} * \theta^{t} + \beta_g^{p} * \theta^{p}\)
lmer(size_next ~ size + temp + precip + (1  site), data = my_grow_data)
\(Logit(c_{p}^{site}(z, \theta)) = (\alpha_{c_p} + \alpha_{c_p}^{site}) + \beta_{c_p}^{z} * z + \beta_{c_p}^{t} * \theta^{t} + \beta_{c_p}^{p} * \theta^{p}\)
glmer(repro ~ size + temp + precip + (1  site), data = my_repro_data, family = binomial())
\(log(c_{r}^{site}(z, \theta)) = (\alpha_{c_r} + \alpha_{c_r}^{site}) + \beta_{c_r}^{z} * z + \beta_{c_r}^{t} * \theta_{t} + \beta_{c_r}^{p} * \theta_{p}\)
glmer(flower_n ~ size + temp + precip + (1  site), data = my_flower_data, family = poisson())
\(c_d(z') = f_{c_d}(\mu_{c_d}, \sigma_{c_d})\)
mean(my_recr_data$size_next); sd(my_recr_data$size_next)
Since this example doesn’t work with actual data, we need to simulate some parameters for each site. Additionally, we’ll specify the distribution parameters for each \(\theta\) and write a function that samples them once.
library(ipmr)
set.seed(51)
# Set up the sites, and the parameters
< LETTERS[1:6]
sites
< list(
all_pars r_i = 0.6,
r_sb1 = 0.2,
r_sb2 = 0.3,
s_sb1 = 0.4,
s_sb2 = 0.1,
mu_c_d = 4,
sigma_c_d = 0.9,
sigma_g = 3,
beta_s_z = 2.5,
beta_s_prec = 0.3,
beta_s_temp = 0.5,
beta_g_z = 0.99,
beta_g_prec = 0.01,
beta_g_temp = 0.1,
beta_c_p_z = 0.4,
beta_c_p_prec = 0.6,
beta_c_p_temp = 0.3,
beta_c_r_z = 0.005,
beta_c_r_prec = 0.3,
beta_c_r_temp = 0.9
)
< rnorm(6, mean = 4.5, sd = 1) %>%
g_alphs as.list() %>%
setNames(
paste('alpha_g_', sites, sep = "")
)
< rnorm(6, mean = 7, sd = 1.5) %>%
s_alphs as.list() %>%
setNames(
paste('alpha_s_', sites, sep = "")
)
< rnorm(6, mean = 10, sd = 2) %>%
c_p_alphs as.list() %>%
setNames(
paste('alpha_c_p_', sites, sep = "")
)
< runif(6, 0.01, 0.1) %>%
c_r_alphs as.list() %>%
setNames(
paste('alpha_c_r_', sites, sep = "")
)
< c(all_pars,
all_pars
g_alphs,
s_alphs,
c_p_alphs,
c_r_alphs)
# This list contains parameters that define the distributions for environmental
# covariates
< list(
env_params temp_mu = 12,
temp_sd = 2,
prec_shape = 1000,
prec_rate = 2
)
# This function samples the environmental covariate distributions and returns
# the values in a named list. We can reference the names in the list in our
# vital rate expressions.
< function(env_params) {
sample_fun
< rnorm(1, env_params$temp_mu, env_params$temp_sd)
temp
< rgamma(1, env_params$prec_shape, env_params$prec_rate)
prec
< list(temp = temp,
out prec = prec)
return(out)
}
We’ve written out our model  now we can implement it. We’ll use the general, density independent, stochastic parameter resampled machinery for this.
< init_ipm(sim_gen = "general",
ex_ipm di_dd = "di",
det_stoch = "stoch",
kern_param = "param") %>%
define_kernel(
name = "P_site",
family = "CC",
# site is appended so that we don't have to write out s_A, s_B, etc.
formula = s_site * g_site * d_z,
s_site_lin = alpha_s_site +
* z_1 +
beta_s_z * prec +
beta_s_prec * temp,
beta_s_temp s_site = 1 / (1 + exp(s_site_lin)),
mu_g_site = alpha_g_site +
* z_1 +
beta_g_z * prec +
beta_g_prec * temp,
beta_g_temp g_site = dnorm(z_2, mu_g_site, sigma_g),
data_list = all_pars,
states = list(c("z")),
has_hier_effs = TRUE,
levels_hier_effs = list(site = LETTERS[1:6]),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g_site")
%>%
) define_kernel(
name = "F_site",
family = "CC",
formula = c_p_site * c_r_site * c_d * d_z,
c_p_site_lin = alpha_c_p_site +
* z_1 +
beta_c_p_z * prec +
beta_c_p_prec * temp,
beta_c_p_temp c_p_site = 1 / (1 + exp(c_p_site_lin)),
c_r_site = exp(alpha_c_r_site +
* z_1 +
beta_c_r_z * prec +
beta_c_r_prec * temp),
beta_c_r_temp c_d = dnorm(z_2, mu_c_d, sigma_c_d),
data_list = all_pars,
states = list(c("z", "sb1", "sb2")),
has_hier_effs = TRUE,
levels_hier_effs = list(site = LETTERS[1:6]),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "c_d")
%>%
) define_kernel(
name = "go_sb_1_site",
family = "CD",
formula = (1  r_i) * c_p_site * c_r_site * d_z,
c_p_site_lin = alpha_c_p_site +
* z_1 +
beta_c_p_z * prec +
beta_c_p_prec * temp,
beta_c_p_temp c_p_site = 1 / (1 + exp(c_p_site_lin)),
c_r_site = exp(alpha_c_r_site +
* z_1 +
beta_c_r_z * prec +
beta_c_r_prec * temp),
beta_c_r_temp data_list = all_pars,
states = list(c("z", "sb1")),
has_hier_effs = TRUE,
levels_hier_effs = list(site = LETTERS[1:6]),
evict_cor = FALSE
%>%
) define_kernel(
name = "go_sb_2",
family = "DD",
formula = s_sb1 * (1  r_sb1),
data_list = all_pars,
states = list(c("sb1", "sb2")),
has_hier_effs = FALSE,
evict_cor = FALSE
%>%
) define_kernel(
name = "leave_sb_1",
family = "DC",
formula = s_sb1 * r_sb1 * c_d * d_z,
c_d = dnorm(z_2, mu_c_d, sigma_c_d),
data_list = all_pars,
states = list(c("z", "sb1")),
has_hier_effs = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "c_d")
%>%
) define_kernel(
name = "leave_sb_2",
family = "DC",
formula = s_sb2 * r_sb2 * c_d * d_z,
c_d = dnorm(z_2, mu_c_d, sigma_c_d),
data_list = all_pars,
states = list(c("z", "sb2")),
has_hier_effs = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "c_d")
%>%
) define_impl(
make_impl_args_list(
kernel_names = c("P_site",
"F_site",
"go_sb_1_site",
"go_sb_2",
"leave_sb_1",
"leave_sb_2"),
int_rule = rep("midpoint", 6),
state_start = c("z", "z", "z", "sb1", "sb1", "sb2"),
state_end = c("z", "z", "sb1", "sb2", "z", "z")
)%>%
) define_domains(
z = c(0.1, 100, 200)
%>%
) define_pop_state(
n_z = runif(200),
n_sb1 = 20,
n_sb2 = 20
%>%
) define_env_state(
env_state = sample_fun(env_params),
data_list = list(env_params = env_params,
sample_fun = sample_fun)
%>%
) make_ipm(
iterations = 20,
kernel_seq = sample(LETTERS[1:6], 20, replace = TRUE),
normalize_pop_size = TRUE
)
lambda(ex_ipm)