`adnuts`

main purpose is to provide a wrapper for performning Bayesian analyses using the no-U-turn (NUTS) algorithm (Hoffman and Gelman 2014) for ADMB models (Fournier et al. 2012). The ADMB model itself contains the algorithm code, but this package provides the user a convenient environment to run and diagnose Markov chains, and make inference. In addition, NUTS capabilities are provided for any posterior whose log-density and log-density gradient can be written as R functions. This includes TMB models (Kristensen et al. 2016) but also other special cases. This package aims to give ADMB and TMB models similar functionality to Stan (Carpenter et al. 2017; Stan Development Team 2017).

Key features of the packages:

- Run no-U-turn sampler or random walk Metropolis MCMC chains from within R using the
`sample_admb`

and`sample_tmb`

functions. - Adaptation of the NUTS stepsize is automatically done during the warmup phase.
- The mass matrix options are: diagonal adaptation during warmup or an arbitrary dense matrix can be passed from R.
- Parallel execution and automatic merging of chains ease workflow.
- Easy diagnostic checking using functionality provided by packages
`rstan`

and`shinystan`

.

Additional features for ADMB users:

- The MLE covariance matrix can be used (i.e., ADMB file admodel.cov). Likewise, the model can be initialized from the ‘.par’ file (although not recommended).
- A ‘duration’ argument to stop the chains running after a specified period of time (e.g., 2 hours), returning whatever samples were generated in that period.
- When running multiple chains, whether in parallel or serial, samples are merged and written to the ‘.psv’ file. Thus, executing the model in the ‘-mceval’ phase uses all chains.
`sample_admb`

includes an ‘mceval’ argument dictating whether to run in this phase when the sampling is finished. - A modified pairs plot designed to help facilitate comparison between MLE estimates and covariances, and the posterior samples.

Typically NUTS works efficiently with default settings and no user intervention. However, in some cases you may need to modify the settings. See below for a brief description of NUTS and how you can modify its behavior and when needed.

In general very little is needed to prepare an ADMB model for use with `adnuts`

. As with any model, the user must build the template file to return a negative log likelihood value for given data and parameters. The user is responsible for ensuring the a valid and reasonable model is specified. Typical model building practices such as building complexity slowly and validating with simulated data are strongly encouraged.

Sampling for ADMB models is accomplished with the R function `sample_admb`

. This function is designed to be similar to Stan’s `stan`

function in naming conventions and behavior. Some differences are necessary, such as passing a model name and path. Also note that this function does not do optimization nor Variational Inference.

The default behavior for NUTS is to run 3 chains with 2000 iterations, with a warmup (i.e., burn-in) phase during the first 1000. There is no external thinning (in a sense it is done automatically within the algorithm), and thus the `-mcsave`

option does not work with NUTS by design. These defaults work well in most cases and should be changed only after running chains and investigating. Users of the RWM algorithm will accustomed to running millions of iterations with a high thinning rate. **Do not do that!**. The key thing to understand is that NUTS runs as long as it needs to get nearly independent samples. Consult the Stan documentation for advice on a workflow for NUTS models (e.g., this guide)

`sample_admb`

can also run RWM chains via the argument `algorithm="RWM"`

. Consult the ADMB documentation for more information on a workflow with these samplers.

One important overlap with Stan the `control`

arguments, which allows the user to control the algorithm. For NUTS, this includes: - Metric or mass matrix (adapted digonal or dense matrix) [`metric`

] - Maximum treedepth for trajectories [`max_treedepth`

’] - Target acceptance rate [`adapt_delta`

] - Step size, which if NULL is adapted [`stepsize`

] - Mass matrix adaptation tuning parameters (not recommended to change) [`w1`

, `w2`

, `w3`

]

For RWM the only argument used is `metric`

.

This function returns a list whose elements mimic some of that returned by `stan`

as well. `stan`

returns an object of class `stanfit`

while the output of `sample_admb`

is a simple named list. However, this list has been constructed to be useful for plugging into some `rstan`

tools (see below).

No special output files are required to run the model with `adnuts`

. In addition, the user can still use the `mceval_phase`

flag to run specific code on saved samples. ADMB saves posterior draws to a .psv file. When executing the model with `-mceval`

it will loop through those samples and execute the procedure section with flag `mceval_phase()`

evaluating to 1. This behavior is unchanged with `adnuts`

, but is complicated when running multiple chains because there will be multiple .psv files. Thus, `sample_admb`

combines chains in R and writes a single .psv file containing samples from all chains (after warmup and thinned samples are discarded). This also works in parallel (see below).

Previously, ADMB required an estimated covariance function to use the random walk Metropolis (RWM) algorithm. Thus, for models without a valid mode or a Hessian that could not be inverted could not use MCMC methods. With `adnuts`

neither an MLE nor covariance estimate is needed because NUTS adapts these tuning parameters automatically (see below). However, if a mode exists I recommend estimating the model normally before running MCMC.

`sample_admb`

is strongly recommended for running the MCMC (NUTS or RWM). However, it is a convenience function that runs the chains from the command line. The list returned by `sample_admb`

contains an element `cmd`

which shows the user the exact command used to call the ADMB model.

The ADMB model is an executable file that contains the code necessary for NUTS and RWM. When run, it generates many output files. As such, I recommend putting the model into a subdirectory below the directory containing the R script (passed as the `path`

argument). This is required for parallel execution but is recommended in general.

Parameter priors must be specified manually in the ADMB template file. For instance, a standard normal prior on parameter `B`

would be subtracted from the objective as `f+=dnorm(B,0.0,1.0)`

. Note that statistical functions in ADMB, such as `dnorm`

, return the negative log density and thus must be added to the objective function.

Parameter transformations are limited to box constraints within the ADMB template (e.g., `init_bounded_number`

). When used, this puts an implicit uniform prior on the parameter.

However, variance parameters are common and require bounds of (0, Inf). To implement such a bound in ADMB, specify the model parameter as the log of the variance, and then in the template exponentiate it and use throughout. Because of this parameter transformation, the Jacbobian adjustment is needed. This can be accomplished by subtracting the parameter in log space from the negative log-likelihood. For instance, use parameter `log_sd`

in the template, then let `sigma=exp(log_sd)`

, and update the objective function: `f-=log_sd;`

.

Parallel sampling is done by brute force using the `snowfall`

package. `n.cores`

chains will be run by making temporary copies of the directory `path`

(which contain the model executable, data inputs, and any other required files). Then a separate R session calls `sample_admb`

and when done the results are merged together and the temporary folders deleted. If errors occur, these temporary folders may need to be deleted manually.

The `rstan`

package provides an improved function for calculating effective sample size and \(\hat{R}\) statistics. The samples from the fitted object can be plugged directly into it.

```
## sigmayearphi sigmaphi sigmap a[1]
## 202.0221 337.0115 530.4748 426.2951
```

```
## sigmayearphi sigmaphi sigmap a[1]
## 1.012854 1.010836 1.002894 1.018544
```

Likewise both the model parameters and the NUTS sampler parameters can be extrated as a data frame. These functions have optional arguments for whether to include the warmup samples and log posterior column (lp__)

```
## 'data.frame': 1500 obs. of 5 variables:
## $ sigmayearphi: num 0.994 1.294 1.218 1.177 0.858 ...
## $ sigmaphi : num -0.799 -1.214 -1.173 -0.739 -1.065 ...
## $ sigmap : num 0.167 -0.0411 0.065 -0.0245 -0.0701 ...
## $ a[1] : num 1.16 1.12 1.73 1.71 1.34 ...
## $ a[2] : num 0.853 0.954 0.619 0.791 1.258 ...
```

```
## 'data.frame': 1500 obs. of 8 variables:
## $ chain : num 1 1 1 1 1 1 1 1 1 1 ...
## $ iteration : num 501 502 503 504 505 506 507 508 509 510 ...
## $ accept_stat__: num 0.974 0.986 0.372 0.991 0.999 ...
## $ stepsize__ : num 0.0842 0.0842 0.0842 0.0842 0.0842 ...
## $ treedepth__ : num 6 6 6 6 6 6 6 6 6 6 ...
## $ n_leapfrog__ : num 63 63 63 63 63 63 63 63 63 63 ...
## $ divergent__ : num 0 0 0 0 0 0 0 0 0 0 ...
## $ energy__ : num -1854 -1855 -1856 -1843 -1848 ...
```

The list returned by `sample_admb`

can also be plugged directly into the ShinyStan interactive tool environment by calling the wrapper function `launch_shinyadmb`

. See ShinyStan documentation for more information on this. It is designed to provide NUTS specific diagnostics, but also serves as a more general tool for MCMC diagnostics and thus is beneficial for RWM chains as well. If desired, the output samples can be converted into `mcmc`

objects for use with the CODA R package. For instance, CODA traceplots can be accessed like this:

Most ADMB models have well defined modes and estimated covariance matrices used to quantify uncertainty. The `pairs_admb`

function can be used to plot pairwise posterior draws vs the MLE estimate and confidence ellipses. Major discrepancies between the two are cause for concern. As such, this can be a good diagnostic tool for both frequentist and Bayesian inference. In particular, it often is informative to plot the slowest mixing parameters.

```
slow <- c("sigmayearphi", "yeareffphi_raw[3]", "yeareffphi_raw[2]",
"yeareffphi_raw[4]", "yeareffphi_raw[1]")
pairs_admb(fit.admb, pars=slow)
```

Here we see a large mismatch between the ellipses and MCMC samples. Thus, even though ADMB found an MLE and succcessfully inverted the Hessian matrix, its estimates are invalid. It is not surprise, as this is a complex hierarchical model without a true mode. NUTS works efficiently for this model because it does not use these MLE estimates. The standard RWM algorithm would grind to a halt and never converge for this model, but NUTS works well.

Noting special needs to be done to setup a TMB object for use with `sample_tmb`

. Typically the Laplace Approximation is not done during MCMC, and so by default the parameters declared as “random” are treated the same. You can turn this off with the `laplace`

option.

Box constraints can be passed directly to `sample_tmb`

as vectors `lower`

and `upper`

the same as to the optimizer. Values of -Inf and Inf are allowed to create one-sided contraints, as with variance parameters. Note that these parameter transformations are done in R and thus are relatively slow. Alternatively variance parameters can be added to the template in log space and a Jacobian adjustment added as in ADMB (see above).

Similar to `sample_admb`

in its arguments, behavior, and returned value. See above for more information.

Parallel chains can be run using the `snowfall`

package by specifying `parallel=TRUE`

and the number of cores. The TMB object needs to be rebuilt because new R sessions are created. No console output is presented when running in parallel, but text output is piped to a file for monitoring if desired.

Output returned from `sample_tmb`

is very similar as from `sample_admb`

, so refer to those sections. But use `launch_shinytmb`

for these objects. There is no equivalent for `pairs_admb`

for TMB models.

Hamiltonian Monte Carlo is a powerful family of MCMC algorithms that use gradients to propose efficient transitions. We review the basics here but refer to interested readers to (Neal 2011; Betancourt 2017; Monnahan, Thorson, and Branch 2017). Instead of randomly generating a proposed point, to be rejected/accepted, HMC generates *trajectories* from which a point is chosen to be rejected/accepted. These trajectories use gradient information and an analogy of a ball rolling on a surface is often used. These trajectories are efficient when they can transition to nearly anywhere on the posterior (stark contrast with random walk algorithms). However, to do this they need to be well-tuned. Generally there are three aspects of the algorithms that need to be tuned.

- The step size. How big of steps between points on a single trajectory. Bigger steps means fewer calculations (and thus faster), but has a negative cost of rejecting more points.
- The trajectory length. How long should a trajectory should be depends on many factors, and is not constant over the posterior. If it is too short, HMC resembles inefficient random walk behavior. If it is too long, computations are wasted.
- The “mass matrix” used. This matrix tells the algorithm about the global shape of the posterior so that it can generate better trajectories. When large discrepancies between marginal variances exist, the trajectores will be less efficient (e.g., one parameter has a marginal variance of 1, and another a marginal variance of 1000).

The no-U-turn sampler is a powerful sampler because it automated the tuning of the first two of these aspects (Hoffman and Gelman 2014). During warmup it tunes the step size to a target acceptance rate (default of 0.8) which has been shown to be optimal (Betancourt, Byrne, and Girolami 2014). Most importantly, though, is that it uses a recursive tree building algorithm to continue doubling the trajectory until a “U-turn” occurs, meaning going any further would be wasteful computationally. Thus, trajectory lengths are automatically optimal.

The original algorithm was implemented into the Bayesian statistical software Stan (Carpenter et al. 2017; Stan Development Team 2017). In addition to the automation of NUTS, Stan provides a scheme for adapting the step size during the warmup phase. Estimated diagonal mass matrices correct for global differences in scale, but not correlations. A dense matrix can also be adapted, and corrects for global correlations, but comes at a higher computation cost. Typically a diagonal matrix is best and thus is default in both Stan and `adnuts`

.

These three extensions lead to efficient HMC sampling with little to no user intervetion for a wide class of statistical models, including hierarchial ones (Monnahan, Thorson, and Branch 2017). Since publication, further developments have been made in HMC theoretical and practical research. For instance, Stan now includes an update called “exhaustive” HMC (Betancourt 2016) that more efficiently samples from the points in a trajectory.

For both ADMB and TMB models, `adnuts`

uses the original algorithm presented in (Hoffman and Gelman 2014). However it also uses a similar mass matrix adaptation scheme as used in Stan.

The algorithm is initiated with a unit diagonal mass matrix. During the first 50 iterations only the step size is adapted. After the next 75 iterations an estimated variance for each parameter (in untransformed space) is calculated and used as the new mass matrix. The next update occurs after twice the iterations as the previous update. This process repeats until the last 25 samples of the warmup phase. During this phase the mass matrix is held constant and only the step size adapt. See the Stan manual (Stan Development Team 2017) for more details. The step size is adapted during all warmup iterations. No information is returned about mass matrix adapation currently.

Once the warmup phase is over, no adaptation is done. Because of the adaptation the warmup samples are not valid samples from the posterior and *must* be discarded and not used for inference.

In some cases you will need to adjust the behavior of the NUTS algorithm to improve sampling. Here I review the three options for intervention (step size, trajectory lengths, mass matrix) that a user can take, and when and why they might need to.

A maximum tree depth argument is used to prevent excessively long trajectories (which can occur with poorly specified models). This is set to 12 (i.e., a length of \(2^12=4096\) steps) by default, which typically is long enough that a U-turn would occur. However, in some cases a model may need to make longer trajectories to maintain efficient sampling. In this case you will get warnings about exeeding maximum tree depth. Rerun the model with `control=list(max_treedepth=14)`

or higher, as needed.

Recall that a single NUTS trajectory consists of a set of posterior samples, resulting from a numerical approximation to a path along the posterior. The step size controls how close the approximation is along the true path. When the step size is too large and encounters extreme curvature in the posterior a divergence will occur. Divergences should not be ignored because they could lead to bias in inference. Instead, you force the model to take smaller step sizes by increasing the target acceptance rate. Thus, when you get warnings about divergences, rerun the model with `control=list(adapt_delta=.9)`

or higher, as necessary. If the divergences do not go away, investigate the cause and try to eliminate the extreme curvature from the model, for example with a reparameterization (Stan Development Team 2017; Monnahan, Thorson, and Branch 2017).

If there are extreme global correlations in your model, NUTS will be inefficient when using a diagonal mass matrix (the default). In this case, you can pass a dense matrix, estimated externally or from previous runs (`sample_admb`

returns an element `covar.est`

which can be passed to the next call). Do this with `control=list(metric=M)`

where M is a matrix in untransformed space that approximates the posterior. For ADMB models, you can try using the MLE covariance by setting `control=list(metric="mle"). Note that, for technical reasons, you need to reoptimize the model with the command line argument`

-hbf 1`. (ADMB uses different transformation functions for HMC so the covariance would be mismatched otherwise). Note that when using a dense mass matrix there is additional computational overhead, particularly in higher dimensions. That is, a dense matrix leads to shorter trajectories, but they take longer to calculate. Whether a dense metric is worth the increase in sampling efficiency will depend on the model.

The following figure demonstrates the effect of the mass matrix on a 2d normal model with box constraints. The columns denote the different model “spaces” and the rows different mass matrices. Random, arbitrary NUTS trajectories are show in red over the top of posterior draws (points). The right column is the model space, the middle the untransformed, and the far left the untransformed after being rotated by the mass matrix. Note the differences in scales in the axes among plots. The key here is the rightmost column. The top panel is with no mass matrix (i.e., unit diagonal), and the trajectories ungulate back and forth as they move across the posterior. Thus to go from one end to the other is not very straight. When a diagonal matrix is used, the trajectories become noticeably straighter. Finally, with the dense matrix the trajectories are even better. This is the effect of the mass matrix: trajectories can move between regions in the posterior more easily.

Betancourt, Michael. 2016. “Identifying the Optimal Integration Time in Hamiltonian Monte Carlo.” *arXiv Preprint arXiv:1601.00225*.

———. 2017. “A Conceptual Introduction to Hamiltonian Monte Carlo.” *arXiv Preprint arXiv:1701.02434*.

Betancourt, MJ, Simon Byrne, and Mark Girolami. 2014. “Optimizing the Integrator Step Size for Hamiltonian Monte Carlo.” *arXiv Preprint arXiv:1411.6669*.

Carpenter, B., A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, A. Riddell, J. Q. Guo, P. Li, and A. Riddell. 2017. “Stan: A Probabilistic Programming Language.” *Journal of Statistical Software* 76 (1):1–29.

Fournier, D. A., H. J. Skaug, J. Ancheta, J. Ianelli, A. Magnusson, M. N. Maunder, A. Nielsen, and J. Sibert. 2012. “AD Model Builder: Using Automatic Differentiation for Statistical Inference of Highly Parameterized Complex Nonlinear Models.” *Optimization Methods & Software* 27 (2):233–49.

Hoffman, M. D., and A. Gelman. 2014. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” *Journal of Machine Learning Research* 15 (1):1593–1623.

Kristensen, Kasper, Anders Nielsen, Casper W. Berg, Hans Skaug, and Bradley M. Bell. 2016. “TMB: Automatic Differentiation and Laplace Approximation.” *Journal of Statistical Software* 70 (5):21.

Monnahan, C. C., J. T. Thorson, and T. A. Branch. 2017. “Faster Estimation of Bayesian Models in Ecology Using Hamiltonian Monte Carlo.” *Methods in Ecology and Evolution* 8 (3):339–48.

Neal, Radford M. 2011. “MCMC Using Hamiltonian Dynamics.” *Handbook of Markov Chain Monte Carlo* 2.

Stan Development Team. 2017. “Stan Modeling Language Users Guide and Reference Manual, Version 2.17.0.”