\documentclass[11pt]{article}
%\VignetteIndexEntry{Introduction to Hierarchical Continuous Time Dynamic Modelling with ctsem}
%\VignetteKeyword{SEM, time series, panel data, dynamic models}
%\VignetteEngine{knitr::knitr_notangle}
%\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[utf8]{inputenc}
\usepackage{rotating}
\usepackage[style=apa,backend=biber,doi=true,natbib=true,url=true,sortcites=true,sorting=nyt]{biblatex}
\usepackage[british]{babel}
\usepackage{csquotes}
\DeclareLanguageMapping{british}{british-apa}
\addbibresource{hierarchicalrefs.bib}
\usepackage[margin=1in]{geometry}
\usepackage{graphicx}
\usepackage{authblk}
%\usepackage{cmap} % to ensure pdf text can be copied - error
\usepackage{mathtools}
\usepackage{hyperref}
\setlength{\parindent}{0cm}
\setlength{\parskip}{1ex}
%clear unnecessary citation stuff
\AtEveryBibitem{
\clearfield{labelmonth}
\clearfield{labelday}
\clearfield{urldate}
}
\DeclareSourcemap{
\maps[datatype=bibtex]{
\map{
\step[fieldsource=url,
notmatch=\regexp{wiki},
final=1]
\step[fieldset=urldate, null]
}
}
}
\usepackage{amsmath} %for multiple line equations
\usepackage{amsfonts} %for mathbb
\usepackage{caption}
\captionsetup{justification=raggedright,singlelinecheck=false}
\newcommand{\vect}[1]{\boldsymbol{\mathbf{#1}}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%to redefine jss commands to no longer do anything
\newcommand{\pkg}[1]{#1}
\newcommand{\proglang}[1]{#1}
%for wrapping long verbatim words
\makeatletter
\let\old@sverb\@sverb
\def\@sverb#1{\old@sverb{#1}\zz}
\def\zz#1{#1\ifx\@undefined#1\else\penalty\z@\expandafter\zz\fi}
\makeatother
%% almost as usual
\author{Charles C. Driver \\ Max Planck Institute for Human Development \and
Manuel C. Voelkle \\ Humboldt University Berlin \\ Max Planck Institute for Human Development}
\title{Introduction to Hierarchical Continuous Time Dynamic Modelling With ctsem}
\newcommand\numberthis{\addtocounter{equation}{1}\tag{\theequation}} %for adding numbers to specific lines
\begin{document}
<>=
library('ctsem')
library(knitr)
set.seed(22)
knit_hooks$set(crop = hook_pdfcrop)
opts_chunk$set(warning = FALSE, fig.align = 'center', width.cutoff = 100, fig.show = 'hold', eval = TRUE, echo = TRUE, message = FALSE, comment = NA, tidy = FALSE, autodep=TRUE, out.truncate = 100, size='small', crop=TRUE, fig.pos="htbp",pardef=TRUE,cache=FALSE)
knit_hooks$set(pardef = function(before, options, envir) {
if (before) par(mfrow=c(1,1),mgp=c(1.5,.6,0),mar=c(3,2,2,1)+.2, cex=.7)
})
options(width = 100, scipen = 12, digits = 3)
set.seed(1)
cran_check <- ("CheckExEnv" %in% search()) || any(c("_R_CHECK_TIMINGS_",
"_R_CHECK_LICENSE_") %in% names(Sys.getenv()))
knitr::opts_chunk$set(eval = !cran_check)
library(ggplot2)
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)}
@
\section{Overview}
In this work, we describe the basic usage of the hierarchical formulation of the ctsem software for continuous-time dynamic modelling in R \citep{rcoreteam2014r}. This formulation was described in detail in \citet{driver2018hierarchicala}, though since then the scope has been expanded to include regular (i.e. not Bayesian) maximum likelihood estimation, nonlinear models (such as parameters that vary over time), and optimization with optional importance sampling. The latter enhancement means that the approach described herein largely supersedes the initial mixed effects approach based upon OpenMx \citep{neale2016openmx}, as estimation options now include maximum likelihood, maximum a posteriori, and fully Bayesian with the Stan software's \citep{carpenter2017stan, standevelopmentteam_rstan} Hamiltonian Monte Carlo sampler. The major use case for the OpenMx based wide formulation is many subjects, few time points, and most or all time intervals consistent across subjects. This older structural equation formulation is discussed in detail in \citet{driver2017continuous}. Although some of the R commands and functionality are now different in the newer Stan based format, the general concepts discussed in the original work are still relevant background. Additional material can be found at \url{https://github.com/cdriveraus/ctsem} (very quick start guide on main page), \url{https://github.com/cdriveraus/ctsem/discussions} (question and answer), and \url{https://cdriver.netlify.app/} (tutorials and Bayesian quick start).
ctsem allows for specification and fitting of a range of continuous and discrete time models to study dynamics and processes within a subject, as well as individual differences in these dynamics. The models may include multiple indicators (dynamic factor analysis); multiple, interrelated, potentially higher order processes; and both time dependent and time independent covariates. Classic longitudinal models like latent growth curve and latent change score models can be formulated as special cases. This approach offers some advantages over what could be considered more typical dynamic modelling approaches, such as vector autoregressive models or latent change score models. Two important advantages relate to the handling of time and the treatment of individual differences. Time information is explicitly incorporated into the model, such that predictions from one measurement to another relate exactly to the amount of time that has passed, rather than simply the number of measurements, as is typical in discrete-time models. When the time intervals between measurements are all equal, there is (in many cases) an exact relationship between the discrete and continuous time forms, but when the intervals differ, the continuous-time form is typically more appropriate. For more on continuous-time models, see \citet{singer1993continuoustime, oud2000continuous, voelkle2013continuous}. With respect to individual differences, the hierarchical Bayesian approach allows for individual variation across all model parameters, while making full use of the data from all subjects to enhance estimation. As a result, reasonable individual specific parameter estimates may be obtained with far fewer time points than would be required by single-subject time-series type modelling approaches. For more on hierarchical Bayesian models, see \citet{gelman2014bayesian}.
This document is structured as follows: we briefly describe the continuous time dynamic model governing within subject dynamics, and the hierarchical model governing the distribution of subject level parameters. A more conceptual overview along with technical details is given in \citet{driver2018hierarchicala}. Following this, we walk through installing the ctsem software, setting up a data structure, and specifying and fitting the model, followed by summary and plotting functions. Some details on additional complexity are then provided, including an example model with a more complex dynamic structure, a discussion of the various options for incorporating stationarity assumptions into the model, and a walk-through of the various transformations involved in the model.
\subsection{Subject level latent dynamic model}
This section describes the subject level model characterising the system dynamics and measurement properties. Although we do not describe it explicitly, the corresponding discrete time autoregressive / moving average models can be specified and use the same set of parameter matrices we describe. The subject level dynamics are described by the following stochastic differential equation:
\begin{equation}
\label{eq:process1}
\mathrm{d} \vect{\eta} (t) =
\bigg(
\vect{A \eta} (t) +
\vect{b} +
\vect{M \chi} (t)
\bigg) \mathrm{d} t +
\vect{G} \mathrm{d} \vect{W}(t)
\end{equation}
Vector $ \vect{\eta} (t)\in\mathbb{R}^{v}$ represents the state of the latent processes at time $t$. The matrix $ \vect{A} \in \mathbb{R}^{v \times v}$ (DRIFT) represents the drift matrix, with auto effects on the diagonal and cross effects on the off-diagonals characterizing the temporal dynamics of the processes.
The continuous time intercept vector $ \vect{b} \in\mathbb{R}^{v}$ (CINT), in combination with $\vect{A}$, determines the long-term level around which the processes fluctuate.
Time dependent predictors $\vect{\chi}(t)$ represent inputs to the system that vary over time and are independent of fluctuations in the system. Equation \ref{eq:process1} shows a generalized form for time dependent predictors, which could be treated in a variety of ways depending on the assumed time course of the predictors and their effects. We use the simple impulse form shown in Equation \ref{eq:spike}, in which the predictors are treated as impacting the processes only at the instant of an observation occasion $u$. When necessary, the evolution of such effects over time can be modeled by extending the state matrices; for examples and discussion see \citet{driver2018understanding}.
\begin{equation}
\label{eq:spike}
\vect{\chi} (t) = \sum_{ u \in \vect{U}} \vect{x}_{u} \delta (t-t_u)
\end{equation}
Here, time dependent predictors $\vect{x}_u \in \mathbb{R}^{l}$ (tdpreds) are observed at measurement occasions $ u \in \vect{U}$. The Dirac delta function $\delta(t-t_u)$ is a generalized function that is $\infty$ at 0 and 0 elsewhere, yet has an integral of 1 (when 0 is in the range of integration). It is useful to model an impulse to a system, and here is scaled by the vector of time dependent predictors $\vect{x}_u$. The effect of these impulses on processes $\vect{\eta}(t)$ is then $\vect{M}\in \mathbb{R}^{v \times l}$ (TDPREDEFFECT).
$\vect{W}(t) \in \mathbb{R}^{v}$ (DIFFUSION) represents independent Wiener processes, with a Wiener process being a random walk in continuous time. $\textnormal{d}\vect{W}(t)$ is meaningful in the context of stochastic differential equations, and represents the stochastic error term, an infinitesimally small increment of the Wiener process. The lower triangular matrix $\vect{G} \in \mathbb{R}^{v \times v}$ represents the effect of this noise on the change in $\vect{\eta}(t)$. $\vect{Q}$, where $\vect{Q} = \vect{GG}^\top$, represents the variance-covariance matrix of the diffusion process in continuous time.
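To make the link between the continuous and discrete time parameters concrete, the discrete-time autoregressive effects implied by a drift matrix $\vect{A}$ for an interval $\Delta t$ are given by the matrix exponential $e^{\vect{A} \Delta t}$. The following base R sketch computes this without ctsem; the \verb|expm_taylor| helper is our own simple truncated-series implementation, adequate only for small, well-scaled matrices, and the drift values are purely illustrative.

```r
# Illustrative only: discrete-time autoregression matrix exp(A * dt)
# implied by a continuous-time drift matrix A, via a truncated
# Taylor series (fine for small, well-scaled matrices).
expm_taylor <- function(M, order = 30) {
  out  <- diag(nrow(M))
  term <- diag(nrow(M))
  for (k in seq_len(order)) {
    term <- term %*% M / k   # next Taylor term M^k / k!
    out  <- out + term
  }
  out
}

A <- matrix(c(-0.4,  0.0,    # auto effect of process 1, no cross effect 2 -> 1
               0.2, -0.3),   # cross effect 1 -> 2, auto effect of process 2
            nrow = 2, byrow = TRUE)

drift_dt1 <- expm_taylor(A * 1)  # discrete effects for an interval of 1
drift_dt2 <- expm_taylor(A * 2)  # discrete effects for an interval of 2
```

Note that \verb|drift_dt2| equals \verb|drift_dt1 %*% drift_dt1|: the continuous-time model guarantees this consistency across intervals, whereas a discrete-time model estimated on unit intervals says nothing directly about other intervals.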
\subsection{Subject level measurement model}
The latent process vector $\vect{\eta}(t)$ has measurement model:
\begin{equation}
\label{eq:measurement}
\vect{y}(t) = \vect{\Lambda} \vect{\eta}(t) + \vect{\tau} + \vect{\epsilon}(t)
\quad \text{where } \vect{\epsilon}(t) \sim \mathrm{N} (\vect{0}_c, \vect{\Theta})
\end{equation}
$\vect{y} (t)\in\mathbb{R}^{c}$ is the vector of manifest variables, $\vect{\Lambda} \in \mathbb{R}^{c \times v}$ (LAMBDA) represents the factor loadings, and $\vect{\tau} \in\mathbb{R}^{c}$ (MANIFESTMEANS) the manifest intercepts. The residual vector $\vect{\epsilon} \in \mathbb{R}^{c}$ has covariance matrix $\vect{\Theta} \in\mathbb{R}^{c \times c}$ (MANIFESTVAR).
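As a small numerical illustration of Equation \ref{eq:measurement} (values here are arbitrary choices for the sketch, not ctsem defaults), one draw from the measurement model can be generated in base R:

```r
# One draw from y(t) = Lambda eta(t) + tau + eps(t).
# All numeric values are illustrative only.
Lambda <- diag(2)                # identity loadings: one indicator per process
tau    <- c(0.1, -0.2)           # manifest intercepts (MANIFESTMEANS)
Theta  <- diag(c(0.25, 0.25))    # residual covariance (MANIFESTVAR)
eta_t  <- c(1, 2)                # latent state at some time t

set.seed(1)
eps_t <- rnorm(2, mean = 0, sd = sqrt(diag(Theta)))  # measurement error
y_t   <- as.vector(Lambda %*% eta_t + tau + eps_t)

# Noise-free expectation of y given the latent state:
Ey_t <- as.vector(Lambda %*% eta_t + tau)
```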
\subsection{Overview of hierarchical model}
Parameters for each subject are first drawn from a simultaneously estimated higher level distribution over an unconstrained space, then a set of parameter specific transformations are applied so that a) each parameter conforms to necessary bounds and b) is subject to the desired prior. Following this, in some cases matrix transformations are applied to generate the continuous time matrices described. The higher level distribution has a multivariate normal prior (though priors may be switched off as desired). We provide a brief description here, and an R code example later in this work, but for the full details see \citet{driver2018hierarchicala}.
The joint-posterior distribution of the model parameters given the data is as follows:
\begin{equation}
p(\vect{\Phi},\vect{\mu},\vect{R},\vect{\beta} | \vect{Y}, \vect{z}) \propto p(\vect{Y} | \vect{\Phi}) p(\vect{\Phi} | \vect{\mu},\vect{R},\vect{\beta}, \vect{z}) p(\vect{\mu},\vect{R},\vect{\beta})
\end{equation}
Subject specific parameters $\vect{\Phi}_i$ are determined in the following manner:
\begin{equation}
\label{eq:subjectparams}
\vect{\Phi}_i = \text{tform} \bigg(\vect{\mu} + \vect{Rh}_i + \vect{\beta} \vect{z}_i \bigg)
\end{equation}
\begin{equation}
\vect{h}_i \sim \mathrm{N}(\vect{0}, \vect{1})
\end{equation}
\begin{equation}
\vect{\mu} \sim \mathrm{N}(\vect{0}, \vect{1})
\end{equation}
\begin{equation}
\vect{\beta} \sim \mathrm{N}(\vect{0}, \vect{1})
\end{equation}
$\vect{\Phi}_i \in\mathbb{R}^{s}$ is the $s$ length vector of parameters for the dynamic and measurement models of subject $i$.
$\vect{\mu} \in\mathbb{R}^{s}$ parameterizes the means of the raw population distributions of subject level parameters.
$\vect{R} \in\mathbb{R}^{s \times s}$ is the matrix square root of the raw population distribution covariance matrix, parameterizing the effect of subject specific deviations $\vect{h}_i \in\mathbb{R}^{s}$ on $\vect{\Phi}_i$.
$\vect{\beta} \in\mathbb{R}^{s \times w}$ is the raw effect of time independent predictors $\vect{z}_i \in\mathbb{R}^{w}$ on $\vect{\Phi}_i$, where $w$ is the number of time independent predictors.
$\vect{Y}_i$ contains all the data for subject $i$ used in the subject level model -- $\vect{y}$ (process related measurements) and $\vect{x}$ (time dependent predictors). $\vect{z}_i$ contains the time independent predictor data for subject $i$.
$\text{tform}$ is an operator that applies a transform to each value of the vector it is applied to. The specific transform depends on which subject level parameter matrix the value belongs to, and the position in that matrix.
At a number of points, we will refer to the parameters prior to the tform function as `raw' parameters. So, for instance, `raw population standard deviation' would refer to a diagonal entry of $\vect{R}$, and `raw individual parameters for subject $i$' would refer to $\vect{\mu} + \vect{Rh}_i + \vect{\beta} \vect{z}_i$. In contrast, without the `raw' prefix, `population means' would refer to $\text{tform} (\vect{\mu})$, and would typically reflect values the user is more likely to be interested in, such as the continuous time drift parameters.
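As a toy numerical illustration of Equation \ref{eq:subjectparams} -- emphatically not ctsem's actual transforms, which are parameter specific and shown by the \verb|ctModelLatex| function -- the following sketch maps raw, unconstrained subject parameters to interpretable ones, using an identity transform for an intercept-type parameter and \verb|exp| as a stand-in transform for a standard-deviation-type parameter:

```r
# Toy illustration of Phi_i = tform(mu + R h_i + beta z_i).
# The transforms (identity, exp) are hypothetical stand-ins for
# ctsem's parameter-specific transforms.
set.seed(1)
mu   <- c(0.5, -1.0)                        # raw population means (s = 2)
R    <- matrix(c(0.3, 0.0,
                 0.1, 0.2), 2, 2, byrow = TRUE)  # sqrt of raw covariance
beta <- matrix(c(0.4, 0.0), 2, 1)           # raw TI predictor effects (w = 1)
h_i  <- rnorm(2)                            # subject's standard-normal deviations
z_i  <- 0.7                                 # subject's TI predictor value

raw_i <- mu + R %*% h_i + beta %*% z_i      # raw individual parameters
tform <- list(identity, exp)                # intercept (unbounded), sd (positive)
Phi_i <- mapply(function(f, x) f(x), tform, raw_i)
```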
\subsection{Install software and prepare data}
Install ctsem software within R:
<>=
install.packages("ctsem")
library("ctsem")
@
Prepare data in long format, with each row containing one time point of data for one subject. A subject id column is required, named \verb|id| by default, though this can be changed in the model specification. Some of the outputs are simpler to interpret if subject id is a sequence of integers from 1 to the number of subjects, but this is not a requirement. Also required are: a \verb|time| column containing positive numeric values for time; columns for the manifest variables (the names of which must be given in the next step using \verb|ctModel|); columns for any time dependent predictors (these vary over time but have no model estimated, and are assumed to impact the latent processes instantly); and columns for any time independent predictors (which predict the subject level parameters, which are themselves time invariant -- thus the values for a particular time independent predictor must be the same across all observations of a particular subject).
<>=
ctstantestdat[c(3:5,17:22),]
@
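For readers constructing their own data, a minimal long-format data set with the required columns can be built in base R as follows (column names other than \verb|id| and \verb|time| are arbitrary, but must match those later given to \verb|ctModel|):

```r
# Minimal long-format example: two subjects, irregular measurement times.
# TD1 is a time dependent predictor (zero means no input at that occasion);
# TI1 is a time independent predictor, constant within each subject.
set.seed(3)
toydat <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  time = c(0, 0.5, 1.7, 0, 1.2),          # positive, increasing within subject
  Y1   = rnorm(5), Y2 = rnorm(5),          # manifest indicators
  TD1  = c(0, 1, 0, 0, 1),
  TI1  = c(-0.2, -0.2, -0.2, 0.4, 0.4))
```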
As will be discussed in detail later, default priors for the model are set up with an attempt to be `weakly informative' for typical applications in the social sciences, on data that is centered and scaled. Because of this, we recommend grand mean centering and scaling each variable in the data, with the exception of time dependent predictors, which should be scaled but centered such that a value of zero implies no effect. This exception is because time dependent predictors are modelled as impulses to the system with no persistence through time; at all times when not observed, the value is inherently zero. Similarly, we expect a time interval of 1.00 to reflect some `moderate change' in the underlying process. If we wished to model daily hormonal fluctuations, with a number of measurements each day, a time scale of hours, days, or weeks could be sensible -- minutes or years would likely be problematic. If the data are not adjusted according to these considerations, the priors themselves should be adjusted, or at least their impact carefully considered -- though note also that an inappropriate scaling of time can result in numerical difficulties, regardless of priors.
<>=
ctstantestdat[,c('Y1','Y2','TI1','TI2','TI3')] <-
scale(ctstantestdat[,c('Y1','Y2','TI1','TI2','TI3')])
@
Functions to convert between the wide and long formats used by ctsem are available: \verb|ctWideToLong|, \verb|ctDeintervalise|, \verb|ctLongToWide|, and \verb|ctIntervalise|. For details, see the relevant help pages in R.
\subsection{Missing values}
Missingness in the manifest variables is handled using the typical filtering / full information maximum likelihood approach. Missing values on the covariates -- either time dependent or independent -- are somewhat more problematic, and will cause an error by default. Various alternative approaches are available: In certain cases it may be reasonable to replace missing values with zeros, or to sample them as part of the model. See the ctStanFit help for more details.
\subsection{Model specification}
Specify the model using \verb|ctModel(type="stanct",...)|. \verb|"stanct"| specifies a continuous time model in Stan format, \verb|"standt"| specifies discrete time, while \verb|"omx"| is the original ctsem behaviour and prepares an OpenMx model \citep[discussed in][]{driver2017continuous}. Other arguments to \verb|ctModel| include the list of within-subject parameter matrices, with LAMBDA, the factor loading matrix, the only one that must be specified. All other matrices are set to the system dimensions implied by LAMBDA (number of columns representing latent processes, rows representing manifest indicators), and in general are filled with free parameters. The CINT matrix is the only exception: it is filled with zeros by default, for identification purposes. Such parameter matrices can be manually specified using either numeric values when an element of the model should be fixed, a character string when it should be estimated, or a character string containing at least one square bracket for deterministic relations between parameter matrices and/or latent states. The following code demonstrates this by freeing the CINT matrix of latent intercepts, and instead fixing the MANIFESTMEANS matrix of manifest intercepts. See the help for \verb|ctModel|, available in R via \verb|?ctModel|, for more details.
<>=
model<-ctModel(type='stanct',
latentNames=c('eta1','eta2'),
manifestNames=c('Y1','Y2'),
CINT=matrix(c('cint1','cint2 |||| TI1'),nrow=2,ncol=1),
MANIFESTMEANS=matrix(c(0,0),nrow=2,ncol=1),
TDpredNames='TD1',
TIpredNames = 'TI1',tipredDefault = FALSE,
LAMBDA=diag(2))
@
It is possible to simplify the specification of parameter matrices somewhat by passing a vector of values and/or parameters, which is placed row-wise (distinct from the R default of column-wise) into a matrix of the correct size. A single value can also be used to fill an entire matrix. The above model in this shorter form is specified as follows:
<>=
model<-ctModel(type='stanct',
latentNames=c('eta1','eta2'),
manifestNames=c('Y1','Y2'),
CINT=c('cint1','cint2 |||| TI1'),
MANIFESTMEANS=0,
TDpredNames='TD1',
TIpredNames = 'TI1',tipredDefault = FALSE,
LAMBDA=c(1,0,0,1))
@
The code above specifies a first order bivariate latent process model, with each process measured by a single, potentially noisy, manifest variable. A single time dependent predictor is included in the model, as well as a single time independent predictor with effects restricted to only the second continuous intercept. To see the between and within subject model equations, the \verb|ctModelLatex| function can be used on the model object, generating output as shown in Figure \ref{fig:ctmodellatexout}.
<>=
cat(paste0('
\\begin{sidewaysfigure}
\\begin{tiny}'))
cat(ctModelLatex(model,textsize = 'tiny',compile = FALSE,equationonly = TRUE))
cat('\\caption{Output from ctModelLatex function }
\\label{fig:ctmodellatexout}
\\end{tiny}
\\end{sidewaysfigure}')
@
Table \ref{tab:ctmodel} shows the basic arguments one may consider and their link to the dynamic model parameters. Note that ctModel requires variance-covariance matrices (DIFFUSION, T0VAR, MANIFESTVAR) to be specified in a matrix square root form, with standard deviations on the diagonal, covariance related parameters on the lower off-diagonal (excluding MANIFESTVAR, which for many cases should be diagonal only), and (typically) zeroes on the upper off-diagonal. While it is possible to fix the lower off-diagonals of covariance related matrices to non-zero values, in general this is difficult to interpret because of the necessary matrix transformations, so we recommend either fixing them to zero or leaving them free.
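To see why the square root form always yields a valid covariance matrix, note that for any lower triangular $\vect{G}$ with positive diagonal, $\vect{GG}^\top$ is symmetric and positive definite. The base R sketch below is illustrative only: ctsem additionally transforms the off-diagonal parameters internally, which is why they are described as `covariance related' rather than covariances themselves.

```r
# DIFFUSION-style square root specification: standard deviations on the
# diagonal, covariance-related parameters on the lower off-diagonal.
# Illustrative only -- ctsem transforms these values further internally.
G <- matrix(c(1.0, 0.0,
              0.3, 0.8), nrow = 2, byrow = TRUE)
Q <- G %*% t(G)   # implied covariance matrix

# Q is symmetric with strictly positive eigenvalues by construction:
eigvals <- eigen(Q, symmetric = TRUE)$values
```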
\begin{table}\footnotesize
\caption{ctModel arguments} \label{tab:ctmodel}
\begin{tabular}{l|l|l p{8cm} }
\textbf{Argument} & \textbf{Sign} & \textbf{Default} & \textbf{Meaning}\\
\hline
LAMBDA & $\Lambda$& & n.manifest $\times$ n.latent loading matrix relating latent to manifest variables.\\
n.manifest & \textit{c} & & Number of manifest indicators per individual at each measurement occasion.\\
n.latent & \textit{v} & & Number of latent processes.\\
manifestNames & & Y1, Y2, etc & n.manifest length character vector of manifest names.\\
latentNames & & eta1, eta2, etc & n.latent length character vector of latent names.\\
T0VAR & $Q^*_1$ & free & lower tri n.latent $\times$ n.latent matrix of latent process initial covariance, specified with standard deviations on diagonal and covariance related parameters on lower triangle.\\
T0MEANS & $\eta_1$ & free & n.latent $\times$ 1 matrix of latent process means at first time point, T0.\\
MANIFESTMEANS & $\tau$ & free & n.manifest $\times$ 1 matrix of manifest intercepts.\\
MANIFESTVAR & $\Theta$ & free & diagonal matrix of var / cov between manifests, specified with standard deviations on diagonal and zeroes elsewhere.\\
DRIFT & $A$ & free & n.latent $\times$ n.latent matrix of continuous auto and cross effects.\\
CINT & $b$ & 0 & n.latent $\times$ 1 matrix of continuous intercepts.\\
DIFFUSION & $Q$ & free & lower triangular n.latent $\times$ n.latent matrix containing standard deviations of latent process on diagonal, and covariance related parameters on lower off-diagonals.\\
n.TDpred & \textit{l} & 0 & Number of time dependent predictors in the dataset.\\
TDpredNames & & TD1, TD2, etc & n.TDpred length character vector of time dependent predictor names.\\
TDPREDEFFECT & $M$ & free & n.latent $\times$ n.TDpred matrix of effects from time dependent predictors to latent processes.\\
n.TIpred & \textit{p} & 0 & Number of time independent predictors.\\
TIpredNames & & TI1, TI2, etc & n.TIpred length character vector of time independent predictor names.\\
PARS & & free & $0 \times 0$ matrix containing additional parameters to be used, for instance in nonlinear models.\\
\end{tabular}
\end{table}
The \verb|pars| subobject of the created model object (in this case, \verb|model$pars| ) shows the parameter specification, including both fixed and free parameters, whether the parameters vary across individuals, how the parameter is transformed from a standard normal distribution (thus setting both priors and bounds), and whether that parameter is regressed on the time independent predictors.
<>=
head(model$pars,4)
@
By default, intercept related model parameters (T0MEANS, MANIFESTMEANS, CINT) are set to vary across individuals using a correlated random effects formulation, while other parameters are fixed across subjects. In contrast, when time independent predictors (time invariant covariates) are included, the default (adjustable using the \verb|tipredDefault| argument to ctModel) is that they affect all free subject level parameters. One may modify the output model to further restrict, or allow, between subject differences (set some parameters to not vary over individuals), alter the transformation used to set the prior / bounds, or restrict which effects of time independent predictors to estimate -- see the section on adjusting model specification for details.
Priors are not included by default, but for those who wish to use Bayesian estimation approaches, they can be switched on during model fitting using the \verb|priors=TRUE| argument. For the default of maximum likelihood, the information in this text regarding priors is obviously less relevant; however, some awareness may be useful, as a) priors are always used during an initial rough estimation, and b) although the prior probability is then not relevant, some parameters (such as standard deviations) rely on transformations to ensure they stay within a particular range.
\subsection{Model fitting}
Once model specification is complete, the model is fit to the data using the \verb|ctStanFit| function as shown in the following example. Various options are available for fitting, from a maximum likelihood approach using optimization, to a fully Bayesian approach via Stan's Hamiltonian Monte Carlo sampler. For these examples we will use optimization, though for cases with many random effects on non-intercept parameters, or missing time independent predictors, optimization may not always provide the most appropriate results and should be interpreted with care.
For a classic maximum likelihood approach, optimization may be used with the argument (set by default) \verb|priors=FALSE| to disable the priors. By default, when optimization concludes, samples are drawn from the parameter covariance matrix implied by the Hessian, to allow for inference and uncertainty plotting on all aspects of the model (e.g., parameter interactions such as the implied regression effect for a given time interval). Improvements on this maximum a posteriori, Hessian based sampling approach can be made by requesting importance sampling (see the \verb|optimcontrol| argument in the \verb|ctStanFit| help), which is somewhat slower but generally still faster than full sampling using Stan's Hamiltonian Monte Carlo (HMC) -- the caveats mentioned above still apply, however, and importance sampling may not work well with many parameters. When importance sampling is ineffective, this will generally be indicated by errors during the sampling process and possibly an inability to converge. Stan's HMC is in general the most robust solution, and can be obtained by setting \verb|optimize=FALSE| and \verb|priors=TRUE|. Note that if random effects are specified when optimization is used, between subject differences are integrated over. Estimates of the subject specific parameters can be obtained via the \verb|ctStanKalman| function.
When \verb|optimize=FALSE|, Stan's Hamiltonian Monte Carlo sampler is used. Depending on the data, model, and number of iterations requested, this can take anywhere from a few minutes to days. Important arguments when using this approach are \verb|chains| and \verb|iterations| which allow specification of the number of sampling chains, and iterations per chain. Three chains is a reasonable minimum to support convergence checking, and current experience suggests 300 iterations is often enough to get an idea of what is going on -- more is very likely necessary for robust inference, but how much more is highly dependent on the data and model. To examine progress while sampling, the \verb|plot=TRUE| argument may be used. In some cases, extended warmup / adaptation times can be avoided (at cost of reduced computational efficiency during the actual sampling) by reducing the maximum treedepth using the \verb|control| argument.
%For the sake of speed for this example we only sample for 300 iterations, with a max treedepth of the Hamiltonian sampler reduced from the default of 10 to 6. With these settings the fit should take only a minute or two, but is unlikely adequate for inference!. Those that wish to try out the functions without waiting, can simply use the already existing \verb|ctstantestfit| object with the relevant functions (the code for this is commented out). The dataset specified here is built-in to the ctsem package, and available whenever ctsem is loaded in R.
<>=
suppressWarnings(fit<-ctStanFit(datalong = ctstantestdat,
ctstanmodel = model,optimize=TRUE,cores=1,
# savescores=TRUE,savesubjectmatrices = TRUE,
priors=TRUE))
@
<>=
fit<-ctStanFit(datalong = ctstantestdat, ctstanmodel = model, priors=TRUE)
@
%The plot argument allows for plotting of sampling chains in real time, which is useful for slow models to ensure that sampling is proceeding in a functional manner. Models with many parameters (e.g., many subjects and all parameters varying over subject) may be too taxing for the plotting function to handle smoothly - we have had success with up to around 4000 parameters.
\subsection{Summary}
After fitting, the summary function may be used on the fit object, which returns details regarding the population mean parameters, population standard deviation parameters, population correlations, and the effect parameters of time independent predictors.
Additionally, the summary function outputs a range of matrices regarding correlations between subject level parameters. \verb|rawpopcorr_means| reports the posterior mean of the correlations between the raw (not yet transformed from the standard normal scale) parameters, while \verb|rawpopcorr_sd| reports the posterior standard deviation of these correlations.
<