j.rittmo@gmail.com

r.d.mcintosh@ed.ac.uk

Abstract

Comparison of single cases to populations estimated from a sample has many potential applications. Historically, one major application is in the field of neuropsychology, where single-case statistics may be used to infer whether an individual has suffered a significant cognitive deficit as the consequence of a brain lesion. One may wish to estimate whether that individual has abnormally low performance on some cognitive ability, or if one cognitive ability is abnormally discrepant with respect to another cognitive ability. Several statistical methods have been developed to test for abnormality on a single variate and abnormality of the difference between two variates when a single case is compared to a sample, without losing control of the Type I error rate. This paper describes some of the main methods and presents a package in which they are implemented.

There are many reasons why researchers and clinicians might want to look at single cases rather than at the average of some group. In certain fields, such as neuropsychology, this need arises because the pattern of naturally-occurring brain damage is unique in each individual case. From a theoretical perspective, this means that a single patient might be the only available source of data for a given phenomenon. From a practical, clinical perspective, diagnosis and description of the pattern of cognitive impairment is done at the individual level. Individual brain-damaged patients are thus often compared to the healthy population to assess changes in cognitive functioning. If we want to assess a patient's score on some variate \(Y\) for which we do not know the population parameters, these must be estimated from a sample. Thus, the single case of interest is compared to a control sample. There are many other areas where such methods could also be useful: for example, studies of uncommon human expertise, targeted quality checks in industries with limited production output, or animal studies where rearing a large experimental group might be infeasible.

As it represents the canonical field for the application of these methods, the
nomenclature of neuropsychology will here be adopted. An abnormally low score on
a single variate will be referred to as a *deficit*, an important concept
for clinical and basic neuropsychology alike. For the latter area another
concept is also considered to be of cardinal importance: the ability to test for
*dissociation*, which is taken to provide evidence for some degree of
functional independence between two cognitive abilities. By charting
dissociations, a cognitive architecture of the mind can be theorized
(Shallice 1988).

During the last 20 years methods have been developed to estimate and test for
both deficits and dissociations in the single case, while controlling the Type I
error rate, mainly by John Crawford and Paul Garthwaite
(e.g., Crawford and Howell 1998; Crawford, Garthwaite, and Ryan 2011; Crawford and Garthwaite 2007, 2002, 2005).
They are available as standalone computer programs, taking only summary data as
input, at https://homepages.abdn.ac.uk/j.crawford/pages/dept/psychom.htm.
However, the majority of these methods have not yet been implemented in any
standard statistical environment. By implementing them in the package `singcar`
for the `R` environment (R Core Team 2020), our aim is to encourage and simplify
their usage. Further, limiting Type II errors has not received as much attention
as limiting Type I errors in single-case methodology (McIntosh and Rittmo 2020).
By including novel tools for power calculations, we hope to raise awareness of
the inherently low power in this field as well as to aid in the planning and
design of experiments.

The `R` package `singcar` contains seven functions to estimate a case's
abnormality relative to a normal population estimated from a small sample:
three concern a single variate and four concern the discrepancy between two variates.
Both frequentist and Bayesian methods are provided, all developed originally by
Crawford and colleagues (Crawford and Howell 1998; Crawford, Garthwaite, and Ryan 2011; Crawford and Garthwaite 2007, 2002, 2005).
Of special note for psychological research are the
methods allowing the inclusion of covariates (Crawford, Garthwaite, and Ryan 2011)
using Bayesian regression techniques (Section 2.3.3). These methods make matching the control
sample to the case less cumbersome.

In Section 2 the implemented methods are described in detail and in Section 3 the package is described and usage exemplified with a dataset from a recent neuropsychological single case study.

In the neuropsychological application of the methods to be presented, the variates of interest will be scores obtained by participants on tasks assessing relevant cognitive functions. The variates will thus often be referred to as task scores. Historically, it was not uncommon for researchers to compare single cases against a control population estimated from a sample by evaluating the case score as a \(Z\) score from the estimated distribution. The \(p\) value associated with this \(Z\) score would then be treated as an estimate of the case’s abnormality. A similar logic was sometimes applied for estimating the abnormality of the discrepancy between two tasks, using a method developed by Payne and Jones (1957): researchers calculated a \(Z\) score based on the case’s difference between two standardised variates divided by the standard deviation of the difference.

The \(Z\) score approach is of course problematic because it treats parameter estimations from a restricted control sample as if they were population parameters. This could be appropriate with very large samples, but control samples in neuropsychology are often small (sometimes < 10), and it is well known that the sampling distribution of the sample variance is right skewed for small sample sizes. Hence, underestimation of variance would be more probable than overestimation, inflating obtained \(Z\) scores and thus Type I errors if the Z scores were used for hypothesis testing (Crawford and Howell 1998).
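This inflation can be illustrated with a small simulation. The following Python sketch (illustrative only; the numbers and the use of SciPy are our own, not part of `singcar`) draws a "case" from the same population as a control sample of \(n = 5\) and counts how often the naive \(Z\) test rejects at a nominal one-tailed 5% level:

```python
# Simulation sketch: Type I error rate of treating a case's Z score, computed
# from small-sample statistics, as if those were population parameters.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, alpha, n_sim = 5, 0.05, 20_000
crit = norm.ppf(alpha)                  # one-tailed criterion, about -1.645

rejections = 0
for _ in range(n_sim):
    controls = rng.standard_normal(n)   # control sample
    case = rng.standard_normal()        # case drawn from the SAME population
    z = (case - controls.mean()) / controls.std(ddof=1)
    rejections += z < crit

rate = rejections / n_sim
print(round(rate, 3))                   # noticeably above the nominal 0.05
```

With \(n = 5\) the empirical rejection rate is roughly double the nominal level, illustrating why the \(t\)-based methods below are needed.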

The following sections describe methods that have been devised to allow for the evaluation of the abnormality of a case against a restricted control sample, whilst retaining appropriate control over the Type I error rate. These methods include frequentist approaches (Section 2.2) and Bayesian approaches (Section 2.3).

An appropriate method for comparing a single observation to the mean of a sample was proposed within the biological sciences by Sokal and Rohlf (1981) (p. 227). It was popularized within neuropsychology by Crawford and Howell (1998), where its common application was as a (one-tailed) test of deficit (TD), to determine whether the score of a single case with brain damage was abnormally low (assuming that a low score indicates poorer performance) with respect to the mean score of a control sample. The \(t\) distribution is used to account for the underestimation of the sample variance. The basic approach is a modified two-sample \(t\) test in which the case is simply treated as a sample of size 1. The degrees of freedom are \(n + 1 - 2 = n - 1\).

\[\begin{equation} t_{n-1} = \frac{y^* - \overline{y}}{s \sqrt{\frac{n + 1}{n}}} \tag{2.1} \label{eq:TD} \end{equation}\]

Where \(y^*\) is the case score, \(\overline{y}\) and \(s\) are the sample mean and sample standard deviation respectively, and \(n\) is the size of the control sample. This method does not allow estimation of a notional patient population. However, since the question posed when using this test concerns the probability of the case being part of the control population, the potential alternative affinity of the case is of no concern (Crawford and Howell 1998). This test of deficit provides transparent control of the Type I error rate (Crawford and Garthwaite 2012). Moreover, the \(p\) value obtained constitutes an unbiased point estimate of the abnormality of the case, as demonstrated by Crawford and Garthwaite (2006). The simple proof for this is given below.

The proportion of controls that would score lower on a random variable \(Y\) than the case \(y^*\) is \[ \mathbb{P}[Y< y^*] \] Subtracting \(\overline{y}\) from both sides of the inequality and dividing by \(s \sqrt{\frac{n + 1}{n}}\) gives \[ \mathbb{P}[Y< y^*] =\mathbb{P}\left[\frac{Y-\overline{y}}{s \sqrt{\frac{n + 1}{n}}} < \frac{y^* - \overline{y}}{s \sqrt{\frac{n + 1}{n}}}\right] \] The quantity to the left of the inequality, i.e., \(\frac{Y-\overline{y}}{s \sqrt{\frac{n + 1}{n}}}\), is \(t\) distributed with \(n-1\) degrees of freedom. Hence, \[ \mathbb{P}[Y< y^*] =\mathbb{P}\left[t_{n-1} < \frac{y^* - \overline{y}}{s \sqrt{\frac{n + 1}{n}}}\right] \] The quantity to the right of the inequality is the test statistic from Equation (2.1), hence \(\mathbb{P}[Y< y^*]\) is the same as the \(p\) value obtained from the test of deficit. This fact also makes the construction of confidence intervals for this abnormality possible.

Crawford, Garthwaite, and Porter (2010) pointed out that, although the \(Z\) score is not appropriate for estimating the abnormality of a case when the control sample is small, it does provide a standardised effect size measure of abnormality, similar to Cohen’s d (Cohen 1988). Insensitivity to sample size is indeed a requirement for an effect size index and the proposed quantity was: \[\begin{equation} Z_{CC} = \frac{y^* - \overline{y}}{s} \tag{2.2} \label{eq:zcc} \end{equation}\] This is simply a \(Z\) score calculated from sample estimates. The subscript (CC) indicates that it relates to a “case-controls” comparison. The construction of confidence intervals for the point estimate of abnormality using the \(Z_{CC}\) index of effect size was described by Crawford and Garthwaite (2002).

Let \(p\) be the proportion of the population that would fall below the case score; then the \(1-\alpha\) confidence interval for \(p\) is constructed as follows. Let \(Z_{CC} = \frac{y^*-\overline{y}}{s}\) with \(y^* \neq \overline{y}\); then \(Z_{CC}\sqrt{n}\) is an observation from a non-central \(t\) distribution on \(n-1\) degrees of freedom. By deploying a search algorithm we find the value \(\delta_U\) for the non-centrality parameter (NCP) of a non-central \(t\) distribution on \(n-1\) degrees of freedom such that its \(100\frac{\alpha}{2}\) percentile equals \(Z_{CC}\sqrt{n}\), and similarly we find the NCP \(\delta_L\) of a non-central \(t\) distribution such that its \(100(1-\frac{\alpha}{2})\) percentile equals \(Z_{CC}\sqrt{n}\). The upper and lower boundaries for the population value of \(Z_{CC}\) are then given by: \[ Z_{CC_U} = \frac{\delta_U}{\sqrt{n}}, \ \ Z_{CC_L} = \frac{\delta_L}{\sqrt{n}} \] and the boundaries for \(p\) by: \[ p_U = \Phi\left(\frac{\delta_U}{\sqrt{n}}\right), \ p_L = \Phi\left(\frac{\delta_L}{\sqrt{n}}\right) \] Where \(\Phi\) is the CDF of the standard normal distribution. Note that the above is appropriate when the case score falls in the lower tail of the distribution. If it falls in the upper tail, so that abnormality is the proportion of the population scoring above the case, the upper and lower boundaries are given by: \[ p_U = 1 - \Phi\left(\frac{\delta_L}{\sqrt{n}}\right), \ p_L = 1 - \Phi\left(\frac{\delta_U}{\sqrt{n}}\right) \]
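The search for the boundary NCPs can be sketched with SciPy's non-central \(t\) distribution and a bracketed root search. The inputs below are hypothetical (a case two sample SDs below the control mean, \(n = 20\)); `singcar` implements the search in `R`:

```python
# Sketch of the interval construction for the abnormality estimate, assuming
# the case falls in the lower tail: find the NCPs whose alpha/2 and
# 1 - alpha/2 percentiles equal Z_CC * sqrt(n), then map back through Phi.
import numpy as np
from scipy.stats import nct, norm
from scipy.optimize import brentq

def abnormality_ci(zcc, n, alpha=0.05):
    df, tobs = n - 1, zcc * np.sqrt(n)
    # nct.cdf(tobs, df, delta) decreases monotonically in delta,
    # so a bracketed root search recovers each boundary NCP.
    delta_u = brentq(lambda d: nct.cdf(tobs, df, d) - alpha / 2, -50, 50)
    delta_l = brentq(lambda d: nct.cdf(tobs, df, d) - (1 - alpha / 2), -50, 50)
    return norm.cdf(delta_l / np.sqrt(n)), norm.cdf(delta_u / np.sqrt(n))

lo, hi = abnormality_ci(zcc=-2.0, n=20)
print(round(lo, 4), round(hi, 4))
```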

As has been shown, estimating abnormality on a single variate is a simple matter. For estimating abnormality of discrepancy, when the normally distributed variates \(Y_1\) and \(Y_2\) can be compared without standardisation, the approach is equally simple and in fact identical to the test of deficit, Equation (2.1), except that the statistic is based on difference scores, taking the correlation between the variates into account (\(\rho_{12}\)). \[\begin{equation} t_{n-1} = \frac{(y^*_1 - \overline{y}_1) - (y^* _2 - \overline{y}_2) }{ \sqrt{(s^2_1 +s^2_2 -2s_1 s_2 \rho_{12})(\frac{n+1}{n})}} \tag{2.3} \label{eq:UDT} \end{equation}\] Construction of confidence intervals for abnormality estimates based on this unstandardised difference test (UDT) is done in an analogous way to that of the confidence intervals of the TD (Crawford and Garthwaite 2005). However, the test is somewhat limited in its usefulness, because it is only applicable if the two variates are measured on equivalent scales. Far more common, at least in neuropsychology, is the need to assess discrepancies between differently-scaled variates, which require standardisation to be comparable.
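The UDT of Equation (2.3) can be sketched in Python from hypothetical summary data for two equally-scaled tasks (control means 100, SDs 10, correlation \(r = 0.6\), \(n = 20\)):

```python
# Sketch of the unstandardised difference test (Equation 2.3): the TD logic
# applied to a difference score, with the control-sample correlation between
# the two tasks taken into account.
import numpy as np
from scipy.stats import t

def udt(case1, case2, m1, m2, s1, s2, r, n):
    num = (case1 - m1) - (case2 - m2)
    den = np.sqrt((s1**2 + s2**2 - 2 * s1 * s2 * r) * (n + 1) / n)
    tval = num / den
    return tval, t.cdf(tval, df=n - 1)  # one-tailed p, lower alternative

tval, p = udt(case1=75, case2=95, m1=100, m2=100, s1=10, s2=10, r=0.6, n=20)
print(round(tval, 3), round(p, 4))
```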

By standardising the variates (not taking sample size into account) we get an effect size measure similar to \(Z_{CC}\), Equation (2.2) (Crawford, Garthwaite, and Porter 2010): \[\begin{equation} Z_{DCC} = \frac{z^*_1 - z^*_2}{\sqrt{2-2\rho_{12}}} \tag{2.4} \label{eq:PJ} \end{equation}\] Where \(z_1^*\) and \(z_2^*\) are the standardised case scores on some task A and some task B and the subscript (DCC) indicates “discrepancy-case-controls.” This quantity was proposed by Payne and Jones (1957) as a significance test for discrepancies, but of course it is not appropriate for such a purpose if small samples are used: we cannot use the \(Z\) distribution for estimating the abnormality of the discrepancy for the same reason we cannot use \(Z_{CC}\) to test for a deficit.

When the variates \(Y_1\) and \(Y_2\) must be standardised using means and standard deviations estimated from a small sample, we are in effect no longer estimating the discrepancy between two normally distributed variates but between two \(t\) distributed variates. Because linear combinations of correlated \(t\) distributed random variates are not themselves \(t\) distributed, this problem is non-trivial. The distribution of such difference scores was examined by Garthwaite and Crawford (2004). They used asymptotic expansion to find functions of the sample correlation that, if used as a denominator to the difference between the variates, would yield an asymptotically \(t\) distributed quantity. They found:

\[\begin{equation} \psi=\frac{\frac{(y^*_1-\overline{y}_1)}{s_{1}}-\frac{(y^*_2-\overline{y}_2)}{s_{2}}}{ \sqrt{ (\frac{n+1}{n}) \left( (2-2 \rho)+ \frac{2(1-\rho^{2})}{n-1}+ \frac{(5+c^{2})(1-\rho^{2})}{2(n-1)^{2}}+ \frac{\rho(1+c^{2})(1-\rho^{2})}{2(n-1)^{2}}\right) }} \end{equation}\]

Where \(\rho\) is the sample correlation between the tasks and \(c\) the critical two-tailed \(t\) value on \(n-1\) degrees of freedom. Garthwaite and Crawford (2004) demonstrated that \(\mathbb{P}[ \psi > c] \approx \mathbb{P}[t >c]\). To obtain a precise probability for \(\psi\), one solves the equation \(\psi = c\), which gives a quantity not dependent on a pre-specified critical value. \(\psi = c\) is a quadratic equation in \(c^2\); choosing its positive root yields (writing \(r\) for the sample correlation):

\[\begin{align} \begin{split} c & = \sqrt{\frac{ -b + \sqrt{b^2 - 4ad}}{2a}}, \ \text{where} \\ a & = (1+r)(1-r^2), \\ b & = (1-r)[4(n-1)^2+4(1+r)(n-1)+(1+r)(5+r)], \\ d & = - 2\left[\frac{y^*_{1} - \overline{y}_1}{s_1}-\frac{y^*_2 -\overline{y}_2}{s_2}\right]^2\left(\frac{n(n-1)^2}{n+1}\right) \end{split} \tag{2.5} \label{eq:RSDT} \end{align}\]

Where \(p = \mathbb{P}[t_{n-1}>c]\) is the estimate of abnormality, used for significance testing. It should be noted that because \(c\) is obtained as the positive root of a quadratic, the resulting statistic cannot be negative. If the test statistic is required to express the direction of a negative effect, the correct sign must be imposed.
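Equation (2.5) translates directly into code. The following sketch uses hypothetical summary data (a case 3 sample SDs below the control mean on task A, at the mean on task B, \(r = 0.5\), \(n = 20\)); the sign of the discrepancy is imposed afterwards, as noted above:

```python
# Sketch of the revised standardised difference test (Equation 2.5)
# computed from summary statistics.
import numpy as np
from scipy.stats import t

def rsdt(case1, case2, m1, m2, s1, s2, r, n):
    z1, z2 = (case1 - m1) / s1, (case2 - m2) / s2
    a = (1 + r) * (1 - r**2)
    b = (1 - r) * (4 * (n - 1)**2 + 4 * (1 + r) * (n - 1) + (1 + r) * (5 + r))
    d = -2 * (z1 - z2)**2 * (n * (n - 1)**2 / (n + 1))
    c = np.sqrt((-b + np.sqrt(b**2 - 4 * a * d)) / (2 * a))  # positive root
    p = t.sf(c, df=n - 1)              # one-tailed estimate of abnormality
    return np.copysign(c, z1 - z2), p  # impose the sign of the discrepancy

c, p = rsdt(case1=70, case2=100, m1=100, m2=100, s1=10, s2=10, r=0.5, n=20)
print(round(c, 3), round(p, 4))
```

Note that \(|c|\) comes out smaller than the naive \(Z_{DCC}\) of \(-3\) for the same data, reflecting the correction for small-sample estimation.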

The quantity in Equation (2.5) is referred to as the “revised standardised difference test” (RSDT). The name stems from its being preceded by a test similar to the unstandardised difference test (Equation (2.3)) but with standardised normal variates, developed by Crawford, Howell, and Garthwaite (1998). Crawford and Garthwaite (2005) showed with Monte Carlo simulations that the RSDT is superior in controlling Type I errors compared to their earlier \(t\) test and the \(Z\) score method of Payne and Jones (1957).

Even for very small sample sizes of \(n=5\), the RSDT was shown to barely exceed the specified 5% error rate. However, if a case lacks a true discrepancy between the variates but exhibits extreme scores on both of them (in the same direction), the error rate of the RSDT increases steeply. Crawford and Garthwaite (2007) showed that the RSDT starts to lose control of the error rate when both task scores are more extreme than two standard deviations from the mean, without the case exhibiting a discrepancy. For task scores at 8 standard deviations from the mean, the Type I error rate of the RSDT was inflated to nearly 35%.

Another major drawback of the RSDT is that it has proved difficult to construct confidence limits on the point estimate of abnormality due to \(\psi\) only being approximately \(t\) distributed. To remedy this and to be able to provide an exact rather than just approximate estimate of abnormality Crawford and colleagues started looking into Bayesian methodology.

In the frequentist framework, parameters are treated as fixed attributes of a population, and their estimates are thought to converge to the true values across a series of trials. In the Bayesian framework, by contrast, parameters are treated as random variables with associated probability distributions. To estimate a parameter distribution, we can use any prior knowledge of that parameter to assign probabilities to its possible values, forming what is known as a prior distribution, or simply a prior. If no prior information is available we may want to use a non-informative prior, the simplest of which would assign equal probabilities to all possible parameter values. The prior is updated when new information is obtained, to form what is called a posterior distribution. The posterior probability of a hypothesis (i.e., a specified value of the parameter) is calculated using Bayes' theorem. If we disregard the marginal probability of the data in Bayes' theorem, it can be rewritten as: \[\begin{equation*} posterior \ \propto \ likelihood \times prior \end{equation*}\]

What this is saying is that the posterior density of a hypothesis is proportional (\(\propto\)) to the likelihood of the data under that hypothesis times the prior probability of the hypothesis.

In many cases it is not feasible to calculate the posterior analytically, and Monte Carlo methods are often used instead. These are mathematical algorithms that solve numerical problems by repeated sampling from distributions. The algorithms differ depending on the problem at hand, but in general they all build on drawing random numbers based on the likelihood and the prior, and observing the distribution formed after a large number of iterations. The peak of such a distribution is often taken as an estimate of the parameter of interest, and when non-informative priors are used this often corresponds to the maximum likelihood estimate. Hence, using a non-informative prior often yields estimates with frequentist properties, a feature that is useful if hypothesis testing is desired.

This was a requirement when Crawford and Garthwaite (2007) and
Crawford, Garthwaite, and Ryan (2011) developed Bayesian versions of the tests
described in Section 2.2, now implemented in `singcar`. The
following sections describe the procedural details of these methods, which are
all based on Monte Carlo simulations. The steps presented closely follow those
outlined in the original papers, with just a few slight changes of notation.

The Bayesian test of deficit (BTD) allows us to obtain an estimate of \(p\) (and accompanying credible intervals), which is the proportion of controls that would obtain a value more extreme than the case on the variate of interest.

Assume a sample of \(n\) controls on which we measure some value \(y\) that is normally distributed with unknown mean \(\mu\) and unknown variance \(\sigma^2\). Let \(\overline{y}\) and \(s^2\) denote the sample mean and sample variance respectively and \(y^*\) the case score. Because \(\mu\) and \(\sigma^2\) are unknown, the prior distribution of \(\mu\) is conditioned on the prior distribution of \(\sigma^2\). A non-informative prior \(\mu | \sigma^2 \sim \mathcal{N}(0, \ \infty)\) is assumed. For the posterior we have that the marginal distribution of \(\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}\) and the posterior distribution for \(\mu|\sigma^2 \sim \mathcal{N}(\overline{y}, \sigma^2/n)\), see e.g., Gelman et al. (2013) (p. 45 and 65). To estimate \(p\), the following steps are iterated:

Let \(\psi\) be a random draw from a \(\chi^2\)-distribution on \(n-1 \ df\). Then let \(\hat{\sigma}^2_{(i)} = \frac{(n-1)s^2}{\psi}\) be the estimation of \(\sigma^2\) for this iteration, hence the subscript \((i)\).

Let \(z\) be a random draw from a standard normal distribution. Then let \(\hat{\mu}_{(i)}=\overline{y}+z\sqrt{(\hat{\sigma}_{(i)}^2/n)}\) be the estimate of \(\mu\) for this iteration.

With estimates of \(\mu\) and \(\sigma\), \(p\) is calculated conditional on these estimates being the “correct” \(\mu\) and \(\sigma\). Let \(z^*_{(i)}= \frac{y^* - \hat{\mu}_{(i)}}{\sqrt{\hat{\sigma}_{(i)}^2}}\) be the standardised case score; then \(\hat{p}_{(i)} =\Phi\left(z^*_{(i)}\right)\) or \(\hat{p}_{(i)} = 1-\Phi\left(z^*_{(i)}\right)\), depending on the alternative hypothesis, is the estimate of \(p\) for this iteration. That is, it is the probability of drawing a value more extreme than \(z^*_{(i)}\) from a standard normal distribution.

Repeating these steps a large number of times will yield a distribution of \(\hat{p}\), the mean of which is taken as the point estimate of \(p\) and used for hypothesis testing. Multiplied by 100 it gives a point estimate of the percentage of the control population expected to exhibit a more extreme score than the case. If the steps are repeated e.g., 1000 times, the 25th smallest and 25th largest \(\hat{p}_{(i)}\) give the lower and upper boundaries of the 95% credible interval for \(p\). Similarly, the 25th smallest and 25th largest values of \(z^*_{(i)}\) give the lower and upper boundaries of the 95% credible interval for \(Z_{CC}\), the point estimate of which is that of Equation (2.2). Crawford and Garthwaite (2007) show that this method yields results that converge with those of the frequentist test of deficit, Equation (2.1).
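The BTD iterations can be vectorised in a few lines. The following Python sketch uses hypothetical summary data (case score 70; control mean 100, SD 10, \(n = 20\)) and a lower-tailed alternative; by the convergence result just cited, the mean of \(\hat{p}\) should closely match the frequentist TD \(p\) value from Equation (2.1):

```python
# Monte Carlo sketch of the BTD: draw sigma^2 via a chi-square variate,
# then mu given sigma^2, then convert the case score to a p estimate.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def btd(case, mean, sd, n, n_iter=10_000):
    psi = rng.chisquare(n - 1, size=n_iter)
    sigma2 = (n - 1) * sd**2 / psi                   # draws of sigma^2
    mu = mean + rng.standard_normal(n_iter) * np.sqrt(sigma2 / n)
    zstar = (case - mu) / np.sqrt(sigma2)            # standardised case score
    p_hat = norm.cdf(zstar)                          # lower-tailed alternative
    lo, hi = np.quantile(p_hat, [0.025, 0.975])      # 95% credible interval
    return p_hat.mean(), lo, hi

p, lo, hi = btd(case=70, mean=100, sd=10, n=20)
print(round(p, 4), round(lo, 4), round(hi, 4))
```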

The Bayesian standardised difference test (BSDT) follows a procedure similar to that of the BTD. However, we now assume a sample of \(n\) controls from which we obtain values on the variates \(Y_1\) and \(Y_2\), following a bivariate normal distribution and representing task A and task B. Let \(\overline{y}_1\) and \(\overline{y}_2\) denote the sample means and \[\begin{equation*} \pmb{A}=\begin{bmatrix} s^2_{1} & s_{12} \\ s_{12} & s^2_{2} \end{bmatrix} \end{equation*}\] the sample variance-covariance matrix, where \(s_{12}\) is the sample covariance; then \(\pmb{S} =\pmb{A}(n-1)\) is the sums of squares and cross products (SSCP) matrix. It is assumed that the observations come from a bivariate normal distribution with unknown mean \(\pmb{\mu}\) and unknown covariance matrix \(\Sigma\):

\[\begin{equation*} \pmb{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \ \text{and} \ \Sigma=\begin{bmatrix} \sigma^2_{1} & \sigma_{12} \\ \sigma_{12} & \sigma^2_{2} \end{bmatrix} \end{equation*}\]

Let the case scores be denoted \(y_1^*\) and \(y_2^*\). Just as for the frequentist dissociation tests we want to estimate the proportion \(p\) of the control population that would exhibit a greater difference \(Y_1-Y_2\) than the case’s \(y_1^*-y_2^*\).

The multivariate generalization of the \(\chi^2\)-distribution is the Wishart distribution. That is, a Wishart distribution of 1 dimension is a \(\chi^2\)-distribution on \(n\) degrees of freedom. One can view the Wishart distribution parameterised with \(n\) degrees of freedom and some variance-covariance matrix as being a distribution of SSCP-matrices related to that variance-covariance matrix. Similarly, one can view the inverse Wishart distribution parameterised with the same degrees of freedom and a SSCP-matrix as being a distribution of variance-covariance matrices related to that SSCP-matrix.

The non-informative prior for \(f(\pmb\mu, \Sigma^{-1})\) chosen in Crawford and Garthwaite (2007) was \(f(\pmb\mu, \Sigma^{-1}) \propto |\Sigma|\) in favour of the perhaps more commonly used \(f(\pmb\mu, \Sigma^{-1}) \propto |\Sigma|^{(k+1)/2}\) (\(k =\) number of variates) because, for unstandardised differences, it was shown to yield identical interval estimates to the frequentist UDT. The posterior marginal distribution of \(\Sigma^{-1}\) for this choice of prior is a Wishart distribution with \(n\) degrees of freedom and scale matrix \(\pmb{S}^{-1}\). The conditional distribution of \(\pmb\mu|\Sigma\) is then a multivariate normal distribution with mean \([\overline{y}_1 \ \overline{y}_2]\) and variance \(\Sigma/n\), see e.g., Gelman et al. (2013) (p. 72-73). Crawford and Garthwaite (2007) refer to this as the “standard theory” prior.

Using the standard theory prior produces good frequentist properties for \(\sigma_1\) and \(\sigma_2\), but Berger and Sun (2008) have shown that the convergence to frequentist estimates is less good for \(\rho\) and for differences in means (\(\mu_1 - \mu_2\)). Instead, they recommend \(f(\pmb\mu, \Sigma^{-1}) \propto \frac{1}{\sigma_1\sigma_2(1-\rho^2)}\) as a general purpose prior for bivariate normal data. To construct posteriors with this prior, rejection sampling is used: random observations are drawn from an inverse Wishart distribution on \(n-1\) degrees of freedom and only a subset of the draws are accepted. Crawford, Garthwaite, and Ryan (2011) noticed, however, that this prior gave rise to credible intervals that were too narrow and showed with simulations that if the sample size was treated as \(n-1\) for estimation of \(\Sigma\), so that the intervals were somewhat conservative, the frequentist properties of the estimates improved. Crawford, Garthwaite, and Ryan (2011) recommend this and refer to it as the “calibrated” prior. The procedure for sampling from the posterior using this prior is given below:

Since the unbiased estimate of \(\Sigma\) is \(\pmb{S}/(n-1)\), we put \(\pmb{S}^*= (n-2)\pmb{S}/(n-1)\) to retain this property when reducing the sample size.

Generate an observation from an inverse Wishart distribution on \(n-2\) degrees of freedom and with scale matrix \(\pmb{S}^*\). Denote this: \[ \hat{\Sigma} = \begin{bmatrix} \hat{\sigma}^2_{1} & \hat{\sigma}_{12} \\ \hat{\sigma}_{12} & \hat{\sigma}^2_{2} \end{bmatrix} \] and put \[ \hat{\rho}= \frac{\hat{\sigma}_{12}}{\sqrt{\hat{\sigma}^2_{1}\hat{\sigma}^2_{2}}} \]

Generate a value \(u\) from a uniform distribution such that \(u \sim U(0, 1)\). If \(u^2 \leq 1-\hat{\rho}^2\) we accept \(\hat\Sigma\) as the estimation of \(\Sigma\) for this iteration and denote it \[\begin{equation*} \hat{\Sigma}_{(i)}=\begin{bmatrix} \hat{\sigma}^2_{1(i)} & \hat{\sigma}_{12(i)} \\ \hat{\sigma}_{12(i)} & \hat{\sigma}^2_{2(i)} \end{bmatrix} \end{equation*}\] otherwise we iterate the procedure until we have an accepted \(\hat{\Sigma}\).

Bayesians might argue that good frequentist properties are not necessary when conducting Bayesian analyses. If so, one can use the “standard theory” prior by generating a random observation from an inverse Wishart distribution on \(n\) degrees of freedom and scale matrix \(\pmb{S}\) and denote it \(\hat{\Sigma}_{(i)}\). With an estimate of \(\Sigma\) (regardless of the prior chosen), follow the steps below to obtain an estimate of \(p\), that is the percentage of the control population expected to show a more extreme score than the case.

Let \(z_{r1}\) and \(z_{r2}\) be two random draws from a standard normal distribution. Perform Cholesky decomposition on \(\hat{\Sigma}_{(i)}\), that is finding the lower triangular matrix \(\pmb{T}\) such that \(\pmb{T}\pmb{T'}=\hat{\Sigma}_{(i)}\). Then \[\begin{equation*} \pmb{\hat{\mu}}_{(i)} = \begin{pmatrix} \hat{\mu}_{1(i)} \\ \hat{\mu}_{2(i)} \end{pmatrix} = \begin{pmatrix} \overline{y}_1 \\ \overline{y}_2 \end{pmatrix}+ \pmb{T} \begin{pmatrix} z_{r1} \\ z_{r2} \end{pmatrix} / \sqrt{n} \end{equation*}\] is the estimation of \(\pmb{\mu}\) for this iteration.

With estimates of \(\pmb{\mu}\) and \(\Sigma\) we can calculate \(p\), conditional on these being the correct values of \(\pmb{\mu}\) and \(\Sigma\). If an unstandardised test is required, put: \[\begin{equation*} z_{(i)}^* = \frac{(y_1^* - \hat{\mu}_{1(i)}) - (y^*_2 - \hat{\mu}_{2(i)})} {\sqrt{\hat{\sigma}^2_{1(i)}+\hat{\sigma}^2_{2(i)}-2\hat{\sigma}_{12(i)}}} \end{equation*}\] If a standardised test is required, put: \[\begin{equation*} z_{1(i)} = \frac{y_1^* - \hat{\mu}_{1(i)}}{\sqrt{\hat{\sigma}^2_{1(i)}}}, \ z_{2(i)} = \frac{y_2^* - \hat{\mu}_{2(i)}}{\sqrt{\hat{\sigma}^2_{2(i)}}}, \ \hat{\rho}_{(i)} = \frac{\hat{\sigma}_{12(i)}}{\sqrt{\hat{\sigma}^2_{1(i)}\hat{\sigma}^2_{2(i)}}} \end{equation*}\] and \[\begin{equation*} z^*_{(i)} = \frac{z_{1(i)} - z_{2(i)}}{\sqrt{2-2\hat{\rho}_{(i)}}} \end{equation*}\]

Let \(\hat{p}_{(i)}\) be the tail area of a standard normal distribution less or greater than \(z^*_{(i)}\) (depending on alternative hypothesis). \(\hat{p}_{(i)}\) is then the estimate of \(p\) for this iteration.

Repeating these steps a large number of times will yield a distribution of \(\hat{p}\), the mean of which is taken as the point estimate of \(p\) and used for hypothesis testing. Multiplied by 100 it gives the point estimate of the percentage of controls expected to exhibit a more extreme task difference than the case. If the steps are repeated e.g., 1000 times, the 25th smallest and 25th largest \(\hat{p}_{(i)}\) give the lower and upper boundaries of the 95% credible interval for \(p\). Similarly, the 25th smallest and 25th largest values of \(z^*_{(i)}\) give the lower and upper boundaries of the 95% credible interval for \(Z_{DCC}\), the point estimate of which is that of Equation (2.4).
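The BSDT steps, including rejection sampling of \(\Sigma\) under the calibrated prior, can be sketched as follows. The summary statistics are hypothetical (two tasks with control means 100, variances 100, covariance 50, \(n = 20\)); `singcar` performs the equivalent computation in `R`:

```python
# Monte Carlo sketch of the BSDT with the "calibrated" prior.
import numpy as np
from scipy.stats import invwishart, norm

rng = np.random.default_rng(3)

def bsdt(case, means, cov, n, n_iter=4000):
    S = cov * (n - 1)                      # SSCP matrix
    S_star = (n - 2) * S / (n - 1)         # rescaled for the calibrated prior
    p_hat = np.empty(n_iter)
    for i in range(n_iter):
        while True:                        # rejection sampling of Sigma
            Sig = invwishart.rvs(df=n - 2, scale=S_star, random_state=rng)
            rho = Sig[0, 1] / np.sqrt(Sig[0, 0] * Sig[1, 1])
            if rng.uniform() ** 2 <= 1 - rho ** 2:
                break
        T = np.linalg.cholesky(Sig)        # T T' = Sigma-hat
        mu = means + T @ rng.standard_normal(2) / np.sqrt(n)
        z1 = (case[0] - mu[0]) / np.sqrt(Sig[0, 0])
        z2 = (case[1] - mu[1]) / np.sqrt(Sig[1, 1])
        p_hat[i] = norm.cdf((z1 - z2) / np.sqrt(2 - 2 * rho))
    return p_hat.mean(), np.quantile(p_hat, [0.025, 0.975])

p, ci = bsdt(case=np.array([70.0, 100.0]),
             means=np.array([100.0, 100.0]),
             cov=np.array([[100.0, 50.0], [50.0, 100.0]]), n=20)
print(round(p, 4))
```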

Crawford, Garthwaite, and Ryan (2011) extended the previously described Bayesian tests by using Bayesian regression techniques that allow the case's abnormality to be assessed in the presence of covariates. These tests thus compare the case's score on the task of interest with the scores of the controls conditioned on the case's value(s) on the covariate(s). If a case has 15 years of education, for example, their task score is compared with that of controls with the same length of education.

In addition to improving the precision of these tests by accounting for spurious noise, this also facilitates the collection of larger control samples, because it reduces the need to match control samples very closely to a single case. Moreover, it means that the same control sample may be used for the evaluation of multiple single cases, since matching on the relevant covariates can be achieved statistically. One degree of freedom is lost for each covariate included, so as a rule of thumb covariates should be included only if their sample correlation with the variate(s) of interest is stronger than \(0.3\) (Crawford, Garthwaite, and Ryan 2011). Of course, the control sample should also ideally bracket the case on the covariates, in order to avoid extrapolating “out of sample.” As for the earlier tests, the variates of interest are still assumed to follow a normal or bivariate normal distribution; however, no assumptions are made about the distribution of the covariates.

Suppose we have data from a sample of size \(n\), with scores on \(m\) covariates and \(k = 1, 2\) variates of interest, depending on whether we are testing for a deficit or a discrepancy. We denote the covariates \(\boldsymbol{X} = (X_1, \dots, X_m)\).

From these values we wish to estimate \[ \boldsymbol{B} = \begin{bmatrix} \boldsymbol{\beta}_1 & \cdots & \boldsymbol{\beta}_k \end{bmatrix} \] Where \(\boldsymbol{\beta}_i\) is a vector of length \(m+1\) containing regression coefficients for each covariate on the \(i\)th variate of interest, the first element in \(\boldsymbol{\beta}_i\) being the intercept. We also wish to estimate \[ \Sigma \ | \ \boldsymbol{X} = \begin{bmatrix} \sigma^2_1 \ | \ \boldsymbol{X} & \rho\sigma_1\sigma_2 \ | \ \boldsymbol{X} \\ \rho\sigma_1\sigma_2 \ | \ \boldsymbol{X} & \sigma^2_2 \ | \ \boldsymbol{X} \end{bmatrix} \] That is, the covariance matrix of the variates of interest conditioned upon the covariates, for task \(Y_1\) and task \(Y_2\) i.e., when \(k=2\). If we wish to test for a deficit, that is \(k = 1\) then \(\Sigma \ | \ \boldsymbol{X}\) is a \(1\times1\) matrix containing the conditional variance of the variate of interest. We assume \(\Sigma\) not to vary with the covariates.

The procedure will be outlined in the general case, that is for \(k = 1\) or \(k = 2\). The main difference between testing for abnormality on a single variate and testing for abnormal discrepancy between two variates is the recommended specification of the prior.

Let \(\boldsymbol{X}\) be the \(n \times (m+1)\) design matrix on which we regress \(\boldsymbol{Y}\), the \(n \times k\) response matrix. \[ \boldsymbol{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nm} \end{bmatrix},\ \ \boldsymbol{Y} = \begin{bmatrix} y_{11} & \cdots & y_{1k} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nk} \end{bmatrix} \] The data estimates of \(\boldsymbol{B}\) and \(\Sigma\) are then \[ \boldsymbol{B}^* = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{Y} \ \text{and} \ \Sigma^* = \frac{1}{n-m-1}(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{B}^*)'(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{B}^*) \] For the “standard theory” prior, the posterior of \(\Sigma\) is an inverse Wishart distribution with \(df = n - m\) when \(k=2\) and \(df=n-m-1\) when \(k = 1\) (Tiao and Zellner 1964), with scale matrix \((n-m-1)\Sigma^*\). For each iteration we generate an estimate of \(\Sigma\) from this distribution. For the “calibrated” prior (used only when \(k=2\)), the steps described in Section 2.3.2 are followed and we use \(\pmb{S}^* = (n-m-2)\pmb{S}/(n-m-1)\) as the scale matrix when sampling from an inverse Wishart distribution on \(n-m-2\) degrees of freedom, where \(\pmb{S} = (n-m-1)\Sigma^*\). The observation generated by either method is denoted \(\hat{\Sigma}_{(i)}\). We have

\[\begin{equation} \hat{\Sigma}_{(i)}=[\hat{s}^2_{(i)}] \ \text{when} \ k=1 \ \text{and} \ \hat{\Sigma}_{(i)} = \begin{bmatrix} \hat{s}^2_{1(i)} & \hat{s}_{12(i)} \\ \hat{s}_{12(i)} & \hat{s}^2_{2(i)} \end{bmatrix} \ \text{when} \ k=2 \ \tag{2.6} \label{eq:sigma} \end{equation}\]

\(\boldsymbol{B}^*\) will be an \((m+1) \times k\) matrix. We turn this into a \(k(m + 1) \times 1\) vector by concatenating the columns of regression coefficients in \(\boldsymbol{B}^*\), such that \(\boldsymbol{B}^*_{\text{vec}}=(\boldsymbol{\beta}^{*'}_1, ...,\boldsymbol{\beta}^{*'}_k)'\). This vector is the data estimate of \(\boldsymbol{B}_{\text{vec}}=(\boldsymbol{\beta}'_1, ...,\boldsymbol{\beta}'_k)'\). Take the Kronecker product \(\boldsymbol{\hat\Sigma}_{(i)} \otimes (\boldsymbol{X'X})^{-1}\) and denote it \(\boldsymbol{\Lambda}_{(i)}\). The posterior of \(\boldsymbol{B}_{\text{vec}}|\boldsymbol{\hat\Sigma}_{(i)}\) is then a multivariate normal distribution with mean vector \(\boldsymbol{B}^*_{\text{vec}}\) and variance-covariance matrix \(\boldsymbol{\Lambda}_{(i)}\). Draw a random value from this distribution such that \(\hat{\boldsymbol{B}}^*_{\text{vec}(i)} =(\hat{\boldsymbol{\beta}}'_{1(i)}, ...,\hat{\boldsymbol{\beta}}'_{k(i)})'\) and \(\hat{\boldsymbol{B}}_{(i)} = (\hat{\boldsymbol{\beta}}_{1(i)}, ...,\hat{\boldsymbol{\beta}}_{k(i)})\), where each \(\hat{\boldsymbol{\beta}}_{j(i)}\) is a vector of length \(m+1\) and \(\hat{\boldsymbol{B}}_{(i)}\) an \((m+1) \times k\) matrix.

Let \(\boldsymbol{x}^*\) be a vector containing the case’s values on the covariates, with a leading \(1\) for the intercept. The conditional means for the case on the tasks of interest are then \[ \hat{\boldsymbol{\mu}}_{(i)} = \hat{\boldsymbol{B}}'_{(i)}\boldsymbol{x}^* \]

We now estimate the effect size for the case using the conditional estimates derived above. The calculation differs depending on whether we are testing for a deficit or a discrepancy (i.e., whether \(k=1\) or \(k=2\)). Crawford, Garthwaite, and Ryan (2011) call these effect sizes \(Z_{CCC}\) and \(Z_{DCCC}\) for a deficit and a discrepancy, respectively. They are similar to \(Z_{CC}\) (Equation (2.2)) and \(Z_{DCC}\) (Equation (2.4)); the extra \(C\) in the subscript indicates that they are conditional on covariates. Let \(y^*_1\) and \(y^*_2\) denote the case’s scores on the two variates of interest. If we want to estimate the abnormality of the case’s score on \(Y_1\), we calculate \[\begin{equation} \hat{Z}_{CCC(i)} = \frac{y^*_1-\hat{\mu}_{(i)}}{\hat{s}_{(i)}} \tag{2.7} \label{eq:zccc} \end{equation}\] For \(Z_{DCCC}\) we have two conditional means, denoted \(\hat{\mu}_{1(i)}\) and \(\hat{\mu}_{2(i)}\) for \(Y_1\) and \(Y_2\) respectively. We then calculate \[\begin{equation} \hat{Z}_{DCCC(i)} = \frac{\frac{y^*_1-\hat{\mu}_{1(i)}}{\hat{s}_{1(i)}}-\frac{y^*_2-\hat{\mu}_{2(i)}} {\hat{s}_{2(i)}}}{\sqrt{2-2\hat{\rho}_{12(i)}}} \tag{2.8} \label{eq:zdccc} \end{equation}\] where \(\hat{s}_{1(i)}\) and \(\hat{s}_{2(i)}\) are the conditional standard deviations obtained from \(\hat{\Sigma}_{(i)}\) in Equation (2.6). The conditional correlation between \(Y_1\) and \(Y_2\) in the denominator is given by \[\hat{\rho}_{12(i)} = \frac{\hat{s}_{12(i)}}{\sqrt{\hat{s}^2_{1(i)}\hat{s}^2_{2(i)}}}\]
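The effect sizes themselves are simple arithmetic once the draws are available. A hedged sketch (Python, with made-up single-iteration values; the function names are ours, not `singcar`'s):

```python
# Illustration of the conditional effect sizes for one iteration,
# given hypothetical draws of mu_hat, s_hat and rho_hat.
import math

def z_ccc(y_star, mu_hat, s_hat):
    # case score minus conditional mean, over the conditional SD
    return (y_star - mu_hat) / s_hat

def z_dccc(y1, mu1, s1, y2, mu2, s2, rho12):
    # difference of the two conditional standardised scores,
    # scaled by the conditional correlation between the tasks
    return (z_ccc(y1, mu1, s1) - z_ccc(y2, mu2, s2)) / math.sqrt(2 - 2 * rho12)

# A case two conditional SDs below the mean on task 1, at the mean
# on task 2, with conditional correlation 0.5:
print(z_ccc(-2.0, 0.0, 1.0))                        # -2.0
print(z_dccc(-2.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.5))   # (-2 - 0)/sqrt(1) = -2.0
```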

We then find the tail area under the standard normal distribution below or above \(\hat{Z}_{CCC(i)}\) or \(\hat{Z}_{DCCC(i)}\), depending on the problem at hand and the alternative hypothesis specified. Denote the value obtained \(\hat{p}_{(i)}\); it is an estimate of \(p\). When testing for a deficit we would have: \[ \hat{p}_{(i)}= \Phi(\hat{Z}_{CCC(i)}) \]
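The standard normal tail area is readily available in most environments; for instance, in Python's standard library (with \(-1.75\) as a hypothetical draw):

```python
# Phi(-1.75): the lower tail area for a one-sided deficit test
from statistics import NormalDist

p_hat = NormalDist().cdf(-1.75)
print(round(p_hat, 4))  # about 0.04, i.e. ~4% of controls expected below the case
```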

Iterating these steps a large number of times yields a distribution of \(\hat{p}\), the mean of which is taken as the point estimate of \(p\). This estimate is used for significance testing and, multiplied by 100, gives an estimate of the percentage of controls expected to exhibit a more extreme score than the case. If the procedure is repeated, e.g., 1000 times, the 25th smallest and 25th largest \(\hat{p}_{(i)}\) are the lower and upper boundaries of the 95% credible interval for \(p\).
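The whole iterative procedure can be sketched for the simplest case (\(k=1\), one covariate) in standard-library Python with simulated data. For \(k=1\) the inverse Wishart posterior reduces to a scaled inverse \(\chi^2\), and the coefficient posterior is bivariate normal, so both draws can be done by hand. This is a hedged illustration of the logic, not the `BTD_cov()` implementation.

```python
# Monte Carlo sketch of the Bayesian test of deficit with one covariate
# (k = 1, m = 1), using simulated data.  All names are hypothetical.
import math
import random
from statistics import NormalDist, fmean

random.seed(7)
n = 28
x = [random.uniform(60, 75) for _ in range(n)]               # covariate
y = [0.05 + 0.002 * a + random.gauss(0, 0.04) for a in x]    # variate

# Data estimates B* and Sigma* (simple regression closed form)
x_bar, y_bar = fmean(x), fmean(y)
s_xx = sum((a - x_bar) ** 2 for a in x)
slope = sum((a - x_bar) * (v - y_bar) for a, v in zip(x, y)) / s_xx
intercept = y_bar - slope * x_bar
df = n - 1 - 1                                               # n - m - 1
sigma_star = sum((v - intercept - slope * a) ** 2 for a, v in zip(x, y)) / df

# (X'X)^{-1} for the design matrix [1, x]
sum_x, sum_x2 = sum(x), sum(a * a for a in x)
det = n * sum_x2 - sum_x ** 2
inv = [[sum_x2 / det, -sum_x / det], [-sum_x / det, n / det]]

x_case = 67.0
y_case = intercept + slope * x_case   # a case right at the conditional mean

p_hats = []
for _ in range(1000):
    # sigma^2 posterior: scale (n-m-1)*Sigma* over a chi^2 on df dof
    chi2 = random.gammavariate(df / 2.0, 2.0)
    sigma2 = df * sigma_star / chi2
    # (b0, b1) ~ N(B*, sigma2 * (X'X)^{-1}) via a 2x2 Cholesky factor
    l11 = math.sqrt(sigma2 * inv[0][0])
    l21 = sigma2 * inv[1][0] / l11
    l22 = math.sqrt(sigma2 * inv[1][1] - l21 ** 2)
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    b0 = intercept + l11 * z1
    b1 = slope + l21 * z1 + l22 * z2
    mu_hat = b0 + b1 * x_case                       # conditional mean
    z_ccc = (y_case - mu_hat) / math.sqrt(sigma2)   # conditional effect size
    p_hats.append(NormalDist().cdf(z_ccc))          # lower tail area

p_hats.sort()
p_point = fmean(p_hats)                    # point estimate of p
ci_low, ci_high = p_hats[24], p_hats[-25]  # 95% credible interval
print(round(p_point, 3), round(ci_low, 3), round(ci_high, 3))
```

Because this hypothetical case sits exactly at its conditional mean, the point estimate of \(p\) lands near 0.5, as expected.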

To obtain point estimates of \(Z_{CCC}\) and \(Z_{DCCC}\) we use Equations (2.7) and (2.8), but with the conditional means, standard deviations and correlation calculated directly from the control sample. The \(1-\alpha\) credible intervals for these effect sizes are given by the \(\alpha/2\) and \(1-\alpha/2\) quantiles of the sampled distribution of \(Z_{CCC}\) or \(Z_{DCCC}\). That is, just as for \(p\), if the procedure is repeated, e.g., 1000 times, the 25th smallest and 25th largest values of \(\hat{Z}_{CCC(i)}\) or \(\hat{Z}_{DCCC(i)}\) are the lower and upper boundaries of the 95% credible interval for \(Z_{CCC}\) or \(Z_{DCCC}\).

The `singcar` package

Test | Function | Source | Reference
---|---|---|---
Test of deficit | `TD()` | Crawford and Howell (1998) | Equation (2.1)
Bayesian test of deficit | `BTD()` | Crawford and Garthwaite (2007) | Section 2.3.1
Bayesian test of deficit with covariates | `BTD_cov()` | Crawford, Garthwaite, and Ryan (2011) | Section 2.3.3
Unstandardised difference test | `UDT()` | Crawford and Garthwaite (2005) | Equation (2.3)
Revised standardised difference test | `RSDT()` | Crawford and Garthwaite (2005) | Equation (2.5)
Bayesian standardised difference test | `BSDT()` | Crawford and Garthwaite (2007) | Section 2.3.2
Bayesian standardised difference test with covariates | `BSDT_cov()` | Crawford, Garthwaite, and Ryan (2011) | Section 2.3.3

To facilitate meta-analyses of studies using case-control comparisons, all functions in the table above accept summary statistics as well as raw data as input. The output is a list of class `"htest"`, for which the generic function `print()` has a method. To use the package, begin by installing and loading it:

```
install.packages("singcar")
library("singcar")
```

The package comes with the dataset `size_weight_illusion`, a neuropsychological dataset from an investigation of the size-weight illusion in DF, a patient with visual form agnosia following bilateral lesions to the lateral occipital complex (Hassan et al. 2020). The size-weight illusion is a perceptual phenomenon in which smaller objects are perceived as heavier during lifting than larger objects of equal weight (Buckingham 2014). The illusion implies that sensory cues about object size affect the perception of weight. It has been suggested that patient DF does not experience this illusion in the same way as the healthy population when only visual cues about object size are available (Dijkerman et al. 2004). In contrast, when kinaesthetic (tactile) cues are provided, it is suggested that DF’s experience of the illusion is unaffected by her brain damage. In other words, patient DF was expected to have a deficit in the visual size-weight illusion and to exhibit an abnormally large discrepancy (dissociation) between the visual and kinaesthetic size-weight illusions. The dataset consists of data from patient DF and 28 control participants, with variables for sex, age, and the visual and kinaesthetic size-weight illusions. The measure of the size-weight illusion is a scaled measure expressing the number of grams of weight difference perceived per cubic cm of volume change. Examples of how to analyse this dataset using the tests provided in `singcar` follow below.

```
head(size_weight_illusion)
```

```
## GROUP PPT SEX YRS V_SWI K_SWI
## 1 SC DF Female 65 0.02814921 0.10012712
## 2 HC E01 Female 70 0.21692634 0.21792930
## 3 HC E02 Male 64 0.17223226 0.26338899
## 4 HC E03 Female 63 0.07138049 0.09331700
## 5 HC E04 Female 65 0.10186453 0.25938045
## 6 HC E05 Female 66 0.15911439 0.08922615
```

The simplest way to test for an abnormality on a single variate is to use the frequentist test of deficit. Start by extracting patient (patient DF is the first observation) and control data from the relevant variate, in this case the visual size-weight illusion:

```
PAT_VSWI <- size_weight_illusion[1, "V_SWI"]
CON_VSWI <- size_weight_illusion[-1, "V_SWI"]
```

Using the function `TD()`, we then apply the formula in Equation (2.1). The argument `conf_int_spec` specifies how fine-grained the search algorithm for the confidence interval should be. The arguments `sd` and `sample_size` can be given if the test should be based on summary statistics rather than raw data; the `controls` argument should then be the mean of the control sample.

```
TD(case = PAT_VSWI,
controls = CON_VSWI,
sd = NULL,
sample_size = NULL,
alternative = "less",
conf_int = TRUE,
conf_level = 0.95,
conf_int_spec = 0.01,
na.rm = FALSE)
```

```
##
## Test of Deficit
##
## data: Case = 0.03, Controls (m = 0.16, sd = 0.08, n = 28)
## t = -1.7243, df = 27, p-value = 0.04804
## alternative hypothesis: true diff. between case and controls is less than 0
## sample estimates:
## Std. case score (Z-CC), 95% CI [-2.34, -1.15]
## -1.754857
## Proportion below case (%), 95% CI [0.95, 12.47]
## 4.804003
```

The same hypothesis can be tested with the Bayesian analogue, which has a very similar syntax. Its output converges to that of `TD()` as the number of iterations (the argument `iter`) increases. The degrees of freedom shown in the output below are those of the \(\chi^2\) distribution from which we sample \(\psi\), as described in Section 2.3.1.

```
set.seed(42)
BTD(case = PAT_VSWI,
controls = CON_VSWI,
sd = NULL,
sample_size = NULL,
alternative = "less",
int_level = 0.95,
iter = 10000,
na.rm = FALSE)
```

```
##
## Bayesian Test of Deficit
##
## data: Case = 0.03, Controls (m = 0.16, sd = 0.08, n = 28)
## df = 27, p-value = 0.04821
## alternative hypothesis: true diff. between case and controls is less than 0
## sample estimates:
## Std. case score (Z-CC), 95% CI [-2.35, -1.15]
## -1.754857
## Proportion below case (%), 95% CI [0.94, 12.60]
## 4.821283
```

If the control sample for a study is not appropriately matched to the case on variables such as age or education level, it is advisable to use tests that account for this by allowing covariates to be included. Including theoretically sound covariates is often a good idea. Crawford, Garthwaite, and Ryan (2011) recommend, however, only including a covariate if it correlates \(\geq 0.3\) with the variate of interest, because of the loss of degrees of freedom.

The function `BTD_cov()` allows covariates to be included, and thus assesses the patient on the task of interest by, in essence, comparing him/her to the controls with the same scores on the covariates. Even though the correlation between age and the visual size-weight illusion is \(< 0.3\), it is included here as a covariate for demonstrative purposes. Start again by extracting the scores on the covariate for the patient and for the control participants.

```
PAT_age <- size_weight_illusion[1, "YRS"]
CON_age <- size_weight_illusion[-1, "YRS"]
```

Since `BTD_cov()` is somewhat computationally intense, the number of iterations has been reduced in this example compared to `BTD()`. For an actual analysis the number of iterations should be based on the required precision. Note that there is no restriction on the number of covariates, as long as \(n > m+1\). If more than one covariate is used, the case scores should be given as a vector of values, and the control scores as a data frame or matrix with the same number of columns as there are values in the case's covariate vector.

If summary statistics are used instead of raw data, the argument `use_sumstats` must be set to `TRUE`, and the correlation matrix for the covariates and the variate of interest must be given, as well as the sample size. In addition, the `control_covar` argument must be supplied as an \(m \times 2\) matrix or data frame giving the mean of each covariate in the first column and the standard deviation in the second. The degrees of freedom shown in the output below are those of the inverse Wishart distribution from which we sample \(\hat{\Sigma}_{(i)}\), as described in Section 2.3.3.

```
BTD_cov(case_task = PAT_VSWI,
case_covar = PAT_age,
control_task = CON_VSWI,
control_covar = CON_age,
alternative = "less",
int_level = 0.95,
iter = 1000,
use_sumstats = FALSE,
cor_mat = NULL,
sample_size = NULL)
```

```
##
## Bayesian Test of Deficit with Covariates
##
## data: Case = 0.03, Controls (m = 0.16, sd = 0.08, n = 28)
## df = 26, p-value = 0.05173
## alternative hypothesis: true diff. between case and controls is less than 0
## sample estimates:
## Std. case score (Z-CCC), 95% CI [-2.30, -1.10]
## -1.749556
## Proportion below case (%), 95% CI [1.08, 13.47]
## 5.172973
```

For assessing an abnormal discrepancy between two variates, the simplest function to use is the unstandardised difference test, Equation (2.3), called in `singcar` with the function `UDT()`. However, this should be used only if the variates are known to come from equivalent distributions. Otherwise, tests that can evaluate standardised scores without inflating the Type I error rate should be used. In the frequentist framework the appropriate test is the RSDT, Equation (2.5). In this example we wish to estimate and test the abnormality of the discrepancy between the visual and kinaesthetic size-weight illusions in patient DF. That is, we want to compare the difference between the variates exhibited by the patient to the distribution of differences in the healthy control sample. Again, start by extracting the patient and control scores for the second variate of interest.

```
PAT_KSWI <- size_weight_illusion[1, "K_SWI"]
CON_KSWI <- size_weight_illusion[-1, "K_SWI"]
```

Using the function `RSDT()`, we then apply the formula in Equation (2.5). This test is most often used two-sided, because the sign of the discrepancy depends solely on the order of the input. This function does not, however, provide confidence intervals. The syntax of `UDT()` is very similar to that of `RSDT()`, the main difference being the options for confidence intervals, as shown for `TD()`. If summary statistics are used, then the additional argument `r_ab`, the sample correlation between the variates, must be set, as well as the sample size, standard deviation and mean for both variates.

```
RSDT(case_a = PAT_VSWI,
case_b = PAT_KSWI,
controls_a = CON_VSWI,
controls_b = CON_KSWI,
sd_a = NULL,
sd_b = NULL,
sample_size = NULL,
r_ab = NULL,
alternative = "two.sided",
na.rm = FALSE)
```

```
##
## Revised Standardised Difference Test
##
## data: Case A: 0.03, B: 0.10, Ctrl. A (m, sd): (0.16, 0.08), B: (0.18, 0.10)
## approx. abs. t = 1.015, df = 27, p-value = 0.3191
## alternative hypothesis: true difference between tasks is not equal to 0
## sample estimates:
## Std. case score, task A (Z-CC) Std. case score, task B (Z-CC)
## -1.7548574 -0.7836956
## Std. task discrepancy (Z-DCC) Proportion below case (%)
## -1.0647889 15.9560625
```

The Bayesian analogue of this test is recommended over the RSDT because it maintains better control of Type I errors when the case exhibits extreme deficits on both variates but no discrepancy between them (Crawford and Garthwaite 2007). The syntax of the two functions is similar, but `BSDT()` comes with more optional arguments. For example, one has the option of applying this test without standardising the variates by setting the argument `unstandardised` to `TRUE`; the output then converges to that of the frequentist UDT. Furthermore, one can choose between priors. Setting the argument `calibrated` to `FALSE` specifies the use of the “standard theory” prior. If left at the default (`TRUE`), an accept-reject algorithm is deployed for each simulation of \(\Sigma\), as described in Section 2.3.2. This default prior has been shown to have better frequentist properties for estimating \(\rho\) and differences between means in a bivariate normal distribution (Berger and Sun 2008), and is therefore the recommended choice.

```
BSDT(case_a = PAT_VSWI,
case_b = PAT_KSWI,
controls_a = CON_VSWI,
controls_b = CON_KSWI,
sd_a = NULL,
sd_b = NULL,
sample_size = NULL,
r_ab = NULL,
alternative = "two.sided",
int_level = 0.95,
iter = 10000,
unstandardised = FALSE,
calibrated = TRUE,
na.rm = FALSE)
```

```
##
## Bayesian Standardised Difference Test
##
## data: Case A: 0.03, B: 0.10, Ctrl. A (m, sd): (0.16,0.08), B: (0.18,0.10)
## df = 26, p-value = 0.3245
## alternative hypothesis: true difference between tasks is not equal to 0
## sample estimates:
## Std. case score, task A (Z-CC)
## -1.7548574
## Std. case score, task B (Z-CC)
## -0.7836956
## Std. discrepancy (Z-DCC), 95% CI [-1.69, -0.42]
## -1.0647889
## Proportion below case (%), 95% CI [4.51, 33.81]
## 16.2259126
```

If analysing discrepancies between variates in the presence of covariates, the syntax differs slightly: the case’s scores on the variates of interest must be specified as a vector, and the control scores as a data frame or matrix. One can choose between the “calibrated” and “standard theory” priors, just as for `BSDT()`, where `calibrated = TRUE` is the recommended default behaviour. If summary statistics are used, `use_sumstats` must be set to `TRUE` and the summary input supplied to the arguments `control_tasks` and `control_covar` as data frames or matrices, with the means of each variable in the first column and the standard deviations in the second. The case’s scores should be supplied as vectors.

```
BSDT_cov(case_tasks = c(PAT_VSWI, PAT_KSWI),
case_covar = PAT_age,
control_tasks = cbind(CON_VSWI, CON_KSWI),
control_covar = CON_age,
alternative = "two.sided",
int_level = 0.95,
calibrated = TRUE,
iter = 1000,
use_sumstats = FALSE,
cor_mat = NULL,
sample_size = NULL)
```

```
##
## Bayesian Standardised Difference Test with Covariates
##
## data: Case A = 0.03, B = 0.10, Ctrl. (m, sd) A: (0.16,0.08), B: (0.18,0.10)
## df = 25, p-value = 0.3383
## alternative hypothesis: true difference between tasks is not equal to 0
## sample estimates:
## Std. case score, task A (Z-CC)
## -1.754857
## Std. case score, task B (Z-CC)
## -0.783696
## Std. discrepancy (Z-DCCC), 95% CI [-1.60, -0.38]
## -1.064152
## Proportion below case (%), 95% CI [5.53, 35.32]
## 16.920000
```

A further capacity of `singcar` is that it can calculate the statistical power of the tests. The notion of power when comparing cases to control samples has been somewhat overlooked for this class of statistical tests. In recent work (McIntosh and Rittmo 2020), we argued that, even though power is inherently limited in this paradigm, a priori calculations are still useful for study design and interpretation in neuropsychological and other applications. Calculating power for the test of deficit is similar to calculating power for any \(t\) test and can be done analytically.
\[\begin{equation} power = 1 - \beta =
T_{n-1}\left(t_{\alpha, \ n-1} \Bigg\rvert \frac{y^* - \overline{y}}{\sigma
\sqrt{\frac{n+1}{n}}}\right)
\tag{3.1}
\label{eq:TDpower}
\end{equation}\]
where \(T_{n-1}(.\rvert \theta)\) is the cumulative distribution function of the non-central \(t\) distribution with \(n-1\) degrees of freedom and non-centrality parameter \(\frac{y^* - \overline{y}}{\sigma \sqrt{\frac{n+1}{n}}}\) (i.e., the TD, Equation (2.1)), and \(t_{\alpha, \ n-1}\) is the \(\alpha\) quantile of the \(t\) distribution on \(n-1\) degrees of freedom (note that this is for a one-sided test). For the unstandardised difference test, power is calculated in an analogous way with Equation (2.3) as the non-centrality parameter. Deriving power analytically for the other functions is, however, not possible (the RSDT is only approximately \(t\) distributed), so a Monte Carlo approach is used for these tests. To call any power calculator in the package, one simply uses the function name with `_power` added as a suffix.

So, for example, to calculate power for the test of deficit we call `TD_power()`. The expected case score and either the sample size or the desired power must be supplied. The mean and standard deviation of the control sample can also be specified with the arguments `mean` and `sd`. If they are not, they take the default values of 0 and 1 respectively, so that the case score is interpreted as a distance from the mean in standard deviations. A conventional \(\alpha\)-level of \(0.05\) is assumed if nothing else is supplied. The alternative hypothesis can be specified with the argument `alternative`: `"less"` (default) or `"greater"` for a one-tailed test, or `"two.sided"` for a two-tailed test.

```
TD_power(case = 70,
mean = 100,
sd = 15,
sample_size = 16,
power = NULL,
alternative = "less",
alpha = 0.05,
spec = 0.005)
```

`## [1] 0.5819579`
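This analytic value can be cross-checked with a quick Monte Carlo simulation of the test of deficit. The sketch below (standard-library Python, under the same assumptions: controls \(N(100, 15^2)\), expected case score 70, \(n=16\), one-tailed \(\alpha=0.05\)) hard-codes the critical value \(t_{0.05,\,15} \approx -1.753\), since the standard library has no \(t\) quantile function; it is an illustration, not the `TD_power()` implementation.

```python
# Monte Carlo cross-check of the analytic power for the test of deficit
import random
from statistics import mean, stdev

random.seed(3)
t_crit = -1.7531          # alpha = 0.05 quantile of t on n - 1 = 15 df
n, case_mu = 16, 70.0
n_sim = 20000
rejections = 0
for _ in range(n_sim):
    controls = [random.gauss(100, 15) for _ in range(n)]
    case = random.gauss(case_mu, 15)
    # the test of deficit statistic: t = (y* - ybar) / (s * sqrt((n+1)/n))
    t = (case - mean(controls)) / (stdev(controls) * ((n + 1) / n) ** 0.5)
    rejections += t < t_crit
power = rejections / n_sim
print(round(power, 3))    # close to the analytic value of about 0.58
```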

`TD_power()` can also be used to calculate the sample size required for a desired level of power. For example, if we specify a desired power of 0.6, leave `sample_size` at its default and let the rest of the arguments be as in the previous example, the output shows that power will not increase by more than 0.5% for any additional participant beyond a sample size of 15. That is, the algorithm stops searching when this level of specificity has been reached and we are nearing the asymptotic maximum power for this effect size. The specificity can be increased by lowering the `spec` argument.

```
TD_power(case = 70,
mean = 100,
sd = 15,
sample_size = NULL,
power = 0.6,
alternative = "less",
alpha = 0.05,
spec = 0.005)
```

`## Power (0.578) will not increase more than 0.5% per participant for n > 15`

```
## n power
## 1 15 0.5780555
```

The power calculators for the Bayesian tests of deficit cannot calculate the required sample size. This is because they rely on simulation methods to approximate power, and deploying a search algorithm to find the required sample size for a given level of power would be computationally too intensive. The syntax is otherwise similar to that of `TD_power()`. For `BTD_power()` there are two extra arguments, `nsim` and `iter`, giving the number of simulations used by the power function and by `BTD()`, respectively.

```
BTD_power(case = 70,
mean = 100,
sd = 15,
sample_size = 15,
alternative = "less",
alpha = 0.05,
nsim = 1000,
iter = 1000)
```

`## [1] 0.593`

The only difference in the syntax of `BTD_cov_power()` is due to the inclusion of covariates. The variate of interest must be specified in the argument `control_task` as a vector in which the first element gives the mean and the second the standard deviation. The covariates can be specified similarly, or as an \(m \times 2\) matrix in which the first column gives the means of the covariates and the second column their standard deviations. The correlation matrix of the variates must be given as well. In the example below, power is evaluated for a test taking two covariates into account, both with a mean of 0 and a standard deviation of 1. The correlation is specified as a \(3\times 3\) matrix with pairwise correlations of \(0.3\). (The default settings include a single covariate with a \(0.3\) correlation with the variate of interest.) Because this function is computationally intense, the number of simulations has been decreased for the example below.

```
covars <- matrix(c(0, 1,
                   0, 1), ncol = 2, byrow = TRUE)
BTD_cov_power(case = -2,
case_cov = c(0.2, -0.6),
control_task = c(0, 1),
control_covar = covars,
cor_mat = diag(3) + 0.3 - diag(c(0.3, 0.3, 0.3)),
sample_size = 15,
alternative = "less",
alpha = 0.05,
nsim = 100,
iter = 100)
```

`## [1] 0.5`

For the difference tests, one must supply the expected case scores on both variates as well as the sample size. The means and standard deviations of the control sample can also be specified; if unspecified, they take the default values of 0 and 1 respectively, so that the expected case scores are interpreted as distances from the means in standard deviations. `RSDT_power()`, `BSDT_power()` and `UDT_power()` additionally require an estimate of the sample correlation between the variates of interest, `r_ab`. If this is not specified, a correlation of 0.5 is assumed by default. For `BSDT_cov_power()` the correlation matrix between the variates of interest and the covariates must instead be supplied (i.e., at least a \(3\times3\) matrix in which the first correlation is that between the variates of interest).

The alternative hypothesis is by default assumed to be `"two.sided"`, since the direction of the effect depends on the order of the inputs, but it can be specified as `"less"` or `"greater"` as well. The syntax is similar for all three functions, with small differences. With `UDT_power()` one can request the sample size required for a desired power, as for `TD_power()`. The calculators for the Bayesian tests have the extra argument `calibrated`, so that the prior can be specified. `BSDT_cov_power()` requires input in the same format as `BTD_cov_power()` for both `control_tasks` and `control_covar`. The two examples below demonstrate usage for `RSDT_power()` and `BSDT_cov_power()`.

```
RSDT_power(case_a = 70,
case_b = 55,
mean_a = 100,
mean_b = 50,
sd_a = 15,
sd_b = 10,
r_ab = 0.5,
sample_size = 15,
alternative = "two.sided",
alpha = 0.05,
nsim = 1000)
```

`## [1] 0.607`

```
cor_mat <- matrix(c(1, 0.5, 0.6,
                    0.5, 1, 0.3,
                    0.6, 0.3, 1), ncol = 3, byrow = TRUE)
BSDT_cov_power(case_tasks = c(70, 55),
case_cov = 65,
control_tasks = matrix(c(100, 15,
50, 10), ncol = 2, byrow = TRUE),
control_covar = c(50, 25),
cor_mat = cor_mat,
sample_size = 15,
alternative = "two.sided",
alpha = 0.05,
nsim = 100,
iter = 100,
calibrated = TRUE)
```

`## [1] 0.74`

The `singcar` package for `R` has been outlined and its main functionalities described. The package consists of methods for estimating the abnormality of a single case compared to a population estimated from a small sample. Methods for estimating abnormality on a single variate have been described, as well as methods for estimating the abnormality of the difference between two variates. Historically, the use of these tests has mainly been within the field of neuropsychology, but their potential applicability is far wider, extending to other areas of psychology and perhaps to entirely new fields.

In neuropsychology, the methods developed by John Crawford and Paul Garthwaite are frequently used, especially the test of deficit, Equation (2.1), and the revised standardised difference test, Equation (2.5). However, the Bayesian standardised difference test, which outperforms the RSDT in its control of Type I errors, has not gained as much traction. Neither have the more flexible Bayesian methods allowing for the inclusion of covariates. It is hoped that, by providing them in a documented package for a popular language such as `R`, they will receive further uptake.

The methods described have been developed for keeping transparent control over Type I errors, but power calculators have been implemented in the package as well. Consideration of power can assist researchers in study design and in setting realistic expectations for what these types of statistical hypothesis tests can achieve (McIntosh and Rittmo 2020).

Berger, James O., and Dongchu Sun. 2008. “Objective Priors for the Bivariate Normal Model.” *The Annals of Statistics* 36 (2): 963–82. https://www.jstor.org/stable/25464652.

Buckingham, Gavin. 2014. “Getting a Grip on Heaviness Perception: A Review of Weight Illusions and Their Probable Causes.” *Experimental Brain Research* 232 (6): 1623–29. https://doi.org/10.1007/s00221-014-3926-9.

Cohen, J. 1988. *Statistical Power Analysis for the Behavioral Sciences*. Lawrence Erlbaum Associates.

Crawford, John, and Paul Garthwaite. 2002. “Investigation of the Single Case in Neuropsychology: Confidence Limits on the Abnormality of Test Scores and Test Score Differences.” *Neuropsychologia* 40 (8): 1196–1208. https://doi.org/10.1016/S0028-3932(01)00224-X.

———. 2005. “Testing for Suspected Impairments and Dissociations in Single-Case Studies in Neuropsychology: Evaluation of Alternatives Using Monte Carlo Simulations and Revised Tests for Dissociations.” *Neuropsychology* 19 (3): 318–31. https://doi.org/10.1037/0894-4105.19.3.318.

———. 2006. “Methods of Testing for a Deficit in Single-Case Studies: Evaluation of Statistical Power by Monte Carlo Simulation.” *Cognitive Neuropsychology* 23 (6): 877–904. https://doi.org/10.1080/02643290500538372.

———. 2007. “Comparison of a Single Case to a Control or Normative Sample in Neuropsychology: Development of a Bayesian Approach.” *Cognitive Neuropsychology* 24 (4): 343–72. https://doi.org/10.1080/02643290701290146.

———. 2012. “Single-Case Research in Neuropsychology: A Comparison of Five Forms of t-Test for Comparing a Case to Controls.” *Cortex* 48 (8): 1009–16. https://doi.org/10.1016/j.cortex.2011.06.021.

Crawford, John, Paul Garthwaite, and D Howell. 2009. “On Comparing a Single Case with a Control Sample: An Alternative Perspective.” *Neuropsychologia* 47 (13): 2690–95. https://doi.org/10.1016/j.neuropsychologia.2009.04.011.

Crawford, John, Paul Garthwaite, D Howell, and C Gray. 2004. “Inferential Methods for Comparing a Single Case with a Control Sample: Modified t-Tests Versus Mycroft Et Al.’s (2002) Modified Anova.” *Cognitive Neuropsychology* 21 (7): 750–55. https://doi.org/10.1080/02643290342000276.

Crawford, John, Paul Garthwaite, and S Porter. 2010. “Point and Interval Estimates of Effect Sizes for the Case-Controls Design in Neuropsychology: Rationale, Methods, Implementations, and Proposed Reporting Standards.” *Cognitive Neuropsychology* 27 (3): 245–60. https://doi.org/10.1080/02643294.2010.513967.

Crawford, John, Paul Garthwaite, and K Ryan. 2011. “Comparing a Single Case to a Control Sample: Testing for Neuropsychological Deficits and Dissociations in the Presence of Covariates.” *Cortex* 47 (10): 1166–78. https://doi.org/10.1016/j.cortex.2011.02.017.

Crawford, John, and D Howell. 1998. “Comparing an Individual’s Test Score Against Norms Derived from Small Samples.” *The Clinical Neuropsychologist* 12 (4): 482–86. https://doi.org/10.1076/clin.12.4.482.7241.

Crawford, John, D Howell, and Paul Garthwaite. 1998. “Payne and Jones Revisited: Estimating the Abnormality of Test Score Differences Using a Modified Paired Samples t Test.” *Journal of Clinical and Experimental Neuropsychology* 20 (6): 898–905. https://doi.org/10.1076/jcen.20.6.898.1112.

Dijkerman, H. Chris, Sandra Lê, Jean-François Démonet, and A. David Milner. 2004. “Visuomotor Performance in a Patient with Visual Agnosia Due to an Early Lesion.” *Cognitive Brain Research* 20 (1): 12–25. https://doi.org/10.1016/j.cogbrainres.2003.12.007.

Garthwaite, Paul, and John Crawford. 2004. “The Distribution of the Difference Between Two t-Variates.” *Biometrika* 91 (4): 987–94.

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2013. *Bayesian Data Analysis, Third Edition*. Chapman & Hall/CRC Texts in Statistical Science. Taylor & Francis. https://books.google.se/books?id=ZXL6AQAAQBAJ.

Hassan, Eleanor K, Anna Sedda, Gavin Buckingham, and Robert D McIntosh. 2020. “The Size-Weight Illusion in Visual Form Agnosic Patient DF.” *Neurocase*, August, 1–8. https://doi.org/10.1080/13554794.2020.1800748.

McIntosh, Robert D., and Jonathan Ö. Rittmo. 2020. “Power Calculations in Single-Case Neuropsychology: A Practical Primer.” *Cortex* 135: 146–58. https://doi.org/10.1016/j.cortex.2020.11.005.

Payne, R. W., and H. G. Jones. 1957. “Statistics for the Investigation of Individual Cases.” *Journal of Clinical Psychology* 13 (2): 115–21.

R Core Team. 2020. *R: A Language and Environment for Statistical Computing*. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Shallice, Tim. 1988. *From Neuropsychology to Mental Structure*. Cambridge: Cambridge University Press.

Sokal, R. R., and F. J. Rohlf. 1981. *Biometry: The Principles and Practice of Statistics in Biological Research*. W. H. Freeman. https://books.google.co.uk/books?id=C-OTQgAACAAJ.

Tiao, George C., and Arnold Zellner. 1964. “On the Bayesian Estimation of Multivariate Regression.” *Journal of the Royal Statistical Society: Series B (Methodological)* 26 (2): 277–85. https://doi.org/10.1111/j.2517-6161.1964.tb00560.x.