`ciTools`

```
library(tidyverse)
library(ciTools)
library(MASS)
library(arm)
set.seed(20171102)
```

In this vignette we will discuss the current ability of `ciTools`

to handle generalized linear models. Small simulations will be provided in addition to examples that show how to use `ciTools`

to quantify uncertainty in fitted values. Primarily we focus on the Logistic and Poisson models, but `ciTools`

’s method for handling GLMs is not limited to these models.

Note that the *binomial logistic* regression is handled in a separate vignette.

Generalized linear models are extensions of linear models that seek to accommodate certain types of non-linear relationships. The manner in which the non-linearity is addressed also allows users to perform inferences on data that are not strictly continuous. GLMs are the most common model type that allow for a non-linear relationship between the response variable \(y\) and covariates \(X\). Recall that linear regression directly predicts a continuous response from the linear predictor, \(X \beta\). A GLM extends this linear prediction scheme. A GLM consists of:

A linear predictor \(X \beta\).

A monotonic and everywhere differentiable link function \(g\), which transforms the linear predictor: \(\hat{y} = g^{-1}(X \hat{\beta})\).

A response distribution: \(f(y|\mu)\) from the exponential family with expected value \(\mu = g^{-1} (X \beta)\).

Other components, such as over dispersion parameters and off-set terms are possible, but not common to all GLMs. The most common GLMs in practice are the Logistic model (Bernoulli response with logit link) and the Poisson model with log link function. These are detailed below for convenience.

Logistic Regression: \[ \begin{equation} \begin{split} & y|X \sim \mathrm{Binomial}(1, p) \\ & g(p) = \log \left( \frac{p}{1-p} \right) = X\beta \\ & \mathbb{E}[y|X] = p = \frac{\exp(X \beta)}{ 1 + \exp(X \beta)} \end{split} \end{equation} \]

Poisson Regression with the \(\log\) link function: \[ \begin{equation} \begin{split} & y|X \sim \mathrm{Poisson}(\lambda) \\ & g(\lambda) = \log \left( \lambda \right) = X\beta \\ & \mathbb{E}[y|X] = \lambda = \exp(X \beta) \end{split} \end{equation} \]

Due to the variety of options available, fitting generalized linear models is more complicated than fitting linear models. In **R**, `glm`

is the starting point for handling GLM fits, and is currently the only GLM fitting function that is supported by `ciTools`

. We can use `ciTools`

in tandem with `glm`

to fit and analyze logistic, Poisson, quasi-Poisson, gamma, Gaussian and certain other models.

`ciTools`

methods for GLMsUnlike linear models, interval estimates pertaining to GLMs generally do not have clean, parametric forms. This is problematic because from a computational point of view we would prefer a solution that is fast and relatively simple. Parametric interval estimates are available in certain cases, and wherever available, `ciTools`

will choose to implement them by default. Below we detail precisely which computations `ciTools`

performs when one of the core functions (`add_ci`

, `add_pi`

, `add_probs`

, `add_quantile`

) is called on an object of class `glm`

.

For any model fit by `glm`

, `add_ci()`

may compute confidence intervals for predictions using either a parametric method or a bootstrap. The parametric method computes confidence intervals on the scale of the linear predictor \(X \beta\) and transforms the intervals to the response level through the inverse link function \(g^{-1}\). Confidence intervals on the linear predictor level are computed using a Normal distribution for Logistic and Poisson regressions or a \(t\) distribution otherwise. (This is consistent with the default behavior for the `predict.glm`

function.) The intervals are given by the following expressions:

\[ \begin{equation} g^{-1}\left(x'\hat{\beta} \pm z_{1 - \alpha/2} \sqrt{\hat{\sigma}^2x'(X'X)^{-1} x}\right) \end{equation} \]

for Binomial and Poisson GLMs or

\[ \begin{equation} \label{eq:glmci} g^{-1}\left(x'\hat{\beta} \pm t_{1 - \alpha/2, n-p-1} \sqrt{\hat{\sigma}^2x'(X'X)^{-1} x}\right) \end{equation} \]

for other generalized linear models. In these expressions, we regard \(X\) as the model matrix from the original fit, and \(x\) as the “new data” matrix. The default method is parametric and is called with `add_ci(data, fit, ...)`

. This is the method we generally recommend for constructing confidence intervals for model predictions.

The bootstrap method is called with `add_ci(tb, fit, type = "boot", ...)`

and was included originally for making comparisons against the parametric method. There are multiple methods of bootstrap for regression models (resampling cases, resampling residuals, parametric, etc.). The bootstrap method employed by `ciTools`

in `add_ci.glm()`

resamples cases and iteratively refits the model (using the default behavior of `boot::boot`

) to determine confidence intervals. After collecting the bootstrap replicates, a bias-corrected and accelerated (BCa) bootstrap confidence interval is formed for each point in the sample `tb`

.

Although there are several methods for computing bootstrap confidence intervals, we don’t provide options to compute all of these types of intervals in `ciTools`

. BCa intervals are slightly larger than parametric intervals, but are less biased than other types of bootstrapped intervals, including percentile based intervals. We may consider adding more types of bootstrap intervals to `ciTools`

in the future.

For comparison, we show an example of the confidence intervals for the probability estimates of a Logistic regression model.

```
x <- rnorm(100, mean = 5)
y <- rbinom(n = 100, size = 1, prob = invlogit(-20 + 4*x))
df <- data.frame(x = x, y = y)
fit <- glm(y ~ x, family = binomial)
```

We use `ciTools`

to compute the two types of confidence intervals, then we stack the dataframes together.

```
df1 <- add_ci(df, fit, names = c("lwr", "upr"), alpha = 0.1) %>%
mutate(type = "parametric")
df2 <- add_ci(df, fit, type = "boot", names = c("lwr", "upr"), alpha = 0.1, nSims = 500) %>%
mutate(type = "bootstrap")
df <- bind_rows(df1, df2)
```

```
ggplot(df, aes(x = x, y = y)) +
ggtitle("Logistic Regression", subtitle = "Model fit (black line) and confidence intervals (gray)") +
geom_jitter(height = 0.01) +
geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.4) +
geom_line(aes(x =x , y = pred), size = 2) +
facet_grid(~type)
```

Our two confidence interval methods mostly agree, but the bootstrap method produces slightly wider intervals.

Another perspective on the difference between these two interval calculation methods is shown below. It’s fairly clear that the BCa intervals (red) indeed exhibit little bias, but are not as tight as the parametric intervals (purple). This is expected behavior because the bootstrap confidence intervals don’t “know” about the model assumptions. In practice, if the sample size is small or the model is false, these interval estimates may exhibit more disagreement.

If the sample size increases, we will find that the two estimates increasingly agree and converge to \(0\) in width. Note that we do not calculate prediction intervals for \(y\) when \(y\) is Bernoulli distributed because the support of \(y\) is exactly \(\{0,1\}\).

Generally, parametric prediction intervals for GLMs are not available. The solution `ciTools`

takes is to perform a parametric bootstrap on the model fit, then take quantiles on the bootstrapped data produced for each observation. The procedure is performed via `arm::sim`

. The method of the parametric bootstrap is described by the following algorithm:

Fit the GLM, and collect the regression statistics \(\hat{\beta}\) and \(\hat{\mathrm{Cov}}(\hat{\beta})\). Set the number of simulations, \(M\).

Simulate \(M\) draws of the regression coefficients, \(\hat{\beta}_{*}\), from \(N(\hat{\beta}, \hat{\mathrm{Cov}}(\hat{\beta}))\), where \(\hat{\mathrm{Cov}}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}\).

Simulate \([y_{*}|x]\) from the response distribution with mean \(g^{-1}(x \hat{\beta}_{*})\) and a variance determined by the response distribution.

Determine the \(\alpha/2\) and \(1-\alpha/2\) quantiles of the simulated response \([y_{*}|x]\) for each \(x\).

The parametric bootstrap method propagates the uncertainty in the regression effects \(\hat{\beta}\) into the simulated draws from the predictive distribution.

Generally, there are many different ways to calculate the quantiles of an empirical distribution, but the approach that `ciTools`

takes ensures that estimated quantiles lie in the support set of the response. The choice we make corresponds to setting `type = 1`

in `quantile()`

.

We have seen in our simulations that the parametric bootstrap provides interval estimates with approximately nominal probability coverage. The unfortunate side effect of opting to construct prediction intervals through a parametric bootstrap is that the parameters of the predictive distributions need to hard-coded for each model. For this reason, `ciTools`

does not have complete coverage of all models one could fit with `glm`

.

One exception to this scheme are GLMs with Gaussian errors. In this case we may parametrically calculate prediction intervals. Under Gaussian errors, `glm()`

permits the use of the link functions “identity”, “log”, and “inverse”. The corresponding models are given by the following expressions:

\[ \begin{equation} \label{eq:gauss-link} \begin{split} y = X\beta + \epsilon\\ y = \exp(X\beta) + \epsilon \\ y = \frac{1}{X\beta} + \epsilon \end{split} \end{equation} \]

The conditional response distribution in each case may be written \(y \sim N(g^{-1}(X\beta), \sigma^2)\), and prediction intervals may be computed parametrically. Parametric prediction intervals are therefore constructed via

\[ \begin{equation} \label{eq:gauss-pi} g^{-1}(x'\beta) \pm t_{1-\alpha/2, n-p-1} \sqrt{\hat{\sigma}^2 + \hat{\sigma}^2x'(X'X)^{-1}x} \end{equation} \]

for Gaussian GLMs. As in the linear model, \(\hat{\sigma}^2\) estimates the predictive uncertainty and \(\hat{\sigma}^2 x(X'X)^{-1}x\) estimates the inferential uncertainty in the fitted values. Note that \(g^{-1}(X \hat{\beta})\) is a maximum likelihood estimate for the parameter \(g^{-1}(X \beta)\), by the functional invariance of maximum likelihood estimation.

At this point, the only GLMs that we support with `add_pi()`

are Gaussian, Poisson, Binomial, Gamma, and quasi-Poisson (all of which need to be fit with `glm()`

). The models not supported by `add_pi.glm`

are inverse Gaussian, quasi, and quasi-binomial.

Poisson regression is usually the first line of defense against count data, so we wish to present a complete example of quantifying uncertainty for this type of model with `ciTools`

. For simplicity we fit a model on fake data.

We use `rnorm`

to generate a covariate, but the randomness of \(x\) has no bearing on the model.

```
x <- rnorm(100, mean = 0)
y <- rpois(n = 100, lambda = exp(1.5 + 0.5*x))
df <- data.frame(x = x, y = y)
fit <- glm(y ~ x , family = poisson(link = "log"))
```

As seen previously, the commands in `ciTools`

are “pipeable”. Here, we compute confidence and prediction intervals for a model fit at the \(90\%\) level. The warning message only serves to remind the user that precise quantiles cannot be formed for non-continuous distributions.

```
df_ints <- df %>%
add_ci(fit, names = c("lcb", "ucb"), alpha = 0.1) %>%
add_pi(fit, names = c("lpb", "upb"), alpha = 0.1, nSims = 20000) %>%
print()
```

```
## Warning in add_pi.glm(., fit, names = c("lpb", "upb"), alpha = 0.1, nSims
## = 20000): The response is not continuous, so Prediction Intervals are
## approximate
```

```
## # A tibble: 100 x 7
## x y pred lcb ucb lpb upb
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -2.0519678 2 1.841790 1.512871 2.242220 0 4
## 2 -1.3503557 7 2.579160 2.217452 2.999868 0 6
## 3 0.3743030 2 5.901355 5.502938 6.328618 2 10
## 4 0.4393664 6 6.088538 5.681194 6.525089 2 10
## 5 0.2497295 2 5.558872 5.172883 5.973663 2 10
## 6 2.2621171 18 14.602662 12.696356 16.795192 9 22
## 7 0.7149323 10 6.949438 6.480422 7.452399 3 12
## 8 0.4745805 8 6.192311 5.779338 6.634794 2 10
## 9 1.8237586 7 11.832166 10.558956 13.258901 6 18
## 10 -1.4035391 1 2.514161 2.154300 2.934134 0 5
## # ... with 90 more rows
```

As with other methods available in `ciTools`

the requested statistics are computed, appended to the data frame, and returned to the user as a tibble.

```
df_ints %>%
ggplot(aes(x = x, y = y)) +
geom_point(size = 2) +
ggtitle("Poisson Regression", subtitle = "Model fit (black line), with prediction intervals (gray), confidence intervals (dark gray)") +
geom_line(aes(x = x, y = pred), size = 1.2) +
geom_ribbon(aes(ymin = lcb, ymax = ucb), alpha = 0.4) +
geom_ribbon(aes(ymin = lpb, ymax = upb), alpha = 0.2)
```

Since the response \(y\) is count data, and the method we used to determine the intervals involves simulation, we find that `ciTools`

will produce “jagged” bounds when all the intervals are plotted simultaneously. Increasing the number of simulations using the `nSims`

argument in `add_pi`

can help reduce some of this unsightliness.

We may also wish to compute response-level probabilities and quantiles. `ciTools`

can also handle these estimates with `add_probs()`

and `add_quantile()`

respectively. We use the same parametric bootstrap approach for estimating quantiles and probabilities that we employed for `add_pi()`

. Once again, an error message reminds the user that their support is not continuous.

```
df %>%
add_probs(fit, q = 10) %>%
add_quantile(fit, p = 0.4) %>%
print()
```

```
## Warning in add_probs.glm(., fit, q = 10): The response is not continuous,
## so estimated probabilities are only approximate
```

```
## Warning in add_quantile.glm(., fit, p = 0.4): The response is not
## continuous, so estimated quantiles are only approximate
```

```
## # A tibble: 100 x 5
## x y pred prob_less_than10 quantile0.4
## <dbl> <int> <dbl> <dbl> <dbl>
## 1 -2.0519678 2 1.841790 1.0000 1
## 2 -1.3503557 7 2.579160 0.9995 2
## 3 0.3743030 2 5.901355 0.9035 5
## 4 0.4393664 6 6.088538 0.9200 5
## 5 0.2497295 2 5.558872 0.9410 5
## 6 2.2621171 18 14.602662 0.0910 13
## 7 0.7149323 10 6.949438 0.8350 6
## 8 0.4745805 8 6.192311 0.9055 5
## 9 1.8237586 7 11.832166 0.2640 11
## 10 -1.4035391 1 2.514161 1.0000 2
## # ... with 90 more rows
```

A common problem with the Poisson model is the presence of over-dispersion. Recall that for the Poisson model, we require that the variance and mean agree, however this is practically a strict and unreasonable modeling assumption. A quasi-Poisson model is one remedy: it estimates an additional dispersion parameter and will provide a better fit. Under the quasi-Poisson assumption

\[ \mathbb{E}[y|X] = \mu = \exp (X \beta) \]

and

\[ \mathbb{V}\mathrm{ar}[y|X] = \phi \mu \]

Quasi models are not full maximum likelihood models, however it is possible to embed a quasi-Poisson in the negative binomial framework using

\[ \mathrm{QP}(\mu, \theta) = \mathrm{NegBin}(\mu, \theta = \frac{\mu}{\phi - 1}) \]

Where NegBin is the parameterization of the Negative Binomial distribution used by `glm.nb`

in the `MASS`

library. This model for the negative binomial distribution, a continuous mixture of Poisson random variables with gamma distributed means, is preferred over that classical parameterization in applications. The preference stems from the fact that it allows for non-integer-valued \(\theta\).

**Warning**: As in Gelman and Hill’s *Data Analysis using Regression and Multilevel/Hierarchical Model*, `ciTools`

does not simulate the uncertainty in the over-dispersion parameter \(\hat{\phi}\). According to our simulations, dropping this uncertainty from the parametric bootstrap has a negligible effect on the coverage probabilities. While the distribution of \(\hat{\phi}\) is asymptotically Normal, it is very likely that the finite sample estimator has a skewed distribution. Approximating this distribution for use in a parametric bootstrap is ongoing research. As it stands, the prediction intervals we form for over-dispersed models tend to be conservative.

Negative binomial regression (via `glm.nb`

) is implemented as a separate method in `ciTools`

, and is an alternative to quasi-Poisson regression. For more information on the difference between these two models, we recommend Jay Ver Hoef and Peter Boveng’s *Quasi-poisson vs. Negative Binomial Regression: How Should We Model Overdispersed Count Data?*

Again, we generate fake data. The dispersion parameter is set to \(5\) in the quasi-Poisson model.

```
x <- runif(n = 100, min = 0, max = 2)
mu <- exp(1 + x)
y <- rnegbin(n = 100, mu = mu, theta = mu/(5 - 1))
```

The data is over-dispersed:

```
df <- data.frame(x = x, y = y)
fit <- glm(y ~ x, family = quasipoisson(link = "log"))
summary(fit)$dispersion
```

`## [1] 4.663126`

But `ciTools`

can still construct appropriate interval estimates for the range of a new observation:

```
df_ints <- add_ci(df, fit, names = c("lcb", "ucb"), alpha = 0.05) %>%
add_pi(fit, names = c("lpb", "upb"), alpha = 0.1, nSims = 20000)
```

```
## Warning in add_pi.glm(., fit, names = c("lpb", "upb"), alpha = 0.1, nSims
## = 20000): The response is not continuous, so Prediction Intervals are
## approximate
```

```
ggplot(df_ints, aes(x = x, y = y)) +
ggtitle("Quasi-Poisson Regression", subtitle = "Model fit (black line), with Prediction intervals (gray), Confidence intervals (dark gray)") +
geom_point(size = 2) +
geom_line(aes(x = x, y = pred), size = 1.2) +
geom_ribbon(aes(ymin = lcb, ymax = ucb), alpha = 0.4) +
geom_ribbon(aes(ymin = lpb, ymax = upb), alpha = 0.2)
```

The darker region represents the confidence intervals formed by `add_ci`

and the lighter intervals are formed by `add_pi`

. Again, intervals are “jagged” because the response is not continuous and the bounds are formed through a simulation.

A simulation study was performed to examine the empirical coverage probabilities of prediction intervals formed using the parametric bootstrap. We focus on these intervals because we could not find results in the literature addressing their performance. Our simulation is not comprehensive, so users of `ciTools`

should exercise care when using these methods. In each simulation, a simple \(y = g^{-1}(mx + b)\) model is fit on a variable number of observations.

New observations were generated from the true model to determine if the empirical coverage was close to the nominal level. The mean interval width was also recorded, as were standard errors of the estimated coverage and mean interval width.

Note that in contrast to a study of confidence intervals, we do not expect interval widths to shrink to \(0\) as sample size tends to infinity. This is due to the predictive error in the conditional response distribution, which is not a factor in the construction of confidence intervals.

We take the same approach in the simulation study of each of the four models described below:

- Set a sequence of sample sizes e.g. \(n = 20, 50, 100, ...\).
- For each sample size, set a model matrix
- Then loop …

- Generate a response vector from the true model
- Fit a GLM to the simulated response \(\boldsymbol{y}\) with the fixed model matrix
- Calculate a prediction interval for \(y_{new}\) given \(x_0\), the midpoint of the range of \(x\) using \(2000\) bootstrap replicates.
- Store the width of this prediction interval.
- Generate a response \(y_{new} | x_0\) from the true model and determine if the new response is in the PI.

- Repeat Step 3 \(M\) times.

A Poisson model is fit with the log link function.

\[ y|x \sim \mathrm{Poisson}(\lambda = \exp(1 + 2 x)), \quad x \in (1, 2) \]

For each sample size, \(10,000\) simulations were performed. We find that observed coverage levels are biased conservative by about \(0.5\%\). This bias is likely a side effect of the type of quantile we calculate on the bootstrapped data. Interval widths generally decrease with sample size, however there is an increase from sample size \(500\) to \(1000\), which is within simulation error.

```
pois <- read_csv("pois_pi_results.csv") %>%
dplyr::select(-total_time) %>%
rename(nominal_level = level) %>%
dplyr::select(sample_size, everything())
knitr::kable(pois)
```

sample_size | nominal_level | cov_prob | cov_prob_se | int_width | int_width_se |
---|---|---|---|---|---|

20 | 0.95 | 0.9585 | 0.0019945 | 28.9824 | 0.0089195 |

30 | 0.95 | 0.9568 | 0.0020332 | 28.9686 | 0.0083300 |

50 | 0.95 | 0.9611 | 0.0019337 | 28.9707 | 0.0079568 |

100 | 0.95 | 0.9547 | 0.0020797 | 28.9350 | 0.0077294 |

250 | 0.95 | 0.9573 | 0.0020219 | 28.9146 | 0.0073916 |

500 | 0.95 | 0.9567 | 0.0020354 | 28.9136 | 0.0074926 |

1000 | 0.95 | 0.9585 | 0.0019945 | 28.9161 | 0.0075347 |

2000 | 0.95 | 0.9542 | 0.0020906 | 28.8923 | 0.0075509 |

A negative binomial model with log link function was fit with dispersion parameter \(\theta = 4\).

\[ y|x \sim \mathrm{NegBin}(\mu = \exp(1 + 2x), \theta = 4), \qquad x \in (1, 2) \]

For each sample size, \(10,000\) models were fit. Estimated coverage probabilities lie below the nominal level for small sample sizes. Statistical “folklore” recommends against fitting negative binomial models on a small sample, so our observations are in-line with this advice. Even though observed coverage probabilities closely agree with the nominal level, we find that interval width estimates tend to actually *increase slightly* with sample size, a trend we find concerning given the standard errors of the interval width estimates. Possible explanations of this trend could be our requirement that our prediction intervals are forced to lie in the positive integers, and that we do not simulate the distribution of the dispersion parameter. These results lead us to believe that more study is warranted on this type of interval estimate.

```
neg_bin <- read_csv("negbin_pi_results.csv") %>%
dplyr::select(-total_time) %>%
rename(nominal_level = level) %>%
dplyr::select(sample_size, everything())
knitr::kable(neg_bin)
```

sample_size | nominal_level | cov_prob | cov_prob_se | int_width | int_width_se |
---|---|---|---|---|---|

20 | 0.95 | 0.9225 | 0.0026740 | 99.5669 | 0.1996143 |

30 | 0.95 | 0.9365 | 0.0024387 | 102.7880 | 0.1667759 |

50 | 0.95 | 0.9441 | 0.0022974 | 104.9926 | 0.1292217 |

100 | 0.95 | 0.9455 | 0.0022701 | 106.6165 | 0.0933470 |

150 | 0.95 | 0.9505 | 0.0021692 | 107.3229 | 0.0794959 |

200 | 0.95 | 0.9530 | 0.0021165 | 107.4567 | 0.0707095 |

250 | 0.95 | 0.9503 | 0.0021734 | 107.7405 | 0.0642865 |

500 | 0.95 | 0.9528 | 0.0021208 | 108.1632 | 0.0501491 |

1000 | 0.95 | 0.9560 | 0.0020511 | 108.1898 | 0.0407571 |

2000 | 0.95 | 0.9495 | 0.0021899 | 108.2864 | 0.0354954 |

A Gamma regression model with `"inverse"`

link function was fit and simulated \(10,000\) times.

\[ y|x \sim \Gamma(\mathrm{shape} = 5, \mathrm{rate} = \frac{5}{2 + 4x}) \qquad x \in (30, 70) \]

Gamma regression was not discussed in this vignette but is still supported by `ciTools`

. Estimated coverage probabilities are generally close to, but consistently slightly below, the nominal level. In contrast to the results from the Poisson simulation, these coverage probabilities are within simulation error of the nominal level. Average interval widths tend to exhibit a high degree of variation.

```
gam <- read_csv("gamma_pi_results.csv") %>%
dplyr::select(-total_time) %>%
rename(nominal_level = level) %>%
dplyr::select(sample_size, everything())
knitr::kable(gam)
```

sample_size | nominal_level | cov_prob | cov_prob_se | int_width | int_width_se |
---|---|---|---|---|---|

100 | 0.95 | 0.9466 | 0.0022484 | 343.8714 | 0.3424111 |

250 | 0.95 | 0.9492 | 0.0021960 | 339.1537 | 0.2634857 |

500 | 0.95 | 0.9468 | 0.0022444 | 348.9826 | 0.1639942 |

1000 | 0.95 | 0.9496 | 0.0021878 | 342.0288 | 0.1347365 |

2000 | 0.95 | 0.9463 | 0.0022544 | 346.6101 | 0.1121606 |

In practice, the (non-canonical) log link function is more common due to its numerical stability. Since the choice of link function does not drastically alter our prediction interval procedure, `ciTools`

can also handle these types of models.

Gaussian prediction intervals may be determined parametrically by `ciTools`

. The model we considered was

\[ y|x \sim \mathcal{N}(\mathrm{mean} = \exp(1 + x), \mathrm{sd} = 1) \qquad x \in (0,1) \]

Since a bootstrap is unnecessary, total running time for the simulation is much faster than for simulations involving other distributions. Coverage probabilities are estimated within a standard error or two of \(0.95\) and interval widths generally decrease according to the increased sample size.

```
norm_log <- read_csv("gaussian_pi_loglink_results.csv") %>%
dplyr::select(-total_time) %>%
rename(nominal_level = level) %>%
dplyr::select(sample_size, everything())
knitr::kable(norm_log)
```

sample_size | nominal_level | cov_prob | cov_prob_se | int_width | int_width_se |
---|---|---|---|---|---|

20 | 0.95 | 0.9441 | 0.0022974 | 4.147159 | 0.0069663 |

30 | 0.95 | 0.9389 | 0.0023953 | 4.078163 | 0.0054658 |

50 | 0.95 | 0.9451 | 0.0022780 | 4.006229 | 0.0041607 |

100 | 0.95 | 0.9487 | 0.0022062 | 3.963054 | 0.0028657 |

250 | 0.95 | 0.9448 | 0.0022838 | 3.938591 | 0.0017536 |

500 | 0.95 | 0.9513 | 0.0021525 | 3.928583 | 0.0012280 |

1000 | 0.95 | 0.9469 | 0.0022424 | 3.924278 | 0.0008787 |

2000 | 0.95 | 0.9526 | 0.0021250 | 3.921294 | 0.0006189 |

`ciTools`

is a versatile **R** package that helps users quantify uncertainty about their generalized linear models. Creating interval estimates that are amenable to plotting is now as simple as fitting a GLM. To date, we provide coverage for many common GLMs used by practitioners. For the models covered by `ciTools`

, our simulations show that our confidence and prediction intervals are trustworthy.

There is still work to be done on this portion of `ciTools`

. We would like to

Include more parametric methods for prediction intervals.

Add facilities to handle CIs and PIs when offset terms are present in the model fit.

Further study interval estimates pertaining to the negative binomial model.

Produce a simulation study that compares parametric vs. bootstrap intervals for GLMs.

Offer alternative prediction intervals e.g. the shortest intervals that contain \(95\%\) of the simulated data.

Include options beyond BCa for creating bootstrap confidence intervals.

`sessionInfo()`

```
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 17.10
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] arm_1.9-3 lme4_1.1-14 Matrix_1.2-11 MASS_7.3-47
## [5] knitr_1.17.6 bindrcpp_0.2 broom_0.4.2 ciTools_0.3.0
## [9] dplyr_0.7.4 purrr_0.2.3 readr_1.1.1 tidyr_0.7.0
## [13] tibble_1.3.4 ggplot2_2.2.1 tidyverse_1.1.1
##
## loaded via a namespace (and not attached):
## [1] reshape2_1.4.2 splines_3.4.2 haven_1.1.0 lattice_0.20-35
## [5] colorspace_1.3-2 htmltools_0.3.6 yaml_2.1.14 rlang_0.1.2
## [9] nloptr_1.0.4 foreign_0.8-69 glue_1.1.1 modelr_0.1.1
## [13] readxl_1.0.0 bindr_0.1 plyr_1.8.4 stringr_1.2.0
## [17] munsell_0.4.3 gtable_0.2.0 cellranger_1.1.0 rvest_0.3.2
## [21] coda_0.19-1 psych_1.7.5 evaluate_0.10.1 labeling_0.3
## [25] forcats_0.2.0 parallel_3.4.2 highr_0.6 Rcpp_0.12.13
## [29] scales_0.4.1 backports_1.1.0 jsonlite_1.5 abind_1.4-5
## [33] mnormt_1.5-5 hms_0.3 digest_0.6.12 stringi_1.1.5
## [37] grid_3.4.2 rprojroot_1.2 tools_3.4.2 magrittr_1.5
## [41] lazyeval_0.2.0 pkgconfig_2.0.1 xml2_1.1.1 lubridate_1.6.0
## [45] assertthat_0.2.0 minqa_1.2.4 rmarkdown_1.6 httr_1.3.1
## [49] boot_1.3-20 R6_2.2.2 nlme_3.1-131 compiler_3.4.2
```