Bayes via Goodness of Fit

Subhadeep Mukhopadhyay and Douglas Fletcher

I. Illustration Using Rat Tumor Data (Binomial Family)

The rat tumor data consist of observations of endometrial stromal polyp incidence in \(k=70\) groups of rats. For each group, \(y_i\) is the number of rats with polyps and \(n_i\) is the total number of rats in the experiment. Here we describe the analysis of the rat tumor data using Bayes-\({\rm DS}(G,m)\) modeling.

Step 1. We begin by finding the starting parameter values for \(g \sim \text{Beta}(\alpha, \beta)\) by MLE:

library(BayesGOF)
set.seed(8697)
data(rat)
# Use MLE to determine starting values
rat.start <- gMLE.bb(rat$y, rat$n)$estimate
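
To make the idea of "starting values by MLE" concrete, here is a minimal base-R sketch of a beta-binomial marginal maximum likelihood fit. It is an illustration under stated assumptions, not a description of how `gMLE.bb` is implemented: the data are synthetic (not the rat counts), and the names `bb_negloglik`, `fit`, and `bb_hat` are hypothetical.

```r
# Synthetic data only; these are NOT the rat tumor counts.
set.seed(1)
k <- 200
n <- rep(20, k)
theta <- rbeta(k, shape1 = 2, shape2 = 14)
y <- rbinom(k, size = n, prob = theta)

# Negative marginal log-likelihood of the beta-binomial model.
# Parameters are optimized on the log scale so that alpha, beta > 0.
bb_negloglik <- function(log_par, y, n) {
  a <- exp(log_par[1])
  b <- exp(log_par[2])
  -sum(lchoose(n, y) + lbeta(y + a, n - y + b) - lbeta(a, b))
}

# Nelder-Mead (the optim default) started at alpha = beta = 1.
fit <- optim(c(0, 0), bb_negloglik, y = y, n = n)
bb_hat <- exp(fit$par)
names(bb_hat) <- c("alpha", "beta")
bb_hat
```

The vignette itself continues with `rat.start` as returned by `gMLE.bb`; the sketch only shows what kind of objective such an estimate maximizes.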

We use our starting parameter values to run the main DS.prior function:

rat.ds <- DS.prior(rat, max.m = 6, rat.start, family = "Binomial")

Step 2. We display the U-function to quantify and characterize the uncertainty of the a priori selected \(g\):

plot(rat.ds, plot.type = "Ufunc")

The deviations from the uniform distribution (the red dashed line) indicate that our initial selection for \(g\), \(\text{Beta}(\alpha = 2.3,\beta = 14.1)\), is incompatible with the observed data and requires repair; the data indicate that there are, in fact, two distinct incidence groups among the rats.

Step 3a. Extract the parameters for the corrected prior \(\hat{\pi}\):

rat.ds
## $g.par
##     alpha      beta 
##  2.304768 14.079707 
## 
## $LP.coef
##        LP1        LP2        LP3 
##  0.0000000  0.0000000 -0.5040361

Therefore, the DS prior \(\hat{\pi}\) given \(g\) is: \[\hat{\pi}(\theta) = g(\theta; \alpha,\beta)\Big[1 - 0.50T_3(\theta;G) \Big]\]
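
The expression above can be evaluated numerically. The sketch below assumes that \(T_j(\theta;G)\) is the \(j\)-th orthonormal shifted Legendre polynomial evaluated at \(G(\theta)\), the Beta cdf; that reading is an assumption made for illustration, and the helper names `leg`, `d_u`, and `ds_prior` are hypothetical rather than part of BayesGOF. The parameter values are copied from the printed `$g.par` and `$LP.coef`.

```r
# Orthonormal shifted Legendre polynomial of degree j on [0, 1],
# built with the three-term recurrence for P_n(x), x = 2u - 1.
leg <- function(j, u) {
  x <- 2 * u - 1
  p_prev <- rep(1, length(u))          # P_0
  if (j == 0) return(p_prev)
  p_curr <- x                          # P_1
  if (j > 1) {
    for (n in 1:(j - 1)) {
      p_next <- ((2 * n + 1) * x * p_curr - n * p_prev) / (n + 1)
      p_prev <- p_curr
      p_curr <- p_next
    }
  }
  sqrt(2 * j + 1) * p_curr
}

a0 <- 2.304768; b0 <- 14.079707        # $g.par from the printed output
lp <- c(0, 0, -0.5040361)              # $LP.coef from the printed output

# Correction function d(u) = 1 + sum_j LP_j * Leg_j(u) on the unit interval.
d_u <- function(u) {
  1 + Reduce(`+`, lapply(seq_along(lp), function(j) lp[j] * leg(j, u)))
}

# Assumed form of the DS prior: g(theta) * d(G(theta)).
ds_prior <- function(theta) {
  dbeta(theta, a0, b0) * d_u(pbeta(theta, a0, b0))
}

integrate(d_u, 0, 1)$value       # the correction integrates to one
integrate(ds_prior, 0, 1)$value  # and so does the corrected prior
```

Because each Legendre term integrates to zero on \([0,1]\), the correction leaves the total mass at one. A raw expansion of this kind is not guaranteed to be nonnegative: with these coefficients `d_u` dips below zero near \(u = 1\). How the package treats that region is not covered by this sketch.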

Step 3b. We can now plot the estimated DS prior \(\hat{\pi}\) along with the original parametric \(g\):

plot(rat.ds, plot.type = "DSg", main = "DS vs g: Rat")

MacroInference

Step 4. Here we are interested in the overall macro-level inference obtained by combining the \(k=70\) parallel studies. The group-specific modes, along with their standard errors, can be computed as follows:

rat.macro.md <- DS.macro.inf(rat.ds, num.modes = 2, iters = 25, method = "mode")
rat.macro.md
##      1SD Lower Limit   Mode 1SD Upper Limit
## [1,]          0.0161 0.0340          0.0520
## [2,]          0.1442 0.1562          0.1681
plot(rat.macro.md, main = "MacroInference: Rat")

MicroInference

Step 5. Given an additional study \(\theta_{71}\) where \(y_{71} = 4\) and \(n_{71} = 14\), the goal is to estimate the probability of a tumor for this new clinical study. The following code performs the desired microinference (posterior distribution along with its mean and mode):

rat.y71.micro <- DS.micro.inf(rat.ds, y.0 = 4, n.0 = 14)
rat.y71.micro
## Posterior summary for y = 4, n = 14:
##  Posterior Mean = 0.1897
##  Posterior Mode = 0.1833
## Use plot(x) to generate posterior plot
plot(rat.y71.micro, main = "Rat (4,14)")
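
For comparison, the posterior under the uncorrected parametric prior \(g\) has a closed form: a \(\text{Beta}(\alpha, \beta)\) prior combined with \(y\) successes out of \(n\) gives a \(\text{Beta}(\alpha + y, \beta + n - y)\) posterior. The base-R sketch below computes that baseline from the `$g.par` values printed earlier; the variable names are hypothetical, and this is the standard conjugate update, not the DS microinference itself.

```r
a0 <- 2.304768; b0 <- 14.079707  # $g.par reported in Step 3a
y0 <- 4; n0 <- 14                # the additional study

# Conjugate beta-binomial update.
a1 <- a0 + y0
b1 <- b0 + n0 - y0

pebayes_mean <- a1 / (a1 + b1)
pebayes_mode <- (a1 - 1) / (a1 + b1 - 2)
round(c(mean = pebayes_mean, mode = pebayes_mode), 4)
```

The parametric posterior mean comes out near 0.2075, against the DS posterior mean of 0.1897 reported above, which gives a sense of how much the corrected prior changes the estimate for this study.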

II. Illustration Using Arsenic Data (Normal Family)

For this example, we will focus on the macroinference for the arsenic data set. The arsenic data set details the measurements of the level of arsenic in oyster tissue from \(k=28\) laboratories.

Step 1. We begin by finding the starting parameter values for \(g \sim \text{Normal}(\mu, \tau^2)\) by MLE:

data(arsenic)
arsn.start <- gMLE.nn(arsenic$y, arsenic$se, method = "DL")$estimate
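
The call above passes `method = "DL"`. Assuming this refers to the DerSimonian-Laird moment estimator familiar from random-effects meta-analysis (an assumption; the package internals are not shown here), a minimal base-R sketch of that estimator looks as follows. The data are synthetic, not the arsenic measurements, and `dl_estimate` is a hypothetical helper name.

```r
# Synthetic data only; these are NOT the arsenic measurements.
set.seed(2)
k <- 28
se <- runif(k, 0.3, 1.5)
theta <- rnorm(k, mean = 13, sd = 1.8)
y <- rnorm(k, mean = theta, sd = se)

# DerSimonian-Laird estimate of (mu, tau^2) in the normal-normal model.
dl_estimate <- function(y, se) {
  w <- 1 / se^2
  mu_fixed <- sum(w * y) / sum(w)          # fixed-effect weighted mean
  Q <- sum(w * (y - mu_fixed)^2)           # Cochran's Q statistic
  tau2 <- max(0, (Q - (length(y) - 1)) / (sum(w) - sum(w^2) / sum(w)))
  w_star <- 1 / (se^2 + tau2)              # random-effects weights
  c(mu = sum(w_star * y) / sum(w_star), tau2 = tau2)
}

dl_estimate(y, se)
```

The truncation at zero keeps \(\tau^2\) nonnegative when the studies are less variable than their standard errors alone would imply.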

We use our starting parameter values to run the main DS.prior function:

arsn.ds <- DS.prior(arsenic, max.m = 8, arsn.start, family = "Normal")

Step 2. We display the U-function to quantify and characterize the uncertainty of the a priori selected \(g\):

plot(arsn.ds, plot.type = "Ufunc")

Step 3. We now extract the parameters for the corrected prior \(\hat{\pi}\) and plot it, along with the original \(g\):

arsn.ds
## $g.par
##        mu     tau^2 
## 13.220522  3.407165 
## 
## $LP.coef
##        LP1        LP2        LP3        LP4        LP5        LP6 
##  0.0000000 -0.4777655 -0.5091652  0.4401269  0.3457535 -0.3862848
plot(arsn.ds, plot.type = "DSg", main = "DS vs g: Arsenic")

MacroInference

Step 4. We now execute the macroinference to find a global estimate to summarize the \(k = 28\) studies.

arsn.macro <- DS.macro.inf(arsn.ds, num.modes = 2, iters = 25, method = "mode")
arsn.macro
##      1SD Lower Limit   Mode 1SD Upper Limit
## [1,]         10.1102 10.776         11.4418
## [2,]         13.0750 13.470         13.8649

The macroinference finds two significant modes. The prior therefore exhibits structured heterogeneity, and both modes are needed to describe the distribution and its two groups. We plot the results, including a one-standard-error interval around each mode.

plot(arsn.macro, main = "MacroInference: Arsenic Data")

III. Illustration Using Child Illness Data (Poisson Family)

The next example conducts microinference on the child illness data. The data come from a study in which researchers followed \(k=602\) pre-school children in north-east Thailand, recording the number of times (\(y\)) a child became sick during every 2-week period for over three years. In particular, we want to compare the posterior distributions for children who became sick 1, 3, 5, and 10 times during a two-week period.

Step 1. We begin by finding the starting parameter values for \(g \sim \text{Gamma}(\alpha, \beta)\) by MLE:

data(ChildIll)
child.start <- gMLE.pg(ChildIll)
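
As a rough illustration of what a Poisson-gamma MLE involves: when \(\theta \sim \text{Gamma}(\alpha, \beta)\) with \(\beta\) a scale parameter, the marginal distribution of a Poisson count is negative binomial with size \(\alpha\) and mean \(\alpha\beta\). The base-R sketch below maximizes that marginal likelihood on synthetic counts (not the child illness data). The scale parameterization and the names `pg_negloglik`, `fit`, and `pg_hat` are assumptions made for the example, not statements about `gMLE.pg`.

```r
# Synthetic data only; these are NOT the child illness counts.
set.seed(3)
lambda <- rgamma(600, shape = 1, scale = 4)
y <- rpois(600, lambda)

# Negative marginal log-likelihood: negative binomial with
# size = alpha and mean = alpha * beta (beta treated as a scale).
pg_negloglik <- function(log_par, y) {
  a <- exp(log_par[1])
  b <- exp(log_par[2])
  -sum(dnbinom(y, size = a, mu = a * b, log = TRUE))
}

# Optimize on the log scale so that alpha, beta > 0.
fit <- optim(c(0, 0), pg_negloglik, y = y)
pg_hat <- exp(fit$par)
names(pg_hat) <- c("alpha", "beta")
pg_hat
```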

We use our starting parameter values to run the main DS.prior function for the Poisson family:

child.ds <- DS.prior(ChildIll, max.m = 8, child.start, family = "Poisson")

Step 2. We display the U-function to quantify and characterize the uncertainty of the selected \(g\):

plot(child.ds, plot.type = "Ufunc")

Step 3. We now extract the parameters for the corrected prior \(\hat{\pi}\):

child.ds
## $g.par
##    alpha     beta 
## 1.060878 4.193337 
## 
## $LP.coef
##        LP1        LP2        LP3        LP4        LP5        LP6 
##  0.0000000  0.0000000 -0.1259159  0.0000000  0.0000000 -0.2797667

The DS prior \(\hat{\pi}\) given \(g\) is: \[\hat{\pi}(\theta) = g(\theta; \alpha,\beta)\Big[1 - 0.13T_3(\theta;G) - 0.28T_6(\theta;G) \Big].\] We can plot \(\hat{\pi}\), along with \(g\):

plot(child.ds, plot.type = "DSg", main = "DS vs. g: Child Illness Data")

MicroInference

Step 4. The plot shows some very interesting behavior in \(\hat{\pi}\). We want to explore the posterior distributions for \(y = 1,3,5,10\). For those results, we use the microinference functions.

child.micro.1 <- DS.micro.inf(child.ds, y.0 = 1)
child.micro.3 <- DS.micro.inf(child.ds, y.0 = 3)
child.micro.5 <- DS.micro.inf(child.ds, y.0 = 5)
child.micro.10 <- DS.micro.inf(child.ds, y.0 = 10)
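
A useful reference point for these four posteriors is the closed-form answer under the uncorrected gamma prior. Assuming the printed \(\beta\) is a scale parameter (an assumption, flagged in the code as well), a \(\text{Gamma}(\alpha, \beta)\) prior with a single Poisson count \(y\) yields a gamma posterior with shape \(\alpha + y\) and rate \(1/\beta + 1\). The sketch below tabulates that parametric baseline; it does not reproduce the DS posteriors, and the variable names are hypothetical.

```r
a0 <- 1.060878; b0 <- 4.193337   # $g.par above; beta treated as scale (assumption)
y0 <- c(1, 3, 5, 10)

# Conjugate Poisson-gamma update for one observed count per child.
post_shape <- a0 + y0
post_rate <- 1 / b0 + 1
post_mean <- post_shape / post_rate

data.frame(y = y0, shape = post_shape, rate = post_rate, mean = post_mean)
```

Under this baseline the posterior mean grows linearly in \(y\); departures from that pattern in the DS posteriors reflect the correction applied to \(g\).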

By plotting the posterior distributions we see how the distributions change based on the number of times a child is ill. The plots for each of the four microinferences are shown below.

plot(child.micro.1, xlim = c(0,10), main = "y = 1")
plot(child.micro.3, xlim = c(0,10), main = "y = 3")
plot(child.micro.5, xlim = c(0,10), main = "y = 5")
plot(child.micro.10, xlim = c(0,20), main = "y = 10")