Theory of Bayesian FROC with R scripts

Issei Tsunoda (Please employ me!! :’-D) Send me an email: tsunoda.issei1111 at gmail.com

2019-08-02

Nowadays, many people find it difficult to get a job, and I am one such unemployed person. Such people have enough time to spend reading my poor writing, so this vignette is made for them. Please, please me! Please employ me (heart shout)!

“X-ray of my teeth!! Let’s study together with me!! :-D”

What this Package is

Execute the following R script:

Graphical User Interface: GUI

library(BayesianFROC)
BayesianFROC::fit_GUI() #   Enjoy fitting!

Or

library(BayesianFROC)
fit_GUI_dashboard() #   Enjoy fitting!

Or

library(BayesianFROC)
fit_GUI_simple() #   Enjoy fitting!

These provide a graphical user interface that lets the reader see what this package is without reading this poorly written vignette. Do not read this bothersome (:’-D) vignette; just execute the above!

Introduction

In this vignette, we provide Bayesian models for FROC analysis. Our work was inspired by the classical 1989 paper of Dev Chakraborty, who treated the problem with maximum likelihood methods. In classical FROC analysis, modality comparison is discussed in terms of ANOVA. In our new methods, we use a hyperparameter representing AUC. The author hopes to take up some of these applications in radiology.

Contents of Vignettes

\({}^{\dagger}\) Modality is a treatment or imaging method such as MRI, CT, PET, etc.

\({}^{\dagger \dagger}\) MRMC is a traditional abbreviation for multiple reader and multiple case, where case means modality in this context. Note that case means image (radiograph) in ordinary contexts.

\(\color{green}{\textit{Aim of FROC }}\)

\[\color{green}{\textit{which imaging method is best to detect signal such as lesion or nodule? }}\]

For example, if the AUC of MRI is higher than that of CT, then we may say that MRI is better than CT at detecting lesions. This is the modality comparison issue. So we want to obtain, e.g., one of the following orderings:

\[\color{red}{\textit{MRI > PET > CT >.. }}\] or

\[\color{red}{\textit{PET > CT > MRI >.. }}\]

or

\[\color{red}{\textit{CT > MRI > PET >.. }}\] or ….

Another approach is the following null hypothesis \(H_0\):

\[ H_0: \color{red}{\textit{All modalities have the same ability to find nodules.}} \]

Comparison of modality

Radiographs are associated with the notion of modality, and there are two cases.

Anyway, the aim of FROC analysis is to compare modalities.

\(\color{green}{\textit{What is FROC Data set?}}\)

What is FROC task from which data arise

In the following sections, we explain what the following table means and how it arises.

| Confidence Level | No. of Hits | No. of False alarms |
|---|---|---|
| 5 = definitely present | \(H_{5}\) | \(F_{5}\) |
| 4 = probably present | \(H_{4}\) | \(F_{4}\) |
| 3 = questionable | \(H_{3}\) | \(F_{3}\) |
| 2 = subtle | \(H_{2}\) | \(F_{2}\) |
| 1 = very subtle | \(H_{1}\) | \(F_{1}\) |

where \(H_1,H_2,...,H_c,...,H_5\) and \(F_1,F_2,...,F_c,...,F_5\) are non-negative integers, representing \(\text{H}\)its and \(\text{F}\)alse alarms.

In the next section, we give the definitions of \(H_c\) and \(F_c\).

After the next section, please keep the notations \(H_c, F_c\) in mind.

\(\color{green}{\textit{There are three actors}}\) for FROC trial

\(\color{red}{\text{Images, Reader, Gold-standard}}\)
  • \(\color{red}{\text{Images}}\), which contain shadows indicating signal or noise. Images are also called cases in other books. Signal means a nodule, lesion, disease, etc. Non-diseased and diseased images are prepared; the former contain no signals, and the latter contain at least one nodule (signal).
  • \(\color{red}{\text{Readers}}\), who try to detect lesions from shadows in radiographs. Readers are also called observers.
    • The reader identifies his suspicious locations in each image, with his confidence level.
      • An image is given to the reader. Then the researcher asks: where do lesions exist?
      • If the reader thinks “Oh! Definitely a lesion!”, then he marks the location with the number 5. And if he thinks “What the hell, is this another lesion? Subtle, but… a lesion?”, then he marks it with a low confidence number, e.g., 2 or 1.
      • When the reader has marked all his suspicious locations, the researcher gives him a new image.
      • Go back to the localization and rating step.
    • If the reader thinks there is one more lesion, then he can answer several suspicious locations for each image; the answer is not dichotomous, which differs from ROC analysis.
  • \(\color{red}{\text{A Gold Standard of Diagnosis}}\), identifying true lesion locations for all images. One can assign each reader’s suspicious location to TP (true positive) or FP (false positive), according to whether the reader’s location is correct within some tolerance.

In this package, we assume that each image has true lesion locations, evaluated based on the gold standard, and that the reader does not know these locations.

If the user’s dataset is not in the above format but in the Jafroc xlsx format, then use BayesianFROC::convertFromJafroc()

Radiographs and FROC task

Radiologists try to detect lesions from radiographs. In a FROC task, there are radiographs, readers, and a researcher who knows the truth (gold standard).

  1. The reader marks his suspicious locations with his confidence level.\({}^{\dagger}\)

    • In each image, mark the locations which he thinks are lesions.
    • Also, write a number indicating his confidence level as follows:
      • 5 = Definitely lesion
      • 4 = Probably lesion
      • 3 = Hmm … Probably lesion
      • 2 = subtle
      • 1 = very subtle
  2. By the gold standard of diagnosis, the researcher evaluates the reader’s answers: if the reader correctly marked a suspicious location near some true lesion location, then it is called a hit (true positive). If not, then the researcher counts such a mis-marked location as a false alarm (false positive).

  3. By doing this over all radiographs, we obtain FROC data.

\({}^{\dagger}\) If the reader thinks one more lesion exists in a single image, then he can mark one more location in that image; this differs from ROC analysis, which allows each reader only a dichotomous answer on whether a lesion exists.
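As a toy illustration of step 2, the following base-R sketch classifies each mark as a hit or a false alarm by its distance to the nearest true lesion. All coordinates, confidence levels, and the tolerance here are hypothetical values, not data from any real trial:

```r
# Hypothetical gold standard: true lesion locations (x, y), one row per lesion.
lesions <- matrix(c(1, 2,
                    5, 5), ncol = 2, byrow = TRUE)

# Hypothetical reader marks (x, y) with their confidence levels.
marks <- matrix(c(1.1, 2.2,
                  8.0, 1.0), ncol = 2, byrow = TRUE)
confidence <- c(5, 3)

tol <- 0.5  # tolerance: a mark within this distance of a lesion counts as a hit

# For each mark, compute distances to all lesions and test the tolerance.
is_hit <- apply(marks, 1, function(m) {
  d <- sqrt(colSums((t(lesions) - m)^2))
  any(d <= tol)
})

is_hit               # TRUE FALSE: first mark is a hit, second is a false alarm
confidence[!is_hit]  # confidence levels of the false alarms
```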

\(\color{blue}{\textit{Flow of FROC task}}\)

\(\color{blue}{\textit{0. Gold standard}}\)
  • Suppose that the \(\color{red}{\text{two red circles}}\) indicate true lesions, i.e., signals, which should be detected by the reader. (I am not a doctor; they may not be lesions. This is only a fictitious assumption.)

  • Of course, such red circles do not exist when the reader tries to find lesions.

“Red circles representing true lesions”

\(\color{blue}{\textit{1. Localization and rating}}\)
  • The reader does the following two things:

    • Localize his suspicious lesions.

    • Rate his confidence level using the following numbers:

      • 5 = Definitely lesion
      • 4 = Probably lesion
      • 3 = Hmm … Probably lesion
      • 2 = subtle
      • 1 = very subtle
  • In the figure, each yellow triangle means that the reader thinks it is a lesion, with his confidence level denoted by the inner number.

    “Yellow triangle indicates suspicious locations of reader”

\(\color{blue}{\textit{2. Evaluate hits }}\)
  • Each yellow triangle indicates a suspicious location of the reader, and one of them (whose confidence number is 5) is very near one of the red circles indicating true lesion locations; thus we may conclude it is a hit with confidence level 5.
" The yellow triangle with confidence level 5 is a hit! "

" The yellow triangle with confidence level 5 is a hit! "

\(\color{blue}{\textit{3. Count false alarms}}\)
  • The other marks are false alarms, because they are far from the true lesion locations denoted by the two red circles.

    “False Alarms”

\(\color{blue}{\textit{4. Summarizing for } \textbf{ only one } \textit{image.}}\)

Summarizing the above,

| Confidence Level | No. of Hits | No. of False alarms |
|---|---|---|
| 5 = definitely present | \(H_{5}\color{red}{=1}\) | \(F_{5}\color{red}{=0}\) |
| 4 = probably present | \(H_{4}\color{red}{=0}\) | \(F_{4}\color{red}{=0}\) |
| 3 = equivocal | \(H_{3}\color{red}{=0}\) | \(F_{3}\color{red}{=1}\) |
| 2 = subtle | \(H_{2}\color{red}{=0}\) | \(F_{2}\color{red}{=0}\) |
| 1 = very subtle | \(H_{1}\color{red}{=0}\) | \(F_{1}\color{red}{=2}\) |

Note that this is done for only one image; in an actual trial there are many images, so the numbers of hits and false alarms are larger.

\(\color{blue}{\textit{5. Evaluate for } \textbf{ all } \textit{ images.}}\)

Summarizing for the 1-st image, as in the above example,

| Confidence Level | No. of Hits | No. of False alarms |
|---|---|---|
| 5 = definitely present | \(H_{5}\color{red}{=1}\) | \(F_{5}\color{red}{=0}\) |
| 4 = probably present | \(H_{4}\color{red}{=0}\) | \(F_{4}\color{red}{=0}\) |
| 3 = equivocal | \(H_{3}\color{red}{=0}\) | \(F_{3}\color{red}{=1}\) |
| 2 = subtle | \(H_{2}\color{red}{=0}\) | \(F_{2}\color{red}{=0}\) |
| 1 = very subtle | \(H_{1}\color{red}{=0}\) | \(F_{1}\color{red}{=2}\) |

Summarizing for the 2-nd image, (which is not shown in the above)

| Confidence Level | No. of Hits | No. of False alarms |
|---|---|---|
| 5 = definitely present | \(H_{5}\color{red}{=0}\) | \(F_{5}\color{red}{=0}\) |
| 4 = probably present | \(H_{4}\color{red}{=1}\) | \(F_{4}\color{red}{=1}\) |
| 3 = equivocal | \(H_{3}\color{red}{=0}\) | \(F_{3}\color{red}{=0}\) |
| 2 = subtle | \(H_{2}\color{red}{=1}\) | \(F_{2}\color{red}{=1}\) |
| 1 = very subtle | \(H_{1}\color{red}{=0}\) | \(F_{1}\color{red}{=1}\) |

Summarizing for the 3-rd image, (which is not shown in the above)

| Confidence Level | No. of Hits | No. of False alarms |
|---|---|---|
| 5 = definitely present | \(H_{5}\color{red}{=0}\) | \(F_{5}\color{red}{=0}\) |
| 4 = probably present | \(H_{4}\color{red}{=0}\) | \(F_{4}\color{red}{=3}\) |
| 3 = equivocal | \(H_{3}\color{red}{=1}\) | \(F_{3}\color{red}{=1}\) |
| 2 = subtle | \(H_{2}\color{red}{=0}\) | \(F_{2}\color{red}{=3}\) |
| 1 = very subtle | \(H_{1}\color{red}{=0}\) | \(F_{1}\color{red}{=3}\) |

Summarizing for the 4-th image,

Summarizing for the 5-th image,

Summarizing for the 6-th image,

…………………,

Summarizing for the \(N_I\)-th image (the last image), we finish the FROC task.

Summing over \(\color{red}{\textit{ all }}\) tables, i.e.,

summing (pooling) over all images, we get, e.g., the following FROC data:

| Confidence Level | No. of Hits | No. of False alarms |
|---|---|---|
| 5 = definitely present | \(H_{5}\color{red}{=1+0+0 \cdots =112}\) | \(F_{5}\color{red}{=0+0+0 \cdots =21}\) |
| 4 = probably present | \(H_{4}\color{red}{=0+1+0 \cdots =56}\) | \(F_{4}\color{red}{=0+1+3 \cdots =33}\) |
| 3 = equivocal | \(H_{3}\color{red}{=0+0+1 \cdots =44}\) | \(F_{3}\color{red}{=1+0+1 \cdots =65}\) |
| 2 = subtle | \(H_{2}\color{red}{=0+1+0 \cdots = 32}\) | \(F_{2}\color{red}{=0+1+3 \cdots =78}\) |
| 1 = very subtle | \(H_{1}\color{red}{= 0+0+0 \cdots =11}\) | \(F_{1}\color{red}{=2+1+3 \cdots =122}\) |

Consequently, we …

\(\color{blue}{\textit{6. Obtain FROC data}}\)
| Confidence Level | No. of Hits | No. of False alarms |
|---|---|---|
| 5 = definitely present | \(H_{5}\color{red}{=112}\) | \(F_{5}\color{red}{=21}\) |
| 4 = probably present | \(H_{4}\color{red}{=56}\) | \(F_{4}\color{red}{=33}\) |
| 3 = equivocal | \(H_{3}\color{red}{=44}\) | \(F_{3}\color{red}{=65}\) |
| 2 = subtle | \(H_{2}\color{red}{=32}\) | \(F_{2}\color{red}{=78}\) |
| 1 = very subtle | \(H_{1}\color{red}{=11}\) | \(F_{1}\color{red}{=122}\) |

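The pooling step can be sketched in base R. The per-image vectors below copy the three example tables above (confidence levels ordered 5, 4, 3, 2, 1); summing them element-wise gives the first terms of the pooled counts shown in the final table:

```r
# Hits and false alarms per image, confidence levels ordered c = 5, 4, 3, 2, 1.
hits_img1 <- c(1, 0, 0, 0, 0); fa_img1 <- c(0, 0, 1, 0, 2)  # image 1
hits_img2 <- c(0, 1, 0, 1, 0); fa_img2 <- c(0, 1, 0, 1, 1)  # image 2
hits_img3 <- c(0, 0, 1, 0, 0); fa_img3 <- c(0, 3, 1, 3, 3)  # image 3

# Pool (sum) over images; with all N_I images this yields the FROC data table.
hits_pooled <- hits_img1 + hits_img2 + hits_img3
fa_pooled   <- fa_img1 + fa_img2 + fa_img3

hits_pooled  # 1 1 1 1 0
fa_pooled    # 0 4 2 4 6
```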

Note that the monotonicity condition is not strictly required. However, generally speaking, a high confidence level would generate a higher number of hits than a low confidence level.

\(\color{green}{\textit{Example Data from R Console }}\)

In the above we used 5 confidence levels, but in the following example we use 3.

To create data, use one of the following functions:
  • dataset_creator_new_version()
  • create_dataset()
  • convertFromJafroc()

The third function convertFromJafroc() converts the Jafroc formulation to the author’s formulation.

\(\color{green}{\textit{Modeling}}\)

“My face! My health is bad; my prurigo nodularis bothers me.”

There are two cases.

In order to build up the theory of FROC analysis in a simple manner, we first explain the former case.

It may be difficult for some readers, but do not worry if you cannot understand it, since that is entirely my responsibility.

Parameter

In conventional statistical notation, the parameter of a model is denoted by \(\theta \in \mathbb{R}^n\) and a likelihood by \(f(y|\theta)\) for future data \(y\).

In FROC model, we may say that \[\theta = (\mu,\sigma, z_1 ;dz_1,dz_2,...,dz_{C-1}) \in \mathbb{R}^{C+2}.\]

In the following we show what these notations mean.

FROC data

If the number of confidence levels is five (\(C=5\)), then the data set for FROC analysis is as follows:

| Confidence Level | No. of Hits | No. of False alarms |
|---|---|---|
| 5 = definitely present | \(H_{5}\) | \(F_{5}\) |
| 4 = probably present | \(H_{4}\) | \(F_{4}\) |
| 3 = equivocal | \(H_{3}\) | \(F_{3}\) |
| 2 = subtle | \(H_{2}\) | \(F_{2}\) |
| 1 = very subtle | \(H_{1}\) | \(F_{1}\) |

In the above table, \(H_1,H_2,...,H_5\) and \(F_1,F_2,...,F_5\) are non-negative integers. The confidence level is sometimes called a rating in other books.

Note that, in conventional likelihood notation \(f(y|\theta)\), the data \(y\) is given as follows;

\[y=(H_1,H_2,H_3,...;F_1,F_2,...;N_L,N_I) \] where \(H_c \in \mathbb{N}\) is the number of hits with the \(c\)-th confidence level, \(F_c \in \mathbb{N}\) is the number of false alarms with the \(c\)-th confidence level, \(N_L\) is the number of lesions (signals), and \(N_I\) is the number of images (trials).

Work flow of Modeling with conventional notations

Let \((\theta,\theta')=(\theta_1,\theta_2,\theta_3,\dots;\ \theta'_1,\theta'_2,\dots)\) denote model parameters.

Then it is natural to assume that

\[ H_c \sim \text{Binomial}(p_c(\theta),N_L),\\ F_c \sim \text{Poisson}(q_c(\theta)).\\ \] To determine the hit rate \(p_c(\theta)\) and the false alarm rate \(q_c(\theta)\), we use two density functions, denoted \(P(\cdot|\theta')\) and \(Q(\cdot|\theta')\), such that

\[ p_c(\theta) = \int_{\theta_c}^{\theta_{c+1}}P(x|\theta')dx,\\ q_c(\theta) = \int_{\theta_c}^{\theta_{c+1}}Q(x|\theta')dx. \]

Note that the upper and lower limits of the integrals are common to both the hit rate and the false alarm rate. In the next section we shall determine these functions \(P_{\theta}\) and \(Q_{\theta}\). We call \(P_{\theta}\) the signal distribution and \(Q_{\theta}\) the noise distribution.

Note that if a random variable is distributed by \(P_{\theta}\) or \(Q_{\theta}\), then traditionally it is called a latent variable in this context.

In the next section, we will see that \(P_{\theta}\) is a Gaussian and \(Q_{\theta} = N_I \times \frac{d \log \Phi(z)}{dz}\), where \(\Phi\) is the cumulative distribution function of the canonical Gaussian and \(N_I\) is the number of images (radiographs).

Of course these are not determined a priori, and the process of deducing these functions is not precisely logical. Thus I do not like the following explanation in my mathematical manner. But …

Anyway, I do not use the conventional notation \(\theta\), but use \((\mu,\sigma, z_1, z_2,...,z_{C})\) instead, in the following manner:

\[ p_c(\theta) = \int_{z_c}^{z_{c+1}}P_{}(x|\mu,\sigma)dx,\\ q_c(\theta) = \int_{z_c}^{z_{c+1}}Q_{}(x)dx. \]

Or more concretely,

\[ p_c(\theta) = \int_{z_c}^{z_{c+1}}\text{Gaussian}_{}(x|\mu,\sigma)dx,\\ q_c(\theta) = \int_{z_c}^{z_{c+1}}N_I \times \frac{d \log \Phi(z)}{dz}dz. \]
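These integrals can be evaluated directly in base R. In the sketch below, \(\mu\), \(\sigma\), the thresholds, and \(N_I\) are hypothetical values: the hit rates come from pnorm, the false alarm rates from \(\lambda_c=-\log\Phi(z_c)\), and the first false alarm rate is cross-checked against numerical integration of \(N_I\,d\log\Phi(z)/dz = N_I\,\phi(z)/\Phi(z)\):

```r
mu <- 1; sigma <- 1; NI <- 57   # hypothetical parameter values
z  <- c(-0.5, 0.2, 1.0)         # hypothetical thresholds z_1 < z_2 < z_3 (C = 3)
C  <- length(z)

# Hit rates: p_c = Phi((z_{c+1}-mu)/sigma) - Phi((z_c-mu)/sigma), with z_{C+1} = Inf.
p <- pnorm((c(z[-1], Inf) - mu) / sigma) - pnorm((z - mu) / sigma)

# False alarm rates: lambda_c = -log Phi(z_c), with lambda_{C+1} = 0.
lambda <- c(-log(pnorm(z)), 0)
q <- NI * (lambda[1:C] - lambda[2:(C + 1)])

# Cross-check q_1 by numerically integrating N_I * dnorm(x)/pnorm(x) on (z_1, z_2).
q1_numeric <- integrate(function(x) NI * dnorm(x) / pnorm(x),
                        z[1], z[2], rel.tol = 1e-10)$value
abs(q[1] - q1_numeric) < 1e-6  # TRUE: the closed form matches the integral
```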

In the following, we shall see how these hit rates and false alarm rates are determined. Of course, the Gaussian and the differential logarithmic Gaussian can be changed, and such changes will generate various FROC models. The author did not change these two distributions, since the author’s health condition is very, very bad. Really, in a few years I will die. This R script will be my last work.

The FROC model does not fit some families of data well; in such cases we need to change the above two distributions.

Please find which datasets are not fitted well via the GUIs fit_GUI_simple() or fit_GUI().

Modeling: Single reader and single modality

Statistical modeling amounts to calculating the probability of a datum \((H_c,F_c) \in \mathbb{N} \times\mathbb{N},\ c=1,2,...,5\) arising, for a number of lesions \(N_L \in \mathbb{N}\) and a number of images \(N_I\in \mathbb{N}\).

FROC data are very simple: the data are hits \(H_c\) and false alarms \(F_c\).

First we focus on the false positive fraction per image. In traditional statistics, it is natural to assume that false alarms are Poisson distributed, that is,

\[ \frac{F_c+F_{c+1}+...+F_C}{N_I} \sim \text{Poisson}(\lambda_c),\] where \(\lambda_c\) is a non-negative real number representing a false alarm rate. When someone sees the left-hand side, he might wonder, what the hell is that quantity? But I am afraid it is a famous quantity in ROC and FROC analysis, called the False Positive Fraction (FPF) per image. So, please love it.

However, the false positive fraction, i.e., the left-hand side, is not a natural number; thus, instead of the above, we assume the following:

\[F_c+F_{c+1}+...+F_C \sim \text{Poisson}(\lambda_cN_I)\]

or equivalently (see, Appendix),

\[F_{c } \sim \text{Poisson} ( (\lambda _{c} -\lambda _{c+1} )\times N_{I} ),\]

where \(\lambda_c\) is a non negative number.

Especially,

\[F_{C } \sim \text{Poisson} ( (\lambda _{C} -0 )\times N_{I} ),\]

that is, \(\lambda_{C+1}=0\).

We may say that \(\lambda_c\) is the false alarm rate per image generating the \(c\)-th FPF, or False Positives per Image (FPI).
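The equivalence between the cumulative and per-level Poisson rates is just a telescoping sum: with \(\lambda_{C+1}=0\), the rates \(\lambda_c-\lambda_{c+1}\), summed from level \(c\) up to \(C\), recover \(\lambda_c\). A quick base-R check (the thresholds are hypothetical values):

```r
z <- c(-0.9, 0.1, 1.2)              # hypothetical thresholds, C = 3
lambda <- c(-log(pnorm(z)), 0)      # lambda_1..lambda_3, with lambda_4 = 0
rates  <- lambda[1:3] - lambda[2:4] # per-level rates (before the N_I scaling)

# Sum of the rates from level c to C recovers lambda_c, for every c.
cum_from_c <- rev(cumsum(rev(rates)))
all.equal(cum_from_c, lambda[1:3])  # TRUE
```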

Hits

Next, we shall treat the hits. Because each hit is generated by some lesion, we can count the number of hits for each lesion. We denote the number of hits on the \(l^{\text{th}}\) lesion with the \(c^{\text{th}}\) confidence level by \(H_{c,l}\).

It is natural to assume the following Bernoulli assumption: \[ H_{c,l}\sim \text{Bernoulli} ( p_{c} ), \] where \(p_c\) is a hit rate.

Summing over the subscript \(l\) yields \[ H_{c } \sim \text{Binomial} ( p_{c}, N_{L} ). \] This is the statistical model for hits.

Determining the hit rate \(p_c\) and the false alarm rate \(\lambda_c\)

To determine the rates \(p_c, \lambda_c\), we use two latent random variables, which will be denoted by \(Y, W\). The word latent indicates that they cannot be observed or measured in the FROC trial.

We may think of the latent Gaussian random variable as something like a biomarker. The biomarker plays two roles: one is the same as the confidence level, and the other is as the latent variable.

The author first considered using two distributions, one associated with each lesion and the other with each image. But now I think that is wrong, or redundant.

Anyway, assume the following signal distribution for the latent random variable \(Y\):

\[ Y \sim \text{Normal}(\mu,\sigma ^2) \] with thresholds \(z_1 < z_2 < ... < z_{c} < ... < z_C\), where \(z_c \in \mathbb{R}\).

Let us denote the density function of Gaussian of mean \(m\) and variance \(v\) by \(N_{m,v}()\), that is:

\[ N_{m,v}(x)=\frac{1}{\sqrt{2\pi v}}\exp\frac{-(x-m)^2}{2v}. \]

Then we let this latent Gaussian variable determine the hit rate \(p_c\) in the following manner.

\[p_{c}=\text{Prob}( z_c < Y <z_{c+1} )\\ = \int ^{z_{c+1 }}_{z_{c}} N_{m,v}(z)dz\\ =\Phi (\frac{z_{c +1}-\mu}{\sigma})-\Phi (\frac{z_{c}-\mu}{\sigma}). \]
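A quick numerical check of this identity in base R (the values of \(\mu\), \(\sigma\), and the thresholds are hypothetical):

```r
mu <- 1; sigma <- 1    # hypothetical signal distribution parameters
zc <- 0.2; zc1 <- 1.0  # hypothetical thresholds z_c < z_{c+1}

# Closed form: Phi((z_{c+1}-mu)/sigma) - Phi((z_c-mu)/sigma)
p_analytic <- pnorm((zc1 - mu) / sigma) - pnorm((zc - mu) / sigma)

# Direct integration of the Gaussian density N_{mu, sigma^2} over (z_c, z_{c+1})
p_numeric <- integrate(function(x) dnorm(x, mean = mu, sd = sigma),
                       zc, zc1, rel.tol = 1e-10)$value

abs(p_analytic - p_numeric) < 1e-8  # TRUE
```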

In particular, we use the following definition for the highest confidence level,

\[p_{C}= 1-\Phi (\frac{z_{C}-\mu}{\sigma}), \]

instead of

\[p_{C}= \Phi (\frac{z_{C +1}-\mu}{\sigma}) -\Phi (\frac{z_{C}-\mu}{\sigma}), \] with \(z_{C+1}\) theoretically infinite but numerically a very large number. Note that our model does not use \(z_{C+1}\), which causes divergent transition issues; see the Appendix for more detail. I know, the reader thinks: what the hell, the author is very stupid and this explanation very redundant. But I am afraid the author really is a stupid dirty old man with no money and no hair, and this special treatment of the highest confidence level is very important to avoid the divergent transition issue, which biases MCMC sampling, as pointed out by betanalpha.

Why do we define the hit rate as above? Intuitively, this definition says that each lesion (signal) is assigned some latent variable \(Y\), and if it falls into the interval \(z_c < Y <z_{c+1}\), then the reader marks it as a suspicious positive with confidence level \(c\). Note that the latent variable merely decides the rate, not the number of hits. Thus, if we consider i.i.d. latent Gaussians associated with the lesions, then a value falling between \(z_c\) and \(z_{c+1}\) does not deterministically generate a hit; rather, it would be a hit with probability \(p_c\).

Next we shall introduce the differential logarithmic Gaussian distribution to determine the false alarm rate \(\lambda_c\).

Recall that False alarm is distributed as follows;

\[F_{c } \sim \text{Poisson} ( (\lambda _{c} -\lambda _{c+1} )\times N_{I} ),\]

and thus setting \(q_c:=(\lambda _{c} -\lambda _{c+1} )\times N_{I}\), we obtain

\[F_{c } \sim \text{Poisson} ( q_c),\]

OK, so what do we have to do next?

We introduce a new latent random variable \(W\) such that

\[ q_c =\text{Prob}( z_c < W <z_{c+1} ). \]

We assume that \(W\) is defined so that the following relation holds:

\[ q_c = N_I \times \int ^{z_{c+1 }}_{z_{c}}d \log \Phi ( z ) \] and especially for the highest confidence level \(C\), we define \[ q_C = N_I \times \int ^{\infty}_{z_{C}}d \log \Phi ( z ). \] Note that in the programming phase, we never use \(z_{C+1}\), which is ideally infinity but numerically a very large number causing the divergent transition issue.

Thus, we may say that the false alarm rate is determined by the density function \(N_I \times\frac{ d \log \Phi ( z )}{dz}\) (not a probability density), which we shall call the differential logarithmic Gaussian distribution with scale \(N_I\), or briefly the logarithmic Gaussian. Consequently, we obtain the Bayesian model;

\[H_{c } \sim \text{Binomial} ( p_{c}, N_{L} ),\] \[F_{c } \sim \text{Poisson} (q_c),\] \[q_c = N_I \times \int ^{z_{c+1 }}_{z_{c}}d \log \Phi ( z ),\] \[p_{c} = \int ^{z_{c+1 }}_{z_{c}} N_{m,v}(z)dz\] In this model, \(z_{c},c=1,\cdots,C\), \(\mu\), and \(\sigma\) are the parameters to be estimated.

Or equivalently,

\[H_{c } \sim \text{Binomial} ( p_{c}, N_{L} ),\] \[F_{c } \sim \text{Poisson} ( (\lambda _{c} -\lambda _{c+1} )\times N_{I} ),\] \[\lambda _{c} = - \log \Phi ( z_{c } ),\] \[p_{c} =\Phi (\frac{z_{c +1}-\mu}{\sigma})-\Phi (\frac{z_{c}-\mu}{\sigma}). \] In this model, \(z_{c},c=1,\cdots,C\), \(\mu\), and \(\sigma\) are the parameters to be estimated.

Thus we can say that the classical bi-normal assumption is redundant in the FROC trial; we need only two distributions, the signal distribution and the logarithmic Gaussian. Classical FROC theory uses the bi-normal assumption, with a canonical Gaussian for both the noise and signal distributions. But that noise distribution is wrong, or redundant; as shown, we have to use the logarithmic Gaussian for the noise distribution.

Note that \(N_I \times\log \Phi ( z )\) is independent of the model parameters. Thus it is perfectly fixed, and we can observe how the signal distribution varies according to the data. So, while in classical signal detection theory two Gaussians are plotted, in FROC theory we should plot two different distributions: the Gaussian and the logarithmic Gaussian.

Of course, the logarithmic Gaussian is not strictly required; we may change the distribution if necessary. The author found that for some datasets the fit is very bad; for such data, we should change the Gaussian or the logarithmic Gaussian to a more suitable one.

Why we use the differential logarithmic Gaussian?

We shall show how the logarithmic Gaussian appears. This is a heuristic approach due to Dev Chakraborty in his 1989 paper. We shall follow his logic, slowly and step by step.

Let \(X\) be a real valued random variable distributed by the canonical Gaussian, that is,

\[ X \sim \text{Normal}(0,1) \\ \]

and we call this the noise distribution.

Intuitively, we consider that if \(X\) falls between \(z_c<X<z_{c+1}\), then it generates one false alarm with confidence level \(c\).

Using the noise distribution, it is natural to think that the probability that the number of false positives with confidence level greater than or equal to \(c\) is nonzero is

\[\text{Prob}( z_c < X ) =1- \Phi (z_c). \]

On the other hand, by the Poisson assumption, it can be calculated that (see the Appendix)

\[\mathbb{P} \bigg\{ \frac{F_c+F_{c+1}+...+F_C}{N_I} \neq 0 \bigg\} = 1 - \mathbb{P} \bigg\{ \frac{F_c+F_{c+1}+...+F_C}{N_I} =0 \bigg\} \\ =1-e^{-\lambda_c}.\]
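The zero-count Poisson probability used here is easy to verify in base R, both exactly and by simulation (the value of \(\lambda\) is hypothetical):

```r
lambda <- 0.8  # hypothetical false alarm rate

# Exact: P(Poisson(lambda) = 0) = exp(-lambda)
all.equal(dpois(0, lambda), exp(-lambda))  # TRUE

# Simulation: the fraction of nonzero draws approximates 1 - exp(-lambda)
set.seed(1)
frac_nonzero <- mean(rpois(1e5, lambda) != 0)
frac_nonzero  # close to 1 - exp(-0.8), i.e. about 0.55
```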

Now, we assume that

\[\text{Prob}( z_c < X ) =\mathbb{P} \bigg\{ \frac{F_c+F_{c+1}+...+F_C}{N_I} \neq 0 \bigg\} . \]

Then, we obtain \[ \lambda_c = - \log \Phi (z_c). \]

Consequently, we can rewrite the Bayesian model;

\[H_{c } \sim \text{Binomial} ( p_{c}, N_{L} ),\] \[F_{c } \sim \text{Poisson} ( (\lambda _{c} -\lambda _{c+1} )\times N_{I} ),\] \[\lambda _{c} = - \log \Phi ( z_{c } ),\] \[p_{c} =\Phi (\frac{z_{c +1}-\mu}{\sigma})-\Phi (\frac{z_{c}-\mu}{\sigma}). \] In this model, \(z_{c},c=1,\cdots,C\), \(\mu\), and \(\sigma\) are the parameters to be estimated.
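The model above can also be read as a data generator. The following base-R sketch draws one synthetic FROC data set from it; all parameter values (\(\mu\), \(\sigma\), \(N_L\), \(N_I\), and the thresholds) are hypothetical:

```r
set.seed(123)
mu <- 1; sigma <- 1          # hypothetical signal distribution
NL <- 259; NI <- 57; C <- 3  # hypothetical numbers of lesions and images
z  <- c(-0.5, 0.2, 1.0)      # hypothetical thresholds z_1 < z_2 < z_3

lambda <- c(-log(pnorm(z)), 0)  # lambda_c = -log Phi(z_c), lambda_{C+1} = 0
p <- pnorm((c(z[-1], Inf) - mu) / sigma) - pnorm((z - mu) / sigma)

h <- rbinom(C, size = NL, prob = p)                    # hits per level
f <- rpois(C, (lambda[1:C] - lambda[2:(C + 1)]) * NI)  # false alarms per level

h; f  # one synthetic (H_1..H_3; F_1..F_3), ordered by ascending confidence
```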

\(\color{green}{\textit{Priors}}\)

“My teeth! When I submitted this model to journals, no reviewer could understand this prior. So, it may be hard for some, but it runs in this package, so please check my theory carefully! Never think the author is stupid. If you think so, then you are stupid. :-D”

Prior for monotonicity of thresholds

Thresholds \(z_1, z_2, ...., z_C\) should satisfy the monotonicity condition \(z_1 < z_2 < .... < z_C\).

To do so, we use the priors \[z_2 - z_1 \sim \text{Uniform}(0,\infty) \\ z_3 - z_2 \sim \text{Uniform}(0,\infty) \\ :\\ :\\ z_C - z_{C-1} \sim \text{Uniform}(0,\infty) \\ \] where \(\text{Uniform}(0,\infty)\) means the improper prior whose support is the interval \((0,\infty)\).

We introduce new parameters: \[ dz_c:=z_{c+1}-z_{c}, \] where \(c=1,2,\cdots,C-1\). In practical modeling, we no longer use the previous section’s parameters \(z_2,z_3,\cdots, z_C\), using \(dz_1,dz_2,\cdots, dz_{C-1}\) instead. To include an order constraint in the Bayesian model, we assume a non-regular uniform prior for \(dz_c\): \[ dz_c \sim \text{Uniform}(0, \infty), \] where \(\text{Uniform}(0,\infty)\) indicates a flat improper prior whose integral is not one and whose support is the interval \((0,\infty)\). These priors are equivalent to assuming monotonicity of the thresholds: \[ z_{1}\leq z_{2}\leq z_{3} \leq \dots \leq z_{C}.\]
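In code, the \((z_1, dz)\) parameterization recovers monotone thresholds by a cumulative sum, so positivity of each \(dz_c\) is all that is needed (the values below are hypothetical):

```r
z1 <- -0.5         # hypothetical first threshold z_1
dz <- c(0.7, 0.8)  # hypothetical gaps dz_1, dz_2 > 0 (so C = 3)

# Recover z_1, z_2, z_3 from (z_1, dz) by cumulative summation.
z <- z1 + c(0, cumsum(dz))
z                  # -0.5  0.2  1.0
all(diff(z) > 0)   # TRUE: monotonicity holds by construction
```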

Finally, we obtain the following Bayesian FROC model in the case of a single reader and a single modality.

\[\begin{eqnarray*} H_{c } & \sim &\text{Binomial} ( p_{c}, N_{L} ), \text{ for $c=1,2,...,C$.}\\ F_{c } & \sim &\text{Poisson}( (\lambda _{c} -\lambda _{c+1} )\times N_{I} ), \text{ for $c=1,2,...,C-1$.}\\ \lambda _{c}& =& - \log \Phi ( z_{c } ),\text{ for $c=1,2,...,C$.}\\ p_{c} &=&\Phi (\frac{z_{c +1}-\mu}{\sigma})-\Phi (\frac{z_{c}-\mu}{\sigma}), \text{ for $c=1,2,...,C-1$.}\\ p_C & =& 1-\Phi (\frac{z_{C}-\mu}{\sigma}),\\ F_{C} & \sim & \text{Poisson}( (\lambda _{C} - 0)\times N_I),\\ dz_c=z_{c+1}-z_{c} &\sim& \text{Uniform}(0,\infty), \text{ for $c=1,2,...,C-1$.}\\ \mu &\sim& \text{Uniform}(-\infty,\infty),\\ \sigma &\sim& \text{Uniform}(0,\infty).\\ \end{eqnarray*}\] Our model has parameters \(z_{1}, dz_1,dz_2,\cdots, dz_{C-1}\), \(\mu\), and \(\sigma\). The notation \(\text{Uniform}(-\infty,\infty)\) means the improper uniform distribution whose support is the whole real line.

The last three lines are priors.

It might be redundant, but since the reader’s background cannot be inferred, I write down the following redundant description. In the case of \(C=5\), the above model can be written without the subscript notation \(c\) and without abbreviation as follows. I hope it helps the reader.

\[\begin{eqnarray*} H_{1 } & \sim &\text{Binomial} ( p_{1}, N_{L} ) \\ H_{2 } & \sim &\text{Binomial} ( p_{2}, N_{L} ) \\ H_{3 } & \sim &\text{Binomial} ( p_{3}, N_{L} ) \\ H_{4 } & \sim &\text{Binomial} ( p_{4}, N_{L} ) \\ H_{5 } & \sim &\text{Binomial} ( p_{5}, N_{L} ) \\ F_{1 } & \sim &\text{Poisson}( (\lambda _{1} -\lambda _{2} )\times N_{I} ), \\ F_{2 } & \sim &\text{Poisson}( (\lambda _{2} -\lambda _{3} )\times N_{I} ), \\ F_{3 } & \sim &\text{Poisson}( (\lambda _{3} -\lambda _{4} )\times N_{I} ), \\ F_{4 } & \sim &\text{Poisson}( (\lambda _{4} -\lambda _{5} )\times N_{I} ), \\ F_{5 } & \sim &\text{Poisson}( (\lambda _{5} -0 )\times N_{I} ), \text{Be careful !!:'-D}\\ \lambda _{1}& =& - \log \Phi ( z_{1 } ),\\ \lambda _{2}& =& - \log \Phi ( z_{2 } ),\\ \lambda _{3}& =& - \log \Phi ( z_{3 } ),\\ \lambda _{4}& =& - \log \Phi ( z_{4 } ),\\ \lambda _{5}& =& - \log \Phi ( z_{5 } ),\\ p_{1} &:=&\Phi (\frac{z_{2}-\mu}{\sigma})-\Phi (\frac{z_{1}-\mu}{\sigma}), \\ p_{2} &:=&\Phi (\frac{z_{3}-\mu}{\sigma})-\Phi (\frac{z_{2}-\mu}{\sigma}), \\ p_{3} &:=&\Phi (\frac{z_{4}-\mu}{\sigma})-\Phi (\frac{z_{3}-\mu}{\sigma}), \\ p_{4} &:=&\Phi (\frac{z_{5}-\mu}{\sigma})-\Phi (\frac{z_{4}-\mu}{\sigma}), \\ p_5 &:=& 1-\Phi (\frac{z_{5}-\mu}{\sigma}),\text{Be careful !!:'-D}\\ dz_1=z_{2}-z_{1} &\sim& \text{Uniform}(0,\infty), \\ dz_2=z_{3}-z_{2} &\sim& \text{Uniform}(0,\infty), \\ dz_3=z_{4}-z_{3} &\sim& \text{Uniform}(0,\infty), \\ dz_4=z_{5}-z_{4} &\sim& \text{Uniform}(0,\infty), \\ \mu &\sim& \text{Uniform}(-\infty,\infty),\\ \sigma &\sim& \text{Uniform}(0,\infty).\\ \end{eqnarray*}\] Our model has parameters \(z_{1}, dz_1,dz_2,dz_3,dz_4\), \(\mu\), and \(\sigma\). The notation \(\text{Uniform}(-\infty,\infty)\) means the improper uniform distribution whose support is the whole real line.

R code to fit the above model to data

This is a basic example showing how to fit a model to a data set dataList for a single reader and a single modality.

Note that in the following R scripts, a list object dataList represents the following FROC data:

| Confidence Level | No. of Hits | No. of False alarms |
|---|---|---|
| 3 = definitely present | 97 | 1 |
| 2 = equivocal | 32 | 14 |
| 1 = questionable | 31 | 74 |
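A minimal sketch of such a script follows. The numbers of lesions (\(N_L = 259\)) and images (\(N_I = 57\)) are hypothetical example values not given in the table, and the list-field names follow the package's format as the author understands it; fitting itself requires the installed package, so those lines are shown commented out:

```r
# FROC data from the table above, in the package's list format:
# h, f ordered from the highest confidence level (3) down to the lowest (1).
dataList <- list(
  h  = c(97, 32, 31),  # hits
  f  = c(1, 14, 74),   # false alarms
  NL = 259,            # number of lesions (hypothetical example value)
  NI = 57,             # number of images  (hypothetical example value)
  C  = 3               # number of confidence levels
)

# With the BayesianFROC package installed (not run here):
# library(BayesianFROC)
# fit <- fit_Bayesian_FROC(dataList)
# p_value_of_the_Bayesian_sense_for_chi_square_goodness_of_fit(fit)
```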

The last R script, p_value_of_the_Bayesian_sense_for_chi_square_goodness_of_fit(fit), calculates the p value of the chi-square goodness of fit in the Bayesian manner, so we briefly recall how this p value is calculated (for more details, see Gelman’s book Bayesian Data Analysis).

In the following, we shall use the general notation. Let \(y_{\text{obs}}\) be observed data and \(f(y|\theta)\) be a model (likelihood) for a future data \(y\). We write a prior and a posterior distributions by \(\pi(\theta)\) and \(\pi(\theta|y) \propto f(y|\theta)\pi(\theta)\). The posterior predictive distribution is defined by \(p(y|y_{\text{obs}}) := \int f(y|\theta)\pi(\theta|y_{\text{obs}}) d\theta\)

In our case, the data \(y\) is a pair of hits and false alarms, that is, \(y=(H_1,H_2, \dots H_C; F_1,F_2, \dots F_C)\) and \(\theta = (z_1,dz_1,dz_2,\dots,dz_{C-1},\mu, \sigma)\). We define the \(\chi^2\) discrepancy (goodness-of-fit statistic) to validate that the model fits the data:

\[ T(y,\theta) := \sum _{c=1,\cdots,C} \biggr( \frac{\bigl\{H_c-N_L\times p_c \bigr\}^2}{N_L\times p_c}+ \frac{\bigl\{F_c-(\lambda _{c} -\lambda _{c+1} )\times N_{I}\bigr\}^2}{(\lambda _{c} -\lambda _{c+1} )\times N_{I} }\biggr). \]

for single reader and single modality.

\[ T(y,\theta) := \sum _{r=1, \cdots,R} \sum _{m=1,\cdots,M} \sum _{c=1,\cdots,C} \biggr( \frac{\bigl\{H_{c,m,r}-N_L\times p_{c,m,r}\bigr\}^2}{N_L\times p_{c,m,r}}+ \frac{\bigl\{F_{c,m,r}-(\lambda _{c} -\lambda _{c+1} )\times N_{L}\bigr\}^2}{(\lambda _{c} -\lambda _{c+1} )\times N_{L} }\biggr). \]

for multiple readers and multiple modalities, where the false positive fraction is per lesion instead of per image. Of course, it is very easy to implement a per-image hierarchical Bayesian model; the author is just kind of goofing off in implementing the per-image model.

Note that \(p_c\) and \(\lambda _{c}\) depend on \(\theta\). In this statistic, the number of degrees of freedom is \(C-2\).

In classical frequentist methods, the parameter \(\theta\) is a fixed estimate, e.g., the maximum likelihood estimator; in the Bayesian context, however, the parameter is not deterministic, so by integrating with respect to the posterior predictive distribution we can obtain the posterior predictive \(\chi^2\) value and its p value. Let \(y_{\text{obs}}\) be the observed data. Then the posterior predictive p value is defined by

\[ p \text{ value of $y_{\text{obs}}$ in Theory} = \int \int dy d\theta I_{T(y,\theta) > T(y_{\text{obs}},\theta)}f(y|\theta)\pi(\theta|y_{\text{obs}}) \]

In the following, we show how to calculate this double integral. Suppose that \(\theta _1, \theta_2,\cdots,\theta_N\) are samples drawn from the posterior distribution via Hamiltonian Monte Carlo simulation; then we obtain a sequence of models (likelihoods) \(f(y|\theta_1),f(y|\theta_2),\cdots, f(y|\theta_N)\). We draw samples \(y_1,y_2,\cdots, y_N\) so that each \(y_i\) is a sample from the distribution whose density function is \(f(y|\theta_i)\); in conventional notation, we may write

\[ y_1 \sim f(\cdot|\theta_1),\\ y_2 \sim f(\cdot|\theta_2),\\ \cdots,\\ y_N \sim f(\cdot|\theta_N). \] These are the desired samples; that is, we can interpret \(y_1,y_2,\cdots, y_N\) as draws from the posterior predictive distribution. Using the law of large numbers, we can approximate the integral for the p value by

\[ p \text{ value approximation to calculate numerically} =\frac{1}{N} \sum I_{T(y_i,\theta_i) > T(y_{\text{obs}},\theta_i)} \]
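The Monte Carlo approximation above can be illustrated end-to-end with a self-contained toy example in base R; here a simple Poisson likelihood with a Gamma prior stands in for the full FROC model, so all names and values are illustrative:

```r
set.seed(1)

# Toy model: y ~ Poisson(theta), prior theta ~ Gamma(2, rate = 1).
# After observing y_obs (one observation), the posterior is
# Gamma(2 + y_obs, rate = 2), so we can sample it directly.
y_obs <- 7
N     <- 10000
theta <- rgamma(N, shape = 2 + y_obs, rate = 2)  # posterior draws

# Chi-square discrepancy T(y, theta) = (y - E[y|theta])^2 / E[y|theta]
T_chisq <- function(y, theta) (y - theta)^2 / theta

# For each posterior draw theta_i, draw y_i ~ f(. | theta_i):
# these are samples from the posterior predictive distribution
y_rep <- rpois(N, theta)

# Posterior predictive p value: Monte Carlo mean of the indicator
p_value <- mean(T_chisq(y_rep, theta) > T_chisq(y_obs, theta))
p_value
```

For the FROC model itself, the same recipe applies with \(T(y,\theta)\) as defined above and \(\theta_i\) taken from the Stan output.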

…oh, ache… ache, ache, my body is in very bad shape now… I hate prurigo nodularis.

I was exposed to some surfactants made by Flower King (a fictitious name); then my body became bad, and many, many prurigo nodularis appeared over my whole body, and I struggled with them for 1 year and 17 months. I want to say: to live in safety, bio-surfactants are very important, OK? Bio!! Not petroleum. Please make and develop bio-surfactants for the future life.

What distribution determine the false alarm rate?

In this section, we show which distribution determines the false alarm rate.

Recall that the hit rate is determined by the signal distributions, that is,

\[p_{c}=\text{Prob}( z_c < Y <z_{c+1} )\\ =\Phi (\frac{z_{c +1}-\mu}{\sigma})-\Phi (\frac{z_{c}-\mu}{\sigma}). \] where

\[ Y \sim \text{Normal}(\mu,\sigma ^2). \]

Now, the false alarms are distributed as follows:

\[F_{c } \sim \text{Poisson} ( (\lambda _{c} -\lambda _{c+1} )\times N_{I} ),\]

and thus the rate is \(q_c:=(\lambda _{c} -\lambda _{c+1} )\times N_{I}\), with the convention \(\lambda_{C+1}=0\).

What we want is to define a random variable \(W\) such that

\[ q_c =\text{Prob}( z_c < W <z_{c+1} ). \]

Using \[\lambda _{c} = - \log \Phi ( z_{c } ),\] we obtain

\[ q_c = (\log \Phi ( z_{c+1 } ) - \log \Phi ( z_{c } )) \times N_I. \] Thus we obtain

\[ q_c = N_I \times \int ^{z_{c+1 }}_{z_{c}}d \log \Phi ( z ) \]

Thus, we may say that the false alarm rate is determined by the density function \(N_I \times\frac{ d \log \Phi ( z )}{dz}\), which we shall call the differential logarithmic Gaussian distribution with scale \(N_I\), or briefly the logarithmic Gaussian. Thus we can say that the classical bi-normal assumption is redundant in the FROC trial: we need only two distributions, one the signal distribution and the other the logarithmic Gaussian. Classical theory uses the canonical Gaussian for the noise, but this is wrong, or at least redundant; as shown, we have to use the logarithmic Gaussian for the noise distribution.
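The identity \(q_c = N_I \int_{z_c}^{z_{c+1}} d \log \Phi(z)\) can be checked numerically in base R (the thresholds and \(N_I\) below are chosen arbitrarily; for \(c = C\) the upper limit is \(+\infty\)):

```r
NI <- 57                           # number of images (arbitrary example)
z  <- c(-0.5, 0.2, 0.8, 1.5, 2.1)  # thresholds z_1 < ... < z_5

lambda <- -log(pnorm(z))           # lambda_c = -log Phi(z_c)

# Closed form: q_c = (lambda_c - lambda_{c+1}) * NI, with lambda_{C+1} = 0
q_closed <- (lambda - c(lambda[-1], 0)) * NI

# Density NI * d log Phi(z) / dz = NI * phi(z) / Phi(z), integrated
# over each bin (the last bin extends to +infinity)
dens  <- function(z) NI * dnorm(z) / pnorm(z)
upper <- c(z[-1], Inf)
q_int <- mapply(function(lo, hi) integrate(dens, lo, hi)$value, z, upper)

max(abs(q_closed - q_int))  # agreement up to quadrature error
```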

Note that \(N_I \times\log \Phi ( z )\) is independent of the model parameters. Thus the noise distribution is perfectly fixed, and we can observe how the signal distribution varies according to the data. So, in classical signal detection theory two Gaussians are plotted, but in FROC theory we should plot two different distributions: one is the Gaussian and the other is the logarithmic Gaussian.

So, we use not only the bi-normal Gaussians but also the logarithmic Gaussian.

\(\color{green}{\textit{Multiple reader and Multiple case }}\)

“X-ray of my teeth!!”


Note that multiple reader and multiple \(\color{red}{\text{"case"}}\) (MRMC) implies multiple reader and multiple \(\color{red}{\text{"modality"}}\) !!

So, I hate the word MRMC, it should be \(\color{red}{\text{"MRMM"}}\) !!

Data for MRMC

Example.

2 readers, 2 modalities and 3 confidence levels.

Confidence Level Modality ID Reader ID Number of Hits Number of False alarms
3 = definitely present 1 1 \(H_{3,1,1}\) \(F_{3,1,1}\)
2 = equivocal 1 1 \(H_{2,1,1}\) \(F_{2,1,1}\)
1 = questionable 1 1 \(H_{1,1,1}\) \(F_{1,1,1}\)
3 = definitely present 1 2 \(H_{3,1,2}\) \(F_{3,1,2}\)
2 = equivocal 1 2 \(H_{2,1,2}\) \(F_{2,1,2}\)
1 = questionable 1 2 \(H_{1,1,2}\) \(F_{1,1,2}\)
3 = definitely present 2 1 \(H_{3,2,1}\) \(F_{3,2,1}\)
2 = equivocal 2 1 \(H_{2,2,1}\) \(F_{2,2,1}\)
1 = questionable 2 1 \(H_{1,2,1}\) \(F_{1,2,1}\)
3 = definitely present 2 2 \(H_{3,2,2}\) \(F_{3,2,2}\)
2 = equivocal 2 2 \(H_{2,2,2}\) \(F_{2,2,2}\)
1 = questionable 2 2 \(H_{1,2,2}\) \(F_{1,2,2}\)

where each component \(H\) and \(F\) is a non-negative integer. In the multi-index notation, for example, \(H_{3,2,1}\) means the number of hits of the \(1\)-st reader over all images taken by the \(2\)-nd modality with the reader's confidence level \(3\).

This package has example data; several objects bundled in this package are MRMC datasets.

To create such data, use one of the following:
  • dataset_creator_new_version()
  • create_dataset()
  • convertFromJafroc()

The third function, convertFromJafroc(), converts data from the JAFROC format to the author's format.

Data of Multiple reader and multiple modality from R console

Sorry to change the subject, but I have pains in my arms, brain, and chest, and an itch over my whole body, especially my face.

So, back to the statistical modeling, ah… OK. Next, we shall show the hierarchical Bayesian model used to compare modalities, i.e., to compare imaging methods such as MRI, CT, PET, etc.; in another context, modality could mean a treatment. Anyway, in the following we show how to build the model.

Mathematical philosophy says that the expression of an equation is not important; the properties of, and relations between, notions are more important. The hierarchical model explained below is very complex, and the complex representation can hide the essence of the modeling. To avoid this, the author first briefly shows the essence, or idea, of the hierarchical modeling.

Essence of modeling of the hierarchical Bayesian Model.

In this brief section we use the notation \(f(y|\theta)\) for a likelihood function. That is, the data is \(y\) and the model parameter is denoted by \(\theta\).

In the MRMC data format, the hits and false alarms are counted for each reader and each modality; thus the data has the form \(y_{m,r}\), representing the hits and false alarms for the \(m\)-th modality and the \(r\)-th reader. So, we estimate model parameters \(\theta_{m,r}\) representing AUCs for the \(m\)-th modality and the \(r\)-th reader. To obtain AUCs \(\theta_{m}\) depending on the modality only, we use hyper-parameters, such as

\[ \theta_{m,r} \sim \text{Normal}( \theta_m, \text{variance})\].

So, the parameter \(\theta_m\) depends only on the \(m\)-th modality. Using the posterior of \(\theta_m\), we obtain the following basic characteristics between the \(m\)-th and \(m'\)-th modalities:

  • \(\mathbb{P}(\theta_m > \theta_{m'} )\)
  • \(\theta_m - \theta_{m'}\)

where the former is the probability of the event that the AUC of the \(m\)-th modality is greater than that of the \(m'\)-th, and the latter is the difference in AUC between the \(m\)-th and \(m'\)-th modalities.
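Given posterior draws of the AUCs, both summaries reduce to one-liners over the MCMC sample. A base-R sketch in which simulated draws stand in for real MCMC output:

```r
set.seed(1)

# Simulated posterior draws of the AUCs of two modalities
# (in practice, extract these from the Stan fit, e.g. via rstan::extract())
A_1 <- rnorm(10000, mean = 0.85, sd = 0.02)  # draws of theta_m
A_2 <- rnorm(10000, mean = 0.80, sd = 0.03)  # draws of theta_m'

# P(theta_m > theta_m'): fraction of draws in which modality m wins
prob_superior <- mean(A_1 > A_2)

# Posterior of the difference theta_m - theta_m':
# mean and 95% credible interval
diff_draws <- A_1 - A_2
c(mean = mean(diff_draws), quantile(diff_draws, c(0.025, 0.975)))

prob_superior
```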

Hierarchical model

The author of this model thinks it is better to show it without any explanation, since it is obvious;

\[\begin{eqnarray*} H_{c,m,r} & \sim &\text{Binomial }( p_{c,m,r}, N_L ),\\ F_{c,m,r} &\sim& \text{Poisson }( ( \lambda _{c} - \lambda _{c+1})N_L ),\\ \lambda _{c}& =& - \log \Phi (z_{c }),\\ p_{c,m,r} &:=&\Phi (\frac{z_{c +1}-\mu_{m,r}}{\sigma_{m,r}})-\Phi (\frac{z_{c}-\mu_{m,r}}{\sigma_{m,r}}), \\ \color{red}{p_{C,m,r}} & =& 1-\Phi (\frac{z_{C}-\mu_{m,r}}{\sigma_{m,r}}),\\ \color{red}{F_{C,m,r}} & \sim &\text{Poisson } ( (\lambda _{C} - 0)N_L),\\ A_{m,r}&:=&\Phi (\frac{\mu_{m,r}/\sigma_{m,r}}{\sqrt{(1/\sigma_{m,r})^2+1}}), \\ A_{m,r}&\sim&\text{Normal} (A_{m},\sigma_{r}^2), \\ dz_c&:=&z_{c+1}-z_{c},\\ dz_c, \sigma_{m,r} &\sim& \text{Uniform}(0,\infty),\\ z_{c} &\sim& \text{Uniform}( -\infty,100000),\\ A_{m} &\sim& \text{Uniform}(0,1).\\ \end{eqnarray*}\] Our new model has parameters \(z_{1}, dz_1,dz_2,\cdots, dz_{C-1}\), \(A_{m}\), \(\sigma_{r}\), \(\mu_{m,r}\), and \(\sigma_{m,r}\).

R code to fit the above model to data
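As before, a hedged sketch: it assumes the bundled MRMC example dataset dd and that fit_Bayesian_FROC() accepts MRMC-formatted data (object and argument names may differ across package versions):

```r
library(BayesianFROC)

# `dd` is an example MRMC dataset bundled with the package
# (multiple readers, multiple modalities).
fit <- fit_Bayesian_FROC(dd)

# The posteriors of the modality-specific AUCs A_m are contained in
# the fitted model object and can be extracted with rstan::extract().
```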

Work Flow for MRMC case

Work Flow with R script

Prepare data

Fitting

Draw Curves

\(\color{green}{\textit{Visualization }}\) of data and fitted model

FPF and TPF

Recall that for a \(C=5\) step rating, FROC data is the collection of pairs \((H_c,F_c) \in \mathbb{N}^2,c=1,2,...,5\), of hits \(H_c\) for the \(c\)-th confidence level and false alarms \(F_c\), together with the number of lesions \(N_L\) and the number of images \(N_I\).

We define \(C=5\) points as follows: \[ (x_{c},y_{c}):=( \sum_{c\leq c' \leq C } F_{c^\prime}/N_{I}, \sum_{c\leq c' \leq C } H_{c^\prime}/N_{L}) \in \mathbb{R}^2, \] where \(c\) is the \(c^{\text{th}}\) confidence level, and we call these points the pairs of false positive fraction (FPF) per image and true positive fraction (TPF) per lesion.

Some readers may think: what the hell? The author is stupid. I am afraid the author actually is a stupid dirty old man with prurigo nodularis. Anyway, I show another description of FPF and TPF as follows. Consider the three-step rating case:

FPF and TPF for \(C=3\)
Confidence Level Number of Hits Number of False alarms
3 = definitely present \(H_{3}=97\) \(F_{3}=1\)
2 = equivocal \(H_{2}=32\) \(F_{2}=14\)
1 = questionable \(H_{1}=31\) \(F_{1}=74\)

In this case, the pair of FPF and TPF \((x_{c},y_{c})\) is given as follows:

\(C=3\)
\((x_{1},y_{1})=\) \(\biggr( \frac{F_{3}+F_2+F_1}{N_I},\) \(\frac{H_{3}+H_2+H_1}{N_L}\biggr)\)
\((x_{2},y_{2})=\) \(\biggr(\frac{F_{3}+F_2}{N_I},\) \(\frac{H_{3}+H_2}{N_L}\biggr)\)
\((x_{3},y_{3})=\) \(\biggr(\frac{F_{3}}{N_I},\) \(\frac{H_{3}}{N_L}\biggr)\)

Plotting these \(C\) points in the \((x,y)\)-plane, we can visualize the FROC data. Connecting these points with line segments, we obtain the so-called empirical FROC curve.
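Computing and plotting the empirical FROC points for the \(C=3\) table above in base R (NL and NI are illustrative values, not given in the table):

```r
# Data from the C = 3 table, ordered from confidence level 3 down to 1
h  <- c(97, 32, 31)  # hits H_3, H_2, H_1
f  <- c(1, 14, 74)   # false alarms F_3, F_2, F_1
NL <- 259            # number of lesions (illustrative value)
NI <- 57             # number of images (illustrative value)

# Cumulative sums from the top confidence level down give
# (x_3, y_3), (x_2, y_2), (x_1, y_1)
x <- cumsum(f) / NI  # FPF per image
y <- cumsum(h) / NL  # TPF per lesion

# Empirical FROC curve: connect (0, 0) and the C points by lines
plot(c(0, x), c(0, y), type = "b",
     xlab = "FPF per image", ylab = "TPF per lesion")
```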

In the following figure, the three circles mark \((x_{c},y_{c}),c=1,2,3\), and the curve is the FROC curve defined later.

" p value = 0.669, bad fitting"


I dislike the words FPF and TPF. I prefer cumulative false positives (CFP) per image and cumulative true positives (CTP) per lesion. These are my original terms, but their meaning is more obvious and clear than FPF and TPF. In my R programs and in my paper, I use CFP and CTP rather than FPF and TPF.

FPF and TPF for \(C=5\)

Suppose that the dataset is the following \(C=5\) case:

Confidence Level No. of Hits No. of False alarms
5 = definitely present \(H_{5}\) \(F_{5}\)
4 = probably present \(H_{4}\) \(F_{4}\)
3 = questionable \(H_{3}\) \(F_{3}\)
2 = subtle \(H_{2}\) \(F_{2}\)
1 = very subtle \(H_{1}\) \(F_{1}\)

In this case, the pair of FPF and TPF \((x_{c},y_{c})\) is given as follows:

\(C=5\)
\((x_{1},y_{1})=\) \(\biggr( \frac{F_5+F_4+F_3+F_2+F_1}{N_I},\) \(\frac{H_5+H_4+H_3+H_2+H_1}{N_L}\biggr)\)
\((x_{2},y_{2})=\) \(\biggr( \frac{F_5+F_4+F_3+F_2}{N_I},\) \(\frac{H_5+H_4+H_3+H_2}{N_L}\biggr)\)
\((x_{3},y_{3})=\) \(\biggr( \frac{F_5+F_4+F_3}{N_I},\) \(\frac{H_5+H_4+H_3}{N_L}\biggr)\)
\((x_{4},y_{4})=\) \(\biggr( \frac{F_5+F_4}{N_I},\) \(\frac{H_5+H_4}{N_L}\biggr)\)
\((x_{5},y_{5})=\) \(\biggr( \frac{F_5}{N_I},\) \(\frac{H_5}{N_L}\biggr)\)

Is this redundant? Ha,… my headache will kill me,… bye bye, pretty crowd! I have got some aches, and one of them is like hemorrhoids. What the hell! :'-D

per image or per lesion

For our purposes, such as comparison of modalities or fitting the FROC curve, the following FPF per lesion instead of per image also makes sense:

\[ (x_{c},y_{c}):=( \sum_{c\leq c' \leq C } F_{c^\prime}/N_{L}, \sum_{c\leq c' \leq C } H_{c^\prime}/N_{L}) \in \mathbb{R}^2, \]

The replacement of per image by per lesion is not essential: if the reader executes fit_GUI(), then the reader can observe that changing the number of images does not change the FROC curve or the AFROC curve. This invariance of the curve shape implies that for the modality comparison issue we can use FPF per lesion. The fitting with FPF per lesion is also implemented in this package.

So the AUCs of per image and per lesion coincide.

In classical FROC analysis, one always works with FPF per image; the author wishes to work with FPF per lesion, since the number of images is a redundant parameter. Of course, the noise and signal distributions are affected by the number of images, but such latent structure does not matter for modality comparison.

FROC curve and AFROC curve

We shall give a definition of FROC curve.

\(\color{green}{\textit{What is FROC curve? }}\)

“My teeth! FROC curve is a visualization of our fitted model! :-D”


Pre FROC curve

Definition of pre FROC curve.

To define the FROC curve, we first define pre FROC curve.

Define

\(y(t):=\mathbb{P}[Y>t]\) and \(x(t):=\mathbb{P}[X>t]\).

Then we obtain the pre FROC curve:

\[ \mathbb{R} \longrightarrow \mathbb{R}^2 \\ t \mapsto (x(t),y(t)). \]

Expression of pre FROC curve.

To calculate \(y(t):=\mathbb{P}[Y>t]\) and \(x(t):=\mathbb{P}[X>t]\) more explicitly, we recall that

\[X \sim \text{Normal}(0,1), \\ Y \sim \text{Normal}(\mu,\sigma^2),\] which is sometimes called the bi-normal assumption. Note that if \(\mu\) is far from the mean of \(X\) (that is, 0), then the observer performance will be greater, since the noise and signal are well separated from each other.

Please keep in mind that the parameters \(\mu,\sigma^2\) are part of the parameters of our model, and should be estimated.

Since \(\frac{Y -\mu}{\sigma}\) follows the standard Gaussian distribution, we can calculate the probability of the event \(\frac{Y -\mu}{\sigma} > \frac{t -\mu}{\sigma}\), and it follows that

\[y(t):=\mathbb{P}[Y>t] = \mathbb{P}[\frac{Y -\mu}{\sigma} > \frac{t -\mu}{\sigma}] = 1-\Phi( \frac{t -\mu}{\sigma}) \]

On the other hand, \[x(t):=\mathbb{P}[X>t] = 1-\Phi( t),\] thus \[t = t(x) = \Phi ^{-1} (1-x(t) ).\] Substituting this, we get \[y(t) = 1-\Phi( \frac{t -\mu}{\sigma}) = 1-\Phi( \frac{ \Phi ^{-1} (1-x(t) ) -\mu}{\sigma})\]
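This substitution can be checked numerically in base R (\(\mu\) and \(\sigma\) are chosen arbitrarily):

```r
mu    <- 1.2
sigma <- 0.8
t     <- seq(-3, 3, length.out = 101)

# Direct definition: y(t) = P[Y > t] with Y ~ Normal(mu, sigma^2)
y_direct <- 1 - pnorm((t - mu) / sigma)

# Via the noise coordinate: x(t) = P[X > t] = 1 - Phi(t), then
# y = 1 - Phi((Phi^{-1}(1 - x) - mu) / sigma)
x       <- 1 - pnorm(t)
y_via_x <- 1 - pnorm((qnorm(1 - x) - mu) / sigma)

max(abs(y_direct - y_via_x))  # zero up to floating-point error
```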

Calculation for FROC curve.

In FROC data, the hits and false alarms are counted for each confidence level.

We introduced the notations \(N_L\), \(N_I\), \(H_c\), \(F_c\), \(C\). In the R console, these notations are represented by NL, NI, h, f, C.

If \(C=5\), then the dataset for FROC analysis is as follows:

Confidence Level No. of Hits No. of False alarms
5 = definitely present \(H_{5}\) \(F_{5}\)
4 = probably present \(H_{4}\) \(F_{4}\)
3 = equivocal \(H_{3}\) \(F_{3}\)
2 = probably absent \(H_{2}\) \(F_{2}\)
1 = questionable \(H_{1}\) \(F_{1}\)

We divide the real line by thresholds \(z_1 < z_2 < .... < z_C\). Combining these thresholds with the parameters \(\mu,\sigma^2\) of the bi-normal assumption, we obtain our model parameters. Other model quantities, for example \(\lambda_c\), can be deduced from the thresholds by \(\lambda _c = -\log \Phi({z_c})\).

And we assume the hit rates are generated by \[p_{c} := \Phi (\frac{z_{c +1}-\mu_{}}{\sigma_{}})-\Phi (\frac{z_{c}-\mu_{}}{\sigma_{}})=\mathbb{P}[z_c <Y_l<z_{c+1}]. \]

We also define the decreasing sequence \(\lambda_1 > \lambda_2 > .... > \lambda_C\) such that \[F_c+...+F_C \sim \text{Poisson}(\lambda_cN_I),\] where \(N_I\) is the number of images. This assumption is called the Poisson assumption in the FROC context. The following conditions combine the bi-normal assumption and the Poisson assumption:

\[ \mathbb{P}[X < z_1]= \mathbb{P}[F_1+F_2+F_3+...+F_C =0]\\ \mathbb{P}[X < z_2]= \mathbb{P}[F_2+F_3+...+F_C =0]\\ .....\\ \mathbb{P}[X < z_c]= \mathbb{P}[F_c+...+F_C =0]\\ ...\\ \mathbb{P}[X< z_C]= \mathbb{P}[ F_C =0]\\ \] which is equivalent to

\[\Phi(z_c) = e^{-\lambda_c}\]

for all \(c\).

Or equivalently,

\[1-x(z_c) = e^{-\lambda_c}\]

Thus we can get a correspondence between Poisson rate \(\lambda\) and the threshold \(z\).

Recall that the FROC curve's parameter \(t\) ranges over the values of the Gaussian random variable; writing \(z\) instead of \(t\), the curve is \[y(t) =y(z)= 1-\Phi( \frac{t -\mu}{\sigma}) = 1-\Phi( \frac{ \Phi ^{-1} (1-x(t) ) -\mu}{\sigma})\\ = 1-\Phi( \frac{ \Phi ^{-1} (1-x(z) ) -\mu}{\sigma})\\ = 1-\Phi( \frac{ \Phi ^{-1} (e^{-\lambda} ) -\mu}{\sigma})\\ \] In FROC analysis, the parameter of the FROC curve is taken to be \(\lambda\).

For the last equality, we used the relation \(1-x(z) = e^{-\lambda}\), which is the analog \({}^{\dagger}\) (see Appendix) of \(1-x(z_c) = e^{-\lambda_c}\).

Thus we obtain the FROC curve as follows.

Definition of FROC curve

From the above arguments, we obtain the following definition of the FROC curve:

\[ \mathbb{R} \longrightarrow \mathbb{R}^2 \\ \lambda \mapsto (\lambda,y(\lambda) ) \]

where \(y(\lambda) = 1-\Phi( \frac{ \Phi ^{-1} (e^{-\lambda} ) -\mu}{\sigma})\) and \(\mu,\sigma\) are parameters representing the mean and standard deviation of the signal distribution.

Property of the FROC curve

\[ y^{\text{FROC}}(\lambda):= 1-\Phi( \frac{ \Phi ^{-1} (e^{-\lambda} ) -\mu}{\sigma})\\ x^{\text{FROC}}(\lambda) := \lambda \]
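Plotting the FROC curve from this parametric form in base R (\(\mu\) and \(\sigma\) are arbitrary illustrative values):

```r
# FROC curve: x = lambda, y = 1 - Phi((Phi^{-1}(exp(-lambda)) - mu) / sigma)
y_FROC <- function(lambda, mu, sigma)
  1 - pnorm((qnorm(exp(-lambda)) - mu) / sigma)

mu     <- 1.0
sigma  <- 1.0
lambda <- seq(0.001, 5, length.out = 200)

plot(lambda, y_FROC(lambda, mu, sigma), type = "l",
     xlab = "expected FPF per image (lambda)",
     ylab = "expected TPF per lesion")

# The curve is increasing in lambda and saturates toward 1
y_FROC(5, mu, sigma)
```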

Note that the FROC curve can be interpreted as the curve of the expectation of the pair of TPF and FPF, where we use the two abbreviations FPF = false positive fraction and TPF = true positive fraction. These words are widely used in ROC theory, and thus we omit their definitions.

In mathematical philosophy, the expression is not important. The most important thing in a mathematical expression is the property. So, in the case of the definition of the FROC curve, the above equations representing the FROC curve are not important, and the reader may forget them!! I also forget the expressions. However, we never forget the property that the FROC curve is the expected pair of the FPF and TPF. That is, we can write

\[ \mathbb{E}[ \sum _{ c \leq c' \leq C } H_{c'}/N_L ] = y^{\text{FROC}}(\lambda_c) \\ \mathbb{E}[ \sum _{ c \leq c' \leq C} F_{c'}/N_I ] = x^{\text{FROC}}(\lambda_c) \\ \]

for all \(c=1,2,...,C\).

In fact,

\[\begin{eqnarray*} (x^{\text{FROC}}(\lambda_c),y^{\text{FROC}}(\lambda_c))&:=&\bigl(\lambda_c , \mathbb{P} \bigr\{ Y > z_c \bigl\} \bigr),\\ &=&\bigl(\lambda_c , \sum_{c\leq c' \leq C } \mathbb{P} \bigr\{ z_{c^\prime+1}> Y > z_{c^\prime} \bigl\} \bigr),\\ &=&\bigl(\mathbb E [ \sum_{c\leq c' \leq C } F_{c^\prime}]/N_I , \mathbb E [\sum_{c\leq c' \leq C } H_{c^\prime}]/N_L \bigr),\\ \end{eqnarray*}\]

where \(Y\) is a random variable distributed by the signal distribution in the bi-normal assumption.

Consequently, we can say that

\[\color{red}{\textit{FROC curve is an expected curve for the FPF and TPF. }}\]

I think this equality is more important than the expression of the definition of FROC curve.

From this formula, we can obtain the interpretation of the FROC curve as expected pairs of cumulative false positives per image and cumulative hits per lesion. Therefore, under this interpretation, we can evaluate how our model fits the data by plotting the cumulative false positives per image and cumulative hits per lesion calculated from an actual dataset. More precisely, recall that we defined the pairs of FPF and TPF as follows: \[\begin{eqnarray*} (x_{c},y_{c}):=( \sum_{c\leq c' \leq C } F_{c^\prime}/N_{I}, \sum_{c\leq c' \leq C } H_{c^\prime}/N_{L}) \in \mathbb{R}^2,\\ \end{eqnarray*}\] where \(c\) is the \(c^{\text{th}}\) confidence level, and we also call these points the cumulative false positive-cumulative true positive (CFP-CTP) points over \((N_I, N_L)\). If the model is correct, then the FROC curve passes through the expected CFP-CTP points and provides insight into \[\color{blue}{\textit{ how the model makes sense as a model. }}\]

Appendix: Definition of FROC curves

Note that the area under the FROC curve is infinite. So, in this package, AUC means the area under the AFROC curve defined below.

\(\Phi()\) denotes the cumulative distribution function of the Gaussian distribution with mean 0 and variance 1. \(\Phi^{-1}()\) is the inverse mapping of \(\Phi()\).

Let \((x,y)\) be a Cartesian coordinate system on a plane. Then, for any non-negative real number \(\lambda\), the FROC curve is defined as \[ (x(\lambda),y(\lambda)) := \bigl(\lambda , 1-\Phi(b\Phi ^{-1}(\exp(-\lambda)) -a ) \bigr) \in \mathbb{R}^2 \] Or equivalently,

\[ (x(\lambda),y(\lambda)) := \bigl(\lambda , \text{Prob} \bigr\{ Y > \Phi ^{-1}(\exp(-\lambda)) \bigl\} \bigr) \in \mathbb{R}^2. \]

where \(Y\) is a random variable with the signal distribution, \(a =\mu/\sigma\) and \(b=1/\sigma\).
It is easy to see that these two definitions are the same.

Definition of AFROC curves

Let \((\xi, \eta)\) be a Cartesian coordinate system for the square \(0\leq \xi \leq 1, 0\leq \eta \leq 1\). Then, for any non-negative real number \(\lambda\), we define an AFROC curve as \[(\xi(\lambda), \eta(\lambda)) := \bigl(1-\exp(-\lambda) , 1-\Phi(b\Phi ^{-1}(\exp(-\lambda)) -a ) \bigr) \in \mathbb{R}^2. \]

AUC: Calculate the Area under the AFROC curve

Beginners in FROC analysis should skip this section, but keep in mind the following form of the area under the AFROC curve:

\[\text{AUC} = \Phi\biggr(\frac{a}{\sqrt{1+b^2}}\biggr)\].
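This formula can be verified numerically against the area under the AFROC curve in base R (\(a\) and \(b\) are arbitrary illustrative values):

```r
a <- 1.2
b <- 0.9

# AFROC curve: xi = 1 - exp(-lambda),
# eta = 1 - Phi(b * Phi^{-1}(exp(-lambda)) - a).
# With x := xi, exp(-lambda) = 1 - x, so the curve is the graph of
eta <- function(x) 1 - pnorm(b * qnorm(1 - x) - a)

# Numerical area under the AFROC curve over x in (0, 1)
auc_numeric <- integrate(eta, 0, 1)$value

# Closed form AUC = Phi(a / sqrt(1 + b^2))
auc_closed <- pnorm(a / sqrt(1 + b^2))

c(auc_numeric, auc_closed)  # agree up to quadrature error
```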

There is no need to remember its precise form. To tell the truth, I have forgotten it now \(( \check{} \omega \check{})\), and will never recall it again. Even though I programmed this package, I always forgot it. So I never force the reader to remember it, since I have perfectly forgotten it! In this section, we prove the above equation. To tell the truth, the author worked out this section 2 years after making this package. So, there is no need to understand this section.

The author thinks this section will let the reader go to bed, saying "I'll never be back." Even the Terminator will never be back to this section. To tell the truth, the author also thinks "I'll never be back." Anyway, we should be the Terminator or Batman (I like Michael Keaton) to understand this section. Note that I am not of the Spider-Man generation, but I think Spider-Man is good too. Ha,… what am I saying… I do not know,… please leave me alone.

Note that Terminator 2 is great, but I am afraid that 3, 4, 5, … are not so,… James Cameron is the best!

On the other hand, the 1-st season of Dark Angel is the best, but the 2-nd season is not so,…

Anyway, in the following, the author provides a lullaby in which the calculation of the area under the AFROC curve is shown. If this section alone cannot put the reader to sleep, then I suggest the following heavy book; so heavy that my aching arms can no longer lift it, ha,… aches make my life poor. Anyway, for a similar calculation, we refer to the following:

  • Dev Chakraborty; Observer Performance Methods for Diagnostic Imaging, Foundations, Modeling, and Applications with R-Based Examples ; pp.116-119

In order to spare the reader additional labor, we have been careful to give all calculations, which may seem redundant to some. In order to calculate the AUC in a simple manner, we need the notion of convolution.

Let \(f\) and \(g\) be two real-valued functions defined on the real numbers. Then the convolution of \(f\) and \(g\) is defined by

\[ (f\otimes g)(x):=\int_{\mathbb{R}}f(x-y)g(y)dy. \] The Gaussian density has a simple algebraic structure for convolution.

Let us denote the density function of Gaussian with mean \(m\) and variance \(v\) by \(N_{m,v}()\), that is:

\[ N_{m,v}(x)=\frac{1}{\sqrt{2\pi v}}\exp\frac{-(x-m)^2}{2v}. \]

Then

\[ N_{m,v}\otimes N_{m',v'}=N_{m+m',v+v'}. \]
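This convolution identity can be checked by numerical integration in base R:

```r
# Gaussian density with mean m and variance v
N_mv <- function(x, m, v) dnorm(x, mean = m, sd = sqrt(v))

# Convolution (N_{m,v} (x) N_{m',v'})(x), computed by quadrature
conv <- function(x, m, v, mp, vp)
  integrate(function(y) N_mv(x - y, m, v) * N_mv(y, mp, vp),
            -Inf, Inf)$value

# Compare with the closed form N_{m+m', v+v'} at a few points
x_test <- c(-1, 0, 0.5, 2)
lhs <- sapply(x_test, conv, m = 0.3, v = 1.2, mp = -0.7, vp = 0.5)
rhs <- N_mv(x_test, m = 0.3 - 0.7, v = 1.2 + 0.5)

max(abs(lhs - rhs))  # zero up to quadrature error
```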

My mathematical heart gets excited, ha,… algebraization (not in the Alexander Grothendieck manner, but…) is,… ha, beautiful. I want to marry this formula.

We also use the following trivial relation;

\[ N_{0,v}(x)=\frac{1}{\sqrt{v}}N_{0,1}(\frac{x}{\sqrt{v}}). \]

Recall that we define AFROC curve for any non-negative real number \(\lambda\), as follows: \[(x(\lambda), y(\lambda)) := \bigl(1-\exp(-\lambda) , 1-\Phi(b\Phi ^{-1}(\exp(-\lambda)) -a ) \bigr) \in \mathbb{R}^2. \]

The AUC is the following Lebesgue-Stieltjes integral

\[ \int y(\lambda) dx(\lambda). \]

Since \(\lambda\) moves from \(0\) to \(\infty\), the \(x(\lambda) =1-\exp(-\lambda)\) moves from \(x=0\) to \(x=1\),

\[ \int y(\lambda) dx(\lambda) = \int_{x=0}^1 \biggl\{ 1-\Phi(b\Phi ^{-1}( 1- x ) -a ) \biggr\} dx\]

Set \(\xi:=\Phi ^{-1}( 1- x )\), then \(N_{0,1}(\xi)d\xi=-dx\) and hence

\[\int_{\xi=-\infty}^{\xi=+\infty} \biggl\{ 1-\Phi(b\xi -a ) \biggr\}N_{0,1}( \xi )d\xi,\]

and immediately,

\[1-\int_{\xi=-\infty}^{\xi=+\infty} \biggl\{ \Phi(b\xi -a ) \biggr\}N_{0,1}( \xi )d\xi,\] so,

\[1-\int_{\xi=-\infty}^{\xi=+\infty} \biggl\{ \int^{\eta=b\xi -a }_{\eta=-\infty}N_{0,1}(\eta)\,d\eta \biggr\}N_{0,1}( \xi )\,d\xi.\] Using the coordinate transformation \((X,Y):=(\xi,\eta-b\xi)\) with Jacobian \(\partial (\xi,\eta)/\partial (X,Y)=1\),

\[1-\int_{X=-\infty}^{X=+\infty} \biggl\{ \int^{Y= -a }_{Y=-\infty}N_{0,1}(Y+bX)\,dY \biggr\}N_{0,1}( X )\,dX.\] Replacing \(X\) by \(-X\) (the standard Gaussian density is even, and the integral over \(X\) runs over the whole real line), this equals \[1-\int_{X=-\infty}^{X=+\infty} \biggl\{ \int^{Y= -a }_{Y=-\infty}N_{0,1}(Y-bX)\,dY \biggr\}N_{0,1}( X )\,dX.\] To apply the convolution formula, we rewrite the integrand: \[1-\int_{X=-\infty}^{X=+\infty} \biggl\{ \int^{Y= -a }_{Y=-\infty}N_{0,1}(b[Y/b-X])\,dY \biggr\}N_{0,1}( X )\,dX,\] and

\[1-\int_{X=-\infty}^{X=+\infty} \biggl\{ \int^{Y= -a }_{Y=-\infty}\frac{1}{b}N_{0,1/b^2}(Y/b-X)\,dY \biggr\}N_{0,1}( X )\,dX.\] We can rewrite this in convolution notation:

\[1- \int^{Y= -a }_{Y=-\infty}\frac{1}{b}(N_{0,1/b^2}\otimes N_{0,1})( Y/b ) dY.\]

By the convolution formula, \[1- \int^{Y= -a }_{Y=-\infty}\frac{1}{b}N_{0,1+1/b^2}(Y/b) dY,\]

that is,

\[1- \int^{Y= -a }_{Y=-\infty}\frac{1}{b}N_{0,\frac{1+b^2}{b^2}}(Y/b) dY.\] Using the trivial relation of two Gaussians,

\[1- \int^{Y= -a }_{Y=-\infty}\frac{1}{b} \sqrt{\frac{b^2}{1+b^2}}N_{0,1}(\sqrt{\frac{b^2}{1+b^2}}Y/b) dY,\] that is, \[1- \int^{Y= -a }_{Y=-\infty} \sqrt{\frac{1}{1+b^2}}N_{0,1}(\sqrt{\frac{1}{1+b^2}}Y) dY.\] Set \(Z:=\sqrt{\frac{1}{1+b^2}}Y\); then \[1- \int^{Z= -\frac{a}{\sqrt{1+b^2}}}_{Z=-\infty}N_{0,1}(Z) dZ.\]

Using the definition of \(\Phi()\),

\[1- \Phi\biggr(\frac{-a}{\sqrt{1+b^2}}\biggr)\]

Using the definition of \(\Phi()\) again, finally, we obtain the desired expression of the AUC, that is, the area under the AFROC curve (not to be confused with the FROC curve): \[\text{AUC} = \Phi\biggr(\frac{a}{\sqrt{1+b^2}}\biggr).\]

So, my lullaby is finished. Has everyone gone, or are you sleeping? Go to bed? I also want to sleep. Here there is freedom of expression, no reviewer, ha. Is there someone? Help,… Anyway, good night!

Proof of the convolution of two Gaussians

Let \(f\) and \(g\) be two real-valued functions defined on the real numbers. Then the convolution of \(f\) and \(g\) is defined by

\[ (f\otimes g)(x):=\int_{\mathbb{R}}f(x-y)g(y)dy. \] The Gaussian density has a simple algebraic structure for convolution.

Recall that we denoted the density function of Gaussian with mean \(m\) and variance \(v\) by \(N_{m,v}()\), that is:

\[ N_{m,v}(x)=\frac{1}{\sqrt{2\pi v}}\exp\frac{-(x-m)^2}{2v}. \]

In the calculation of AUC, we used the following;

\[ N_{m,v}\otimes N_{m',v'}=N_{m+m',v+v'}. \]

In this section, we shall prove this formula.

Let \((\Omega, \frak{B},\mathbb{P})\) be a probability space. Let \(X\) be a real-valued random variable defined on \(\Omega\). Then, for any \(x\in \mathbb{R}\), the distribution function \(F\) is defined by

\[ F(x) := \mathbb{P}(X<x). \]

For any Borel subset \(E\) of \(\mathbb{R}\) we define the push forward measure \(\mathbb{P}^X\) by

\[ \mathbb{P}^X(E) := \mathbb{P}(X\in E) \]

which is a measure on the target space of \(X\) induced by \(\mathbb{P}\).

We define \[\widehat{F}(\xi) := \int^\infty_{-\infty} e^{\sqrt{-1}\xi x}dF(x)=\int^\infty_{-\infty} e^{\sqrt{-1}\xi x}\mathbb{P}^X(dx)\] When \(F=N_{m,v}\), we denote \(\widehat{F}\) by \(\widehat{F}_{m,v}\).

I am tired,… I will stop here,…

UNDER CONSTRUCTION

Appendix: the relation \(1-x(z) = e^{-\lambda}\) as an analog \({}^{\dagger}\)

In this section, we explain where this equality comes from. In the previous sections we used the discrete ratings \(1,2,...,c,...,C\); now we consider the continuous rating case. Since continuous ratings contain discrete ratings, once we obtain \(1-x(z) = e^{-\lambda}\) for continuous ratings, we can use it in the discrete rating case. We denote the reader's continuous rating by \(\gamma \in \mathbb{R}\) instead of \(c =1,2,...,C\) in the discrete case. For example, \(F_\gamma\) means the number of false positives whose confidence level is greater than \(\gamma\). Similarly, \(H_\gamma\) means the number of hits whose confidence level is greater than \(\gamma\). So, if we denote the cumulative false positives with confidence level greater than \(\gamma\) by \(f_\gamma\), then we may write the cumulative false positives per image in the continuous rating case as

\[ f_\gamma = \int_\gamma ^\infty F_{\gamma'} \,d \gamma'/N_I, \] which is the analog of the discrete case: \[ \frac{F_c+F_{c+1}+...+F_C}{N_I}, \] where \(N_I\) denotes the number of images.

The analog of \[F_c+...+F_C \sim \text{Poisson}(\lambda_cN_I), \]

is

\[ \int_\gamma ^\infty F_{\gamma'}\, d \gamma'\sim \text{Poisson}(\lambda _\gamma N_I). \]

The analog of the following discrete case,

\[ \mathbb{P}[X < z_1]= \mathbb{P}[F_1+F_2+F_3+...+F_C =0],\\ \mathbb{P}[X < z_2]= \mathbb{P}[F_2+F_3...+F_C =0],\\ .....\\ \mathbb{P}[X < z_c]= \mathbb{P}[F_c+...+F_C =0],\\ ...\\ \mathbb{P}[X< z_C]= \mathbb{P}[ F_C =0],\\ \]

is

\[ \mathbb{P}[X < z_ \gamma] = \mathbb{P} \left[\int_\gamma ^\infty F_{\gamma'}\, d \gamma' =0 \right], \]

or equivalently

\[ \mathbb{P}[X < z_ \gamma] = 1- \mathbb{P} \left[\int_\gamma ^\infty F_{\gamma'}\, d \gamma' \neq 0 \right]. \] From this, we obtain

\[\Phi(z_ \gamma) = e^{-\lambda_ \gamma}\]

for all \(\gamma\) and hence

\[\Phi(z) = e^{-\lambda}.\]

In the previous notation, since \(x(z) = 1-\Phi(z)\), this is

\[1-x(z) = e^{-\lambda}.\]

\(\color{green}{\textit{Statistical Test }}\)

Bayesian ANOVA via Bayes factor

Reference: Ruud Wetzels, Raoul P. P. P. Grasman, and Eric-Jan Wagenmakers, “A Default Bayesian Hypothesis Test for ANOVA Designs”.

“X-ray of my teeth! I cannot understand why the result is not desired one, so I have to say this program in under construction !”


We can test the following null hypothesis for FROC data in the case of multiple readers and multiple modalities:

\[ H_0: \text{All modalities are the same.} \]

What is Bayesian ANOVA?
  1. Make a null hypothesis \(H_0\) and its alternative \(H_1\).
  2. Make two models \(\mathcal{M}_0\) constructed under \(H_0\) and \(\mathcal{M}_1\) made under \(H_1\).
  3. Model selection based on WAIC or Bayes factor
  4. If \(\mathcal{M}_1\) is selected, then the null hypothesis \(H_0\) is rejected.

Note that p values are not used.

Bayesian ANOVA is problematic


but … you know … the frequentist statistical test is also problematic.

Null Hypothesis Model vs Alternative Hypothesis Model

In this package, there are two hierarchical Bayesian models. One is constructed under the null hypothesis that all modalities are the same, and the other is constructed under the alternative hypothesis that at least one modality differs from the others. We call the former the null hypothesis model (or simply the null model) and the latter the alternative model. By comparing the null model and the alternative model via the Bayes factor, we can test the null hypothesis.

The alternative model is the same as the model already shown in the previous section.

In the following we show the null model.

Null Hypothesis model

\[\begin{eqnarray*} H_{c,m,r} & \sim &\text{Binomial }( p_{c,m,r}, N_L ),\\ F_{c,m,r} &\sim& \text{Poisson }( ( \lambda _{c} - \lambda _{c+1})N_L ),\\ \lambda _{c}& =& - \log \Phi (z_{c }),\\ p_{c,m,r} &:=&\Phi (\frac{z_{c +1}-\mu_{m,r}}{\sigma_{m,r}})-\Phi (\frac{z_{c}-\mu_{m,r}}{\sigma_{m,r}}), \\ p_{C,m,r} & =& 1-\Phi (\frac{z_{C}-\mu_{m,r}}{\sigma_{m,r}}),\\ F_{C,m,r} & \sim &\text{Poisson } ( (\lambda _{C} - 0)N_L),\\ A_{m,r}&:=&\Phi (\frac{\mu_{m,r}/\sigma_{m,r}}{\sqrt{(1/\sigma_{m,r})^2+1}}), \\ A_{m,r}&\sim&\text{Normal} ( \color{red}{A},\sigma_{r}^2), \\ dz_c&:=&z_{c+1}-z_{c},\\ dz_c, \sigma_{m,r} &\sim& \text{Uniform}(0,\infty),\\ z_{c} &\sim& \text{Uniform}( -\infty,100000),\\ \color{red}{A} &\sim& \text{Uniform}(0,1).\\ \end{eqnarray*}\] The null model has parameters \(z_{1}, dz_1,dz_2,\cdots, dz_{C-1}\), \(A\), \(\sigma_{r}\), \(\mu_{m,r}\), and \(\sigma_{m,r}\).

The R scripts for fitting the null model.

Alternative hypothesis model

The following model is the same as in the previous sections; we reprint it here:

\[\begin{eqnarray*} H_{c,m,r} & \sim &\text{Binomial }( p_{c,m,r}, N_L ),\\ F_{c,m,r} &\sim& \text{Poisson }( ( \lambda _{c} - \lambda _{c+1})N_L ),\\ \lambda _{c}& =& - \log \Phi (z_{c }),\\ p_{c,m,r} &:=&\Phi (\frac{z_{c +1}-\mu_{m,r}}{\sigma_{m,r}})-\Phi (\frac{z_{c}-\mu_{m,r}}{\sigma_{m,r}}), \\ p_{C,m,r} & =& 1-\Phi (\frac{z_{C}-\mu_{m,r}}{\sigma_{m,r}}),\\ F_{C,m,r} & \sim &\text{Poisson } ( (\lambda _{C} - 0)N_L),\\ A_{m,r}&:=&\Phi (\frac{\mu_{m,r}/\sigma_{m,r}}{\sqrt{(1/\sigma_{m,r})^2+1}}), \\ A_{m,r}&\sim&\text{Normal} ( \color{red}{ A_{m} },\sigma_{r}^2), \\ dz_c&:=&z_{c+1}-z_{c},\\ dz_c, \sigma_{m,r} &\sim& \text{Uniform}(0,\infty),\\ z_{c} &\sim& \text{Uniform}( -\infty,100000),\\ \color{red}{ A_{m} } &\sim& \text{Uniform}(0,1).\\ \end{eqnarray*}\] The alternative model has parameters \(z_{1}, dz_1,dz_2,\cdots, dz_{C-1}\), \(A_{m}\), \(\sigma_{r}\), \(\mu_{m,r}\), and \(\sigma_{m,r}\).

Distinction of null model and alternative model

The parameter \(\color{red}{ A_{m} }\) is used in the alternative model instead of the parameter \(\color{red}{ A}\) in the null hypothesis model.

R script for ANOVA

In the following, we provide R scripts which implement the Bayesian ANOVA to test the null hypothesis that all modalities are the same.

Internally, the code runs functions from the package bridgesampling.

R often crashes when this code runs, sorry … I do not know why; perhaps the computation is too heavy?

Bayes Factor

R code for comparison of two objects of class stanfit

Let fitH0 and fitH1 be objects of class stanfit. The following code compares these two objects by the Bayes factor.

Limitation (no need to read)

Our modeling begins from the hits and false alarms obtained by reducing the primitive FROC data; the latter is more informative, and the so-called figure of merit (FOM) is calculated from that more informative dataset rather than from the reduced dataset used by the author.

I have to say our hierarchical model is a toy. The starting point of this model is very problematic, since the data of hits and false alarms are obtained by reducing the information of the FROC trials. The Jafroc data formulation contains more precise information, and the notion of the figure of merit is obtained by using this precise information. Thus, without changing our starting point, our model has to be called a toy. However, this is the first attempt in the world to analyze FROC data with a Bayesian model, so I think it is sufficient. This is only a hobby; I am not a statistician nor a radiologist, and I hold no academic post, only an unemployed man. Good luck for future research, and for my future.

\(\color{green}{\textit{Validation }}\) of our model

Simulation Based Calibration (SBC)

Talts, S., Betancourt, M., Simpson, D., Vehtari, A., and Gelman, A. (2018). Validating Bayesian Inference Algorithms with Simulation-Based Calibration. arXiv preprint arXiv:1804.06788

SBC is a method to detect bias in MCMC sampling. Once we specify the prior of a model, the SBC algorithm returns a pseudo-uniform distribution: if this distribution is exactly uniform, then we cannot conclude that the MCMC sampling is biased, and if it is far from uniformity, then we reject the hypothesis that the MCMC sampling has no bias. On 15 July 2019, the author implemented SBC using rstan’s function sbc for the case of a single reader and a single modality.

Example code for SBC

Executing the above R code draws a histogram for each parameter of the model; from these we can see whether our model yields correct MCMC sampling.
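To illustrate the idea behind SBC independently of this package, the following toy example performs SBC for a conjugate normal model where the exact posterior is known in closed form, so a correct "sampler" must yield uniform rank statistics. All names and numbers here are illustrative, not part of BayesianFROC.

```r
# Toy SBC (not the package's FROC model): prior mu ~ N(0,1), one datum
# y | mu ~ N(mu, 1); the exact posterior is N(y/2, 1/2).
set.seed(1)
ranks <- replicate(1000, {
  mu_true <- rnorm(1)                                   # draw parameter from the prior
  y       <- rnorm(1, mean = mu_true)                   # simulate data given it
  post    <- rnorm(100, mean = y / 2, sd = sqrt(1 / 2)) # exact posterior draws
  sum(post < mu_true)                                   # rank of the true parameter
})
hist(ranks, breaks = 20)  # should look flat on {0, ..., 100}
```

With a biased sampler (e.g., a wrong posterior mean), the histogram would be visibly skewed instead of flat.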

chi square goodness of fit 1

Let \(\hat{\theta} = \int \theta \pi(\theta|D_0)d\theta\) where \(\pi(\theta|D_0)\) is a posterior measure for the given data \(D_0\) and model parameter \(\theta\). Then, we calculate the chi square in the following manner:

\[\chi^2(D_0) = \chi^2(D_0| \hat{\theta}).\]

It is very easy to calculate, and the value is very consistent with the data plot and the estimated FROC curve, so I trust this value.

I have not seen this quantity in any book, so it might not be good from a theoretical perspective, but I use it since it is very easy to calculate from a fitted model object of class stanfit.

chi square goodness of fit 2

Posterior predictive p value.

p value of \(\chi^2\) goodness of fit
“X-ray of my teeth!! Our MCMC sample contains bias ?”


First, we outline the Bayesian version of the p value for the chi square goodness of fit statistic. In the frequentist setting, p values are calculated with data \(D\) and an estimated parameter \(\theta\), such as the maximum likelihood estimate (MLE) \(\hat{\theta}_{\text{MLE}}(D)\), so we may write \(\chi^2(D)=\chi^2(D|\hat{\theta}_{\text{MLE}}(D))\) in the frequentist context. In the Bayesian context, however, the model parameter is not deterministic, so the p value also depends on the MCMC samples of the model parameter. To obtain the p value in a deterministic way in the Bayesian context, we integrate the p value against the posterior predictive distribution. The posterior predictive distribution \(\pi(D|D_0)\) for given data \(D_0\) is defined by the posterior mean of the likelihood;

\[\pi(D|D_0) := \int f(D|\theta)\pi(\theta|D_0)d\theta \times \text{const}\]

where \(f(D|\theta)\) denotes the likelihood and \(\pi(\theta|D_0)\) denotes the posterior. So, \(\pi(D|D_0)\) is a probability measure on the set of all data-sets \(\{ D \mid D \text{ is data} \}\) for the given data \(D_0\).

There are three possible chi squares depending only on the given data \(D_0\).

  • \(\chi^2(D_0) = \int \chi^2(D|\theta) f(D|\theta)\pi(\theta|D_0)d\theta dD\)

  • \(\chi^2(D_0) = \int \chi^2(D_0|\theta) \pi(\theta|D_0)d\theta\)

  • \(\chi^2(D_0) = \chi^2(D_0|\int \theta \pi(\theta|D_0)d\theta )\)

I use the first and the third quantity.

I am not sure their precise distinction. If someone is familiar with it, please tell me :-)

For details of the p value, please see Gelman et al.’s book “Bayesian Data Analysis”.

The package BayesianFROC implements the calculation of the posterior predictive p value of the chi square goodness of fit statistic for a single reader and single modality FROC model.

In this vignette, we explain how to run the function which calculates the posterior predictive p value.

  • Prepare data in case of a single reader and single modality

  • Get a fitted model object of class stanfitExtended by fitting the model to the data via the function fit_Bayesian_FROC() in this package.

  • Calculate the posterior predictive p value via p_value_of_the_Bayesian_sense_for_chi_square_goodness_of_fit()

So all we have to do is simple: merely run the following two functions.

  • fit_Bayesian_FROC()
  • p_value_of_the_Bayesian_sense_for_chi_square_goodness_of_fit()

The resulting object contains the p value.

Using the pipe operator %>% (from the package magrittr, re-exported by several tidyverse packages), the whole code can be written as follows;

Example Code to Calculate the Posterior Predictive P value

Outputs of p_value_of_the_Bayesian_sense_for_chi_square_goodness_of_fit()
  • p value
  • Chi squares

In the following, we shall explain what the result printed in the R console means when we execute the above code to evaluate the p value.

The last number, 0.916375, is the desired p value of the chi square in the Bayesian context, according to Gelman’s book.

  • The first column means the sample ID of the Hamiltonian Monte Carlo simulation, excluding the warm-up.
  • The second column means the value of the chi square computed with the observed data and the corresponding MCMC sample.
  • The third column means the value of the chi square computed with the replicated data and the corresponding MCMC sample, if I remember correctly.
  • The fourth column indicates whether the second column is less than the third column; this inequality is required for calculating the p value in the Bayesian sense according to Gelman’s book.
  • More TRUE values in the fourth column mean a better fit of the model to the data.
  • The p value is the proportion of TRUE values in the fourth column.

Running the above R script, a graphics device appears in which the replicated FROC data from the posterior predictive distribution are drawn. These samples are used to calculate the p value in the Bayesian sense, since we need samples from the posterior predictive distribution.

Example of p value for \(\chi^2\) goodness of fit

Note that the p value depends only on the data. So, odd data cause a small p value, which rejects the null hypothesis that the fitting is good.

Example; \(\color{red}{\textit{ Good }}\) fitting to data dataList.Chakra.1

Intuitively, all data points (drawn as circles) lie on the FROC curve, thus we may say the fitting is \(\color{red}{\textit{ good }}\). \(\color{red}{\textit{P value is 0.925. }}\)

" p value = 0.925, good fitting"


Example; \(\color{red}{\textit{ Bad }}\) fitting to data BayesianFROC::dataList.High

Intuitively, two data points (drawn as circles) are not on the FROC curve, thus we may say the fitting is \(\color{red}{\textit{ bad }}\). \(\color{red}{\textit{P value is 0.669, }}\) which is smaller than the above 0.925, indicating that the fitting is worse than the above.

" p value = 0.669, bad fitting"


\(\color{blue}{\textit{The above two examples show that p value is }} \color{red}{\textit{ compatible }} \color{blue}{\textit{with our intuition. }}\)

R code: p value of \(\chi^2\) goodness of fit statistic

Appendix: Theory of P value in Bayesian sense

In the following, we shall use general notation. Let \(y_{\text{obs}}\) be the observed data and \(f(y|\theta)\) be a model (likelihood) for a future datum \(y\). We write the prior and posterior distributions as \(\pi(\theta)\) and \(\pi(\theta|y) \propto f(y|\theta)\pi(\theta)\). The posterior predictive distribution is defined by \(p(y|y_{\text{obs}}) := \int f(y|\theta)\pi(\theta|y_{\text{obs}}) d\theta.\)

In our case, the data \(y\) is a pair of hits and false alarms, that is, \(y=(H_1,H_2, \dots H_C; F_1,F_2, \dots F_C)\), and \(\theta = (z_1,dz_1,dz_2,\dots,dz_{C-1},\mu, \sigma)\). We shall define the \(\chi^2\) discrepancy (goodness of fit statistic) to validate that the model fits the data:

\[ T(y,\theta) := \sum _{c=1,\cdots,C} \biggr( \frac{[H_c-N_L\times p_c]^2}{N_L\times p_c}+ \frac{[F_c-(\lambda _{c} -\lambda _{c+1} )\times N_{I}]^2}{(\lambda _{c} -\lambda _{c+1} )\times N_{I} }\biggr). \]

for the single reader and single modality case, and

\[ T(y,\theta) := \sum _{r=1}^R \sum _{m=1}^M \sum _{c=1}^C \biggr( \frac{[H_{c,m,r}-N_L\times p_{c,m,r}]^2}{N_L\times p_{c,m,r}}+ \frac{[F_{c,m,r}-(\lambda _{c} -\lambda _{c+1} )\times N_{L}]^2}{(\lambda _{c} -\lambda _{c+1} )\times N_{L} }\biggr). \]

for the multiple reader and multiple modality case, where the false positive fraction is per lesion instead of per image.

Note that \(p_c\) and \(\lambda _{c}\) depend on \(\theta\). In this statistic, the number of degrees of freedom is \(C-2\).
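For concreteness, the single-reader discrepancy above can be computed by a few lines of R. The following helper is an illustrative sketch (not the package's internal implementation), and all variable names are invented for this example; note that when the data are set equal to their model expectations, the discrepancy is zero.

```r
# Sketch of the chi-square discrepancy T(y, theta), single reader and single
# modality case (illustration only, not package code).
chi_square_discrepancy <- function(H, F, p, lambda, NL, NI) {
  dlambda <- lambda - c(lambda[-1], 0)  # lambda_c - lambda_{c+1}, lambda_{C+1} = 0
  sum((H - NL * p)^2 / (NL * p) + (F - dlambda * NI)^2 / (dlambda * NI))
}

# Example with C = 3 confidence levels, using the model's relations:
z <- c(0.5, 1.0, 1.8); mu <- 1; sigma <- 1.2; NL <- 100; NI <- 50
lambda <- -log(pnorm(z))                                    # lambda_c = -log Phi(z_c)
p <- pnorm(c(z[-1], Inf), mu, sigma) - pnorm(z, mu, sigma)  # hit rates p_c
H <- NL * p                                                 # data equal to their
F <- (lambda - c(lambda[-1], 0)) * NI                       # model expectations
chi_square_discrepancy(H, F, p, lambda, NL, NI)  # 0
```

Perturbing H or F away from the expectations makes the discrepancy strictly positive.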

In classical frequentist methods, the parameter \(\theta\) is a fixed estimate, e.g., the maximum likelihood estimator; in the Bayesian context, however, the parameter is not deterministic, so by integrating against the posterior predictive distribution we obtain the posterior predictive \(\chi^2\) value and its p value. Let \(y_{\text{obs}}\) be the observed data. Then the posterior predictive p value is defined by

\[ p \text{ value of $y_{\text{obs}}$} = \int \int dy d\theta I_{T(y,\theta) > T(y_{\text{obs}},\theta)}f(y|\theta)\pi(\theta|y_{\text{obs}}) \]

In the following, we show how to calculate this double integral. Suppose that \(\theta _1, \theta_2,\cdots,\theta_N\) are samples from the posterior distribution obtained via Hamiltonian Monte Carlo simulation; then we obtain a sequence of models (likelihoods) \(f(y|\theta_1),f(y|\theta_2),\cdots, f(y|\theta_N)\). We draw samples \(y_1,y_2,\cdots, y_N\) so that each \(y_i\) is a sample from the distribution whose density function is \(f(y|\theta_i)\); i.e., in conventional notation, we may write

\[ y_1 \sim f(\cdot|\theta_1),\\ y_2 \sim f(\cdot|\theta_2),\\ \cdots,\\ y_N \sim f(\cdot|\theta_N). \]

and this is the desired sampling; that is, we can interpret the samples \(y_1,y_2,\cdots, y_N\) as draws from the posterior predictive distribution. Using the law of large numbers, we can calculate the above double integral for the p value by

\[ p \text{ value} =\frac{1}{N} \sum I_{T(y_i,\theta_i) > T(y_{\text{obs}},\theta_i)} \]
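This Monte Carlo estimator can be illustrated with a self-contained toy example, using a simple Gaussian model in place of the FROC likelihood; all names below are invented for this sketch.

```r
# Monte Carlo posterior predictive p value, with toy Gaussian draws standing
# in for real MCMC output (illustration only, not package code).
set.seed(1)
N      <- 5000
theta  <- rnorm(N, mean = 0, sd = 0.1)      # stand-in posterior samples theta_i
y_rep  <- rnorm(N, mean = theta, sd = 1)    # y_i ~ f(. | theta_i)
y_obs  <- 0.2                               # observed datum
T_stat <- function(y, theta) (y - theta)^2  # toy discrepancy T(y, theta)
p_value <- mean(T_stat(y_rep, theta) > T_stat(y_obs, theta))
p_value  # a typical y_obs gives a p value far from zero
```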

Validation: Replicate various data-sets from a fixed true distribution
“X-ray of my teeth!!”


I implemented a check of whether the estimates contain bias or not. Currently, however, I exclude the data-sets whose fitting does not pass the R-hat criterion, so the bias is calculated from the converged cases only; this is not ideal, so I will remove this exclusion. Still, the program shows the convergence rate, and almost all fittings had converged, so the exclusion does not have a strong effect on the bias calculation.

Validation: A function for validations.

For the user, it is sufficient to know only one function: validation.dataset_srsc_for_different_NI_NL(). This function replicates many data-sets from a known distribution which the user specifies before execution. In FROC data, the numbers of lesions and images correspond to the sample size, so a larger number of images gives a smaller bias, which can be confirmed by the function validation.dataset_srsc_for_different_NI_NL().

In the following, the important arguments of validation.dataset_srsc_for_different_NI_NL() are shown.

  • NLvector: A vector whose components are numbers of lesions. We fit models to data-sets replicated with the fixed numbers of lesions specified by NLvector. For example, if NLvector = c(100,1000), then we first replicate various hits and false alarms under the assumption of 100 lesions and evaluate the bias; next, we do the same with 1000 lesions instead.
  • ratio: The ratio of images to lesions. It is bothersome to input both the number of images and the number of lesions, so the function is constructed such that it is sufficient to input only one of them; that is, if the number of lesions is specified, then the number of images is generated automatically so as to satisfy this ratio.
  • replicate.datset: For a fixed number of lesions and images, the data-sets of hits and false alarms are replicated; the number of replicated data-sets is specified by this variable.
  • mean.truth: The mean of the signal distribution.
  • sd.truth: The standard deviation of the signal distribution.
  • z.truth: The thresholds of the bi-normal assumption.
  • ite: The number of Hamiltonian Monte Carlo iterations used in each fitting of the replicated data-sets.

Output: Error of estimates calculated by replicated data-sets

Each number in the table is the error of a parameter, i.e., the mean of the estimates minus the true parameter over all replications. We can see that as the numbers of images and lesions become larger, these errors tend to zero. That is, in FROC models, the numbers of lesions and images correspond to the sample size; if these values are larger, then the numbers of hits and false alarms also become larger, which leads to smaller biases.

       validation.dataset_srsc_for_different_NI_NL()
Name.of.Parameters 1-th model 2-th model 3-th model 4-th model
Number of Images 200.0000000 20000.0000000 2.00000e+06 2.0000e+08
Number of Lesions 100.0000000 10000.0000000 1.00000e+06 1.0000e+08
z[ 1 ] 0.1055186 0.0103358 1.01020e-03 1.0720e-04
z[ 2 ] 0.1956183 0.0132404 1.20850e-03 1.3030e-04
z[ 3 ] 1.1307551 0.0309471 2.52580e-03 2.9210e-04
mean.of.signal -0.4843169 -0.0463222 -4.95890e-03 -4.2730e-04
sd.of.signal 3.8036830 0.1160011 1.07218e-02 1.0843e-03
AUC -0.0348429 -0.0041814 -4.49200e-04 -4.0100e-05
dz[ 1 ] 0.0900996 0.0029046 1.98300e-04 2.3100e-05
dz[ 2 ] 0.9351368 0.0177067 1.31730e-03 1.6170e-04
p[ 1 ] -0.0384759 -0.0019671 -1.90400e-04 -1.9200e-05
p[ 2 ] -0.0119591 -0.0015342 -1.72000e-04 -1.5200e-05
p[ 3 ] -0.0121365 -0.0026324 -2.75400e-04 -2.5300e-05
lambda[ 1 ] -0.0384759 -0.0019671 -1.90400e-04 -1.9200e-05
lambda[ 2 ] -0.0119591 -0.0015342 -1.72000e-04 -1.5200e-05
lambda[ 3 ] -0.0121365 -0.0026324 -2.75400e-04 -2.5300e-05

Simulation Based Calibration (SBC): under construction

This is a statistical test for the null hypothesis that the MCMC sampling contains no bias.

When the SBC algorithm runs, a pseudo-uniform distribution is obtained.

  • If it is exactly the uniform distribution, then we cannot say that the MCMC sampling contains bias;

  • If the pseudo-uniform distribution differs from uniformity, then we reject the null hypothesis, and we may consider that the MCMC samples contain bias.

Appendix

Appendix: Terminology
Appendix: Basic words
Basic words Truth = Positive Truth = negative
Reader’s positive TP FP
Reader’s negative FN TN

We focus only on TP and FP, which we call hits and false alarms, respectively. That is:

Basic words Truth = Positive Truth = negative
Reader’s positive hit False alarm
Reader’s negative FN TN
Appendix

In this section, I show the most important things.

I worked at some company, and as part of the work, someone made me use a washing soap, whose component is

Then, when I was exposed to it (for 4 hours; 1 year and 18 months ago), I felt an irritating stimulus in my whole body and shock-like pains. Then my body became bad; in particular, my skin shows the pain as prurigo nodularis. Now I am prescribed a steroid of the rank “very strong”, Protopic, and Bilanoa tablets.

I regret using it, since in my private life I have never used such things.

Readers should not use such things. My whole body was affected: brain, eyes, teeth, legs, hands, breathing, elbows, fingers, anus, karasuma, scalp, blood vessels, thinning hair, … etc.

The irritating stimulus was very strong in the first month. Especially in the brain, the stimulus continued for 10 months. Now, 1 year and 18 months have passed since the exposure, but my body is still very bad.

The traces of prurigo nodularis make me very sad.

So, we should not use any petroleum detergent; it acts as an environmental hormone and has a very strong effect on our body.

The most important thing is biosurfactants. We should establish the technology of biosurfactants.

Appendix: These models were made long ago … (:’ 3)

From May 2017 to August 2017, the author made these Bayesian FROC models.

For detail, please see my pre-print on Arxiv: \((';\omega {}; ')\)

“Bayesian Models for Free-Response Receiver Operating Characteristic Analysis”

Appendix: Divergent Transition

Note that we use the following definition

\[p_{C}= 1-\Phi (\frac{z_{C}-\mu}{\sigma}), \]

instead of

\[p_{C}= \Phi (\frac{z_{C +1}-\mu}{\sigma}) -\Phi (\frac{z_{C}-\mu}{\sigma}), \] with the theoretically infinite but numerically very large number \(z_{C+1}\).

One might think this note is redundant, but it is very important for avoiding the divergent transition issue, which is a bias in MCMC sampling. The author first made a Stan file with the latter definition, and divergent transitions occurred. A non-centered parametrization is one known method to avoid divergent transitions, but in my case I think the issue is not related to centering. Now, my package implements the former definition.

In our model, the author writes down the highest confidence level separately, because otherwise it causes the divergent transition issue. In the past, I wrote the model under the assumption that the highest threshold \(z_{C+1}\) is a very large number (theoretically it is infinity) whose prior is a uniform distribution with very large support; this caused divergent transitions in almost all MCMC iterations. Also, I use the target += formulation to avoid Jacobian warnings; such warnings were difficult for me to resolve, or even to understand what they say and how to overcome them.

One may think my models are very complex; in fact, many people will dislike my explanation and my models. One reason my model is so complex is that the FROC statistical model itself is very complex and cannot be understood without effort from the reader. Furthermore, the description of my model was changed to avoid the divergent transition issue, which made it more complex. I also had to overcome the Jacobian issues, whose formulation is not intuitive for me.

I am very tired of explaining my model, and my ability has also disappeared :’-D

These models were made two years ago, from May 2017 to August 2017.

I briefly explain the divergent transition in my model. From a theoretical perspective, it is natural to introduce the highest decision threshold as a very large number, which is a parameter of the model; unfortunately, it is then distributed uniformly with very large support. Once we introduce such a theoretically infinite parameter in the numerical program, it causes the divergent transition issue.

Appendix: About the author

Hello, everyone!! I am around thirty years old. When I was 33(?) years old, that is, on 2017.12.28, I was exposed to a cleanser for about 4 hours; then my body fell into very poor physical condition and my whole body developed the so-called prurigo nodularis. When I was exposed, I felt a stimulating sensation, like “tiktik”. Now I take Bilanoa, Tacrolimus, and ANTEBATE. The exposure developed into atopic dermatitis. Please be careful!! The exposure causes various pains 24 hours a day, all the time, at all hours of the day and night.

Appendix: Future research direction

Appendix:

\(\color{green}{\textit{ Addition of Poisson distributions }}\)

If \(X \sim \text{Poisson}(\lambda_X)\) and \(Y \sim \text{Poisson}(\lambda_Y)\) are independent, then \(X+Y \sim \text{Poisson}(\lambda_X + \lambda_Y)\). We used this relation above in the context of false alarms.
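This additivity can be checked numerically in R (an illustration only; the rates 1.2 and 0.8 are arbitrary):

```r
# Check: the convolution of Poisson(1.2) and Poisson(0.8) masses equals the
# Poisson(2.0) mass, illustrating the additivity stated above.
lambdaX <- 1.2; lambdaY <- 0.8
k <- 0:10
conv <- sapply(k, function(n) sum(dpois(0:n, lambdaX) * dpois(n:0, lambdaY)))
all.equal(conv, dpois(k, lambdaX + lambdaY))  # TRUE
```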

Appendix:

\(\color{green}{\textit{ Calculation of Poisson distributions }}\)

If a random variable \(N\) satisfies \(N \sim \text{Poisson}(\lambda)\), then \(\mathbb{P}(N=0)= e^{-\lambda}\) (redundant?).
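For example, in R (illustration only; the rate 2.5 is arbitrary):

```r
# P(N = 0) for N ~ Poisson(lambda) equals exp(-lambda)
lambda <- 2.5
c(dpois(0, lambda), exp(-lambda))  # the two values agree
```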

Appendix