Install and Load the package

To install the package from CRAN, type:

install.packages("bcROCsurface")

Next, load the package.

library(bcROCsurface)
## Loading required package: nnet
## Loading required package: rgl
## Loading required package: boot
## Loading required package: parallel

Diagnosis of EOC

To illustrate the use of the package bcROCsurface, we consisder an example, which presents the evaluation process for biomaker CA125 in the diagnosis of epithelial ovarian cancer (EOC).

Load the data set

data(EOC)

The data have 278 observations on the following 6 variables:

head(EOC)
##   D.full V  D        CA125       CA153 Age
## 1      3 1  3  3.304971965  1.42822875  41
## 2      1 0 NA  0.112479444  0.11665310  52
## 3      2 1  2  2.375011262 -0.04096794  50
## 4      1 0 NA -0.001545381  0.32111633  66
## 5      1 0 NA  0.278200345 -0.14283052  52
## 6      2 0 NA  0.167645382  0.81470563  50

In data set, CA125 and CA153 are two biomarkers, Age is the age of the patients. The variable V is the verification status; 1 and 0 indicates verified and non-verified subject, respectively. D.full is disease status, which consist of three classes, says, 1, 2, 3. These levels correspond to benign disease, early stage (I and II) and late stage (III and IV). On the other hand, D is missing disease status.

Obtaining ROC surface and VUS

FULL Estimator

The ROC surface and VUS are only applied when an monotone increasing ordering is of interest. Thus, before estimate ROC and also VUS, we have to be sure that the ordering of disease classes is monotone incresasing (with respect to the diagnostic test values). In order to do that, the function preDATA() is usefull.

Dfull <- preDATA(EOC$D.full, EOC$CA125)
## The sample means of diagostic test based on three classes.
## ( 1 ) 1 : 0.192 
## ( 2 ) 2 : 1.81 
## ( 3 ) 3 : 3.214 
## The sample median of diagostic test based on three classes.
## ( 1 ) 1 : 0.014 
## ( 2 ) 2 : 1.499 
## ( 3 ) 3 : 3.381 
## The ordering based on median: 1 < 2 < 3

On the other hand, we describe the full diease status D.full as the binary matrix having three columns, correspondings to three classes of the disease status. Each row corresponds to a trinomial vector, in which, 1 indicates the subject belongs to class j with j = 1,2,3. The function preDATA() also done this work.

head(Dfull$D)
## [1] 3 1 2 1 1 2
head(Dfull$Dvec)
##      D1 D2 D3
## [1,]  0  0  1
## [2,]  1  0  0
## [3,]  0  1  0
## [4,]  1  0  0
## [5,]  1  0  0
## [6,]  0  1  0

We construct the ROC surface of full data, and estimate the VUS.

Dvec.full <- Dfull$Dvec
ROCs(method = "full", T = EOC$CA125, Dvec = Dvec.full, ncp = 30, ellipsoid = TRUE,
     cpst = c(-0.56, 2.31))
## Hmm, look likes the full data
## Number of observation: 278 
## The verification status is not available
## You are working on FULL or Complete Case approach
## The diagnostic test: CA125 
## Processing ....
## Warning in par3d(userMatrix = structure(c(1, 0, 0, 0, 0,
## 0.342020143325668, : font family "sans" not found, using "bitmap"
## DONE
## ===============================================================
## Some values of TCFs:
##                  TCF1  TCF2  TCF3
## (0.4 , 2.627)   0.694 0.493 0.688
## (0.718 , 2.627) 0.769 0.418 0.688
## (0.4 , 2.945)   0.694 0.537 0.636
## (0.718 , 2.945) 0.769 0.463 0.636
## (0.4 , 1.991)   0.694 0.343 0.805
## (0.718 , 1.991) 0.769 0.269 0.805
## 
## Some information for Ellipsoidal Confidence Region(s):
## Confidence level: 0.95 
## TCFs at (-0.56, 2.31) are:
##  TCF1  TCF2  TCF3 
## 0.209 0.642 0.727 
## ===============================================================

Here, we consider the full data, so we only need to put the arguments T and Dvec, and method is full.

The FULL estimator of VUS is obtained by the following command:

vus("full", T = EOC$CA125, Dvec = Dvec.full, ci = TRUE, parallel = TRUE)
## Hmm, look likes the full data
## The verification status is not available
## You are working on FULL or Complete Case approach
## Number of observation: 278
## The diagnostic test: CA125 
## Processing .... 
## DONE
## 
## CALL: vus(method = "full", T = CA125, Dvec = Dvec.full, ci = TRUE, 
##     parallel = TRUE)
##  
## Estimate of VUS: 0.5663 
## Standard error: 0.0365 
## 
## Intervals :
## Level        Normal                Logit        
## 95%   ( 0.4946,  0.6379 )   ( 0.4937,  0.6360 ) 
## Estimation of Standard Error and Intervals are based on Bootstrap with 250 replicates
##
## Testing the null hypothesis H0: VUS = 1/6 
##             Test Statistic   P-value    
## Normal-test         10.596 < 2.2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

FI and MSI Estimator

Now, we compute the FI and MSI estimator with missing data. First, we need to estimate the disease probabilities by using multnomial logistic model. In bcROCsurface package, this work is done by using rhoMLogit() function.

Dna <- preDATA(EOC$D, EOC$CA125)
## There are missing disease status.
## The sample means of diagostic test based on three classes.
## ( 1 ) 1 : 0.555 
## ( 2 ) 2 : 2.112 
## ( 3 ) 3 : 3.347 
## The sample median of diagostic test based on three classes.
## ( 1 ) 1 : 0.286 
## ( 2 ) 2 : 1.972 
## ( 3 ) 3 : 3.439 
## The ordering based on median: 1 < 2 < 3

Dvec.na <- Dna$Dvec
D.na <- Dna$D
rho.out <- rhoMLogit(D.na ~ CA125 + CA153 + Age, data = EOC, test = TRUE, trace = TRUE)
## # weights:  15 (8 variable)
## initial  value 195.552987 
## iter  10 value 134.148494
## final  value 133.798546 
## converged
## Fitting the disease model by using multinomial logistic model via nnet package.
## FORMULAR: Disease ~ CA125 + CA153 + Age 
## 
## ====================================================================
## The p-value calculation for the regression coefficients:
##                     1         2
## (Intercept) 1.708e-05 0.0008529
## CA125       1.303e-08 0.0417717
## CA153       6.543e-02 0.0216723
## Age         1.635e-03 0.0031442
## ====================================================================

The following command provides the ROC surface of FI esimator:

ROCs(method = "fi", T = EOC$CA125, Dvec = Dvec.na, V = EOC$V, rhoEst = rho.out, ncp = 30,
     ellipsoid = TRUE, cpst = c(-0.56, 2.31))
## Hmm, look likes the incomplete data
## Number of observation: 278 
## 64% of the subjects receive disease verification. 
## You required estimate ROC surface using FI approach 
## The diagnostic test: CA125 
## The ellipsoidal confidence region(s) of TCFs are also constructed 
## Processing ....
## Warning in par3d(userMatrix = structure(c(1, 0, 0, 0, 0,
## 0.342020143325668, : font family "sans" not found, using "bitmap"
## DONE
## ===============================================================
## Some values of TCFs:
##                  TCF1  TCF2  TCF3
## (1.036 , 2.945) 0.820 0.362 0.606
## (0.718 , 2.945) 0.747 0.433 0.606
## (1.354 , 2.945) 0.876 0.291 0.606
## (1.036 , 2.627) 0.820 0.312 0.637
## (0.4 , 2.945)   0.670 0.492 0.606
## (0.718 , 2.627) 0.747 0.383 0.637
## 
## Some information for Ellipsoidal Confidence Region(s):
## Confidence level: 0.95 
## TCFs at (-0.56, 2.31) are:
##  TCF1  TCF2  TCF3 
## 0.195 0.586 0.680 
## ===============================================================