# Introduction to subtee and Usage Instructions

#### 2022-03-22

In this vignette we showcase the estimation functions of the subtee package. We provide three ways for estimating the treatment effects in subgroups: unadjusted estimation (unadj), model averaging (modav), and using bootstrap bias adjustment (bagged).

## Analyzing the data

We use the prostate cancer dataset that was used in Rosenkranz (2016) to illustrate the usage of the package. The dataset consists of n=475 subjects randomized to a control group or diethylstilbestrol. The considered endpoint is survival time in months. There are six subgroup defining variables to consider: existence of bone metastasis (BM), disease stage (3 or 4), performance (PF), history of cardiovascular events (HX), age, and weight. While age and weight are continuous covariates, they are dichotomized (age $$\leq$$ 65, $$>$$ 65 and weight $$\leq$$ 100, $$>$$ 100) for obtaining subgroups as in Rosenkranz (2016).

The considered endpoint is survival time in months and Cox proportional hazards models are fitted. We first start producing the treatment effect estimates for all subgroups, using the unadj, modav and bagged functions.

library(ggplot2)
library(subtee)
################################################################################
# We use the dataset from Rosenkranz (2016) https://onlinelibrary.wiley.com/doi/abs/10.1002/bimj.201500147
# to illustrate the methods proposed in this work.
# The data comes from a clinical trial of an prostate cancer
# treatment
# Data is loaded from Royston, Patrick, and Willi Sauerbrei.
# Multivariable model-building: a pragmatic approach to
# regression anaylsis based on fractional polynomials for
# modelling continuous variables. Vol. 777. John Wiley & Sons, 2008.
# https://www.imbi.uni-freiburg.de/Royston-Sauerbrei-book
prca = get_prca_data()
#> Downloading remote dataset.

## subbuild function

We first use the subbuild function to create the subgroup defining binary covariates. This function takes the dataset as a first argument, and then a series of expressions to define the subgroup indicator variables (see ?subbuild for more options on how to generate binary subgroup indicators based on a data-set). Note that we also use the option dupl.rm = TRUE to remove duplicate subgroups. The output of the subbuild is a data.frame that might then be concatenated with the original dataset to be used in the other functions. This step can be ommited if the original dataset already contains the subgroup defining indicator variables.

cand.groups <- subbuild(prca, dupl.rm = TRUE,
BM == 1, PF == 1, HX == 1,
STAGE == 4, AGE > 65, WT > 100)
#>   BM == 1 PF == 1 HX == 1 STAGE == 4 AGE > 65 WT > 100
#> 1       0       0       0          0        1        0
#> 2       0       0       1          0        1        1
#> 3       0       1       1          0        1        0
#> 4       0       0       0          0        1        0
#> 5       0       0       0          0        1        0
#> 6       0       0       0          0        1        0
fitdat <- cbind(prca, cand.groups)
subgr.names <- names(cand.groups)
prog <- as.formula(paste(" ~ ", paste0("", names(cand.groups),"", collapse = " + ")))

Before investigating how the treatment effect differs across the subgroups we first fit the overall model, adjusting for the subgroup indicators only as prognostic covariates. Since we have survival endpoint, we use coxph from the survival package as fitting function.

library(survival)
form <- as.formula(paste("Surv(SURVTIME,CENS) ~ RX +", paste0("", names(cand.groups),"", collapse = " + ")))
coxph(form, data=fitdat, ties = "breslow")
#> Call:
#> coxph(formula = form, data = fitdat, ties = "breslow")
#>
#>                 coef exp(coef) se(coef)      z        p
#> RX           -0.1833    0.8325   0.1129 -1.624  0.10434
#> BM == 1     0.4777    1.6124   0.1713  2.789  0.00529
#> PF == 1     0.4367    1.5477   0.1734  2.519  0.01178
#> HX == 1     0.4373    1.5486   0.1124  3.892 9.96e-05
#> STAGE == 4  0.2052    1.2278   0.1293  1.587  0.11254
#> AGE > 65    0.2587    1.2952   0.1525  1.697  0.08979
#> WT > 100   -0.1987    0.8198   0.1139 -1.745  0.08103
#>
#> Likelihood ratio test=56.89  on 7 df, p=6.291e-10
#> n= 475, number of events= 338

We see that the new treatment leads to better outcomes when compared to control, as the overall treatment effect (RX) is negative. However, its confidence interval covers the no-effect value of 0.

## unadj function

Unadjusted subgroup treatment effect estimates are obtained via the unadj function. We fit the models including the six subgroup indicators as prognostic factors as well, which are added through the covars argument as a formula. The unadj function loops through the $$P$$ variables specified in the subgr argument, fitting the models

$$$M_p:\ \lambda_{pi}(t)= \lambda_{p0}(t) \exp\left\{\beta_p z_i + (\gamma_p + \delta_p z_i)s_{pi} + \sum_{k = 1}^{K} \tau_k x_{ik} \right\} \label{model.cox}$$$

for $$p=1,...,P$$. In this example, we make use of the ... option to pass the option ties = "breslow" to coxph.

### Unadjusted estimates
data = fitdat, covars = prog,
event = "CENS", fitfunc = "coxph", ties = "breslow")
#> Trt. Effect Estimates
#>         Group     Subset      LB    trtEff       UB
#> 1     BM == 1   Subgroup -1.1949 -0.785335 -0.37579
#> 2     BM == 1 Complement -0.2442 -0.040729  0.16273
#> 3     PF == 1   Subgroup -0.3965  0.118040  0.63257
#> 4     PF == 1 Complement -0.4212 -0.224282 -0.02734
#> 5     HX == 1   Subgroup -0.2547  0.006503  0.26774
#> 6     HX == 1 Complement -0.6199 -0.363698 -0.10754
#> 7  STAGE == 4   Subgroup -0.6526 -0.376238 -0.09986
#> 8  STAGE == 4 Complement -0.2748 -0.027431  0.21998
#> 9    AGE > 65   Subgroup -0.2517 -0.052995  0.14568
#> 10   AGE > 65 Complement -1.4451 -0.962738 -0.48037
#> 11   WT > 100   Subgroup -0.5730 -0.274251  0.02451
#> 12   WT > 100 Complement -0.3622 -0.126989  0.10819
#>
#> Difference in Trt. Effect vs Complement
#>        Group        LB trtEffDiff       UB
#> 1    BM == 1 -1.201671    -0.7446 -0.28754
#> 2    PF == 1 -0.203438     0.3423  0.88808
#> 3    HX == 1  0.007603     0.3702  0.73280
#> 4 STAGE == 4 -0.718343    -0.3488  0.02073
#> 5   AGE > 65  0.394367     0.9097  1.42512
#> 6   WT > 100 -0.524689    -0.1473  0.23016
#>
#> Subgroup Models fitted with "coxph"
#> Effect estimates in terms of the log-hazard ratios

The output shows first the treatment effect estimates (trtEff) in the subgroups and corresponding lower and upper bounds of the unadjusted confidence intervals (LB and UB respectively). A second table is then displayed with the information on the difference in treatment effect in subgroups vs. their complements. The treatment effects are on same scale as returned by the linear predictictor of the specified fitfunc. In this example, the treatment effects are expressed in terms of the log-hazard ratios.

Using the unadjusted estimates for subgroups leads to the conclusion that there may be subgroups with differential treatment effect. In particular, patients with bone metastasis and patients younger than 65 may have a differential benefit from the treatment. This is also observed when looking at the interaction effects between these covariates and treatment. These results needs to be interpreted with caution as this is only an exploratory analysis.

## modAv function

We use the modAv function to obtain the model averaging estimates. In this case, we use the same options as in the unadj function, so we used the default values to set all the models with equal prior weights and no prior weight for the overall model.

### ModelAveraging estimates
res_modav = modav(resp = "SURVTIME", trt = "RX", subgr = subgr.names,
data = fitdat, covars = prog,
event = "CENS", fitfunc = "coxph", ties = "breslow")
res_modav
#> Trt. Effect Estimates
#>         Group     Subset      LB  trtEff        UB
#> 1     BM == 1   Subgroup -1.0089 -0.3182 -0.082706
#> 2     BM == 1 Complement -0.3633 -0.1577  0.080986
#> 3     PF == 1   Subgroup -0.4800 -0.2247 -0.006514
#> 4     PF == 1 Complement -0.3813 -0.1889  0.004497
#> 5     HX == 1   Subgroup -0.3421 -0.1533  0.044871
#> 6     HX == 1 Complement -0.4253 -0.2228 -0.029070
#> 7  STAGE == 4   Subgroup -0.4663 -0.2493 -0.048809
#> 8  STAGE == 4 Complement -0.3614 -0.1547  0.086863
#> 9    AGE > 65   Subgroup -0.3016 -0.0915  0.121691
#> 10   AGE > 65 Complement -1.3799 -0.7415 -0.086948
#> 11   WT > 100   Subgroup -0.3732 -0.1769  0.021429
#> 12   WT > 100 Complement -0.3909 -0.2040 -0.016736
#>
#> Difference in Trt. Effect vs Complement
#>        Group        LB trtEffDiff       UB
#> 1    BM == 1 -0.994109   -0.08220 -0.02315
#> 2    PF == 1 -0.291039    0.01613  0.03506
#> 3    HX == 1  0.017764    0.06258  0.12345
#> 4 STAGE == 4 -0.386656   -0.03536 -0.01087
#> 5   AGE > 65  0.017744    0.67341  1.35540
#> 6   WT > 100 -0.001293    0.01531  0.09971
#>
#> Subgroup Models fitted with "coxph"
#> Effect estimates in terms of the log-hazard ratios

Model averaging has the effect of shrinking the estimates towards the overall treatment effect. Therefore, it may help in adjusting for potential “random high bias”. In this sense, we observe that the treatment effects for the subgroups BM == 1 and Age < 65 are closer to the overall treament effect when using model averaging. All confidence intervals for treatment effects in subgroups cover the overall treatment effect.

## bagged function

We obtain the bagged estimates using the bagged function. Note that we should also provide to the function how the subgroup is selected (select.by = "BIC") and the number of bootstrap to use (B = 2000). We also let the default option stratify = TRUE, which controls that the bootstrap samples have the same number of subjects in treatment and controls arms as the original dataset.

### Bagged estimates
set.seed(321231) # set seed for reproducible results in the bootstrap samples
res_bagged = bagged(resp = "SURVTIME", trt = "RX", subgr = subgr.names,
data = fitdat, covars = prog,
event = "CENS", fitfunc = "coxph", ties = "breslow",
select.by = "BIC", B = 200) #B = 2000)
res_bagged
#> Trt. Effect Estimates
#>      Group     Subset      LB   trtEff     UB
#> 1 AGE > 65   Subgroup -0.2778 -0.07497 0.1279
#> 2 AGE > 65 Complement -1.7810 -0.83476 0.1114
#>
#> Difference in Trt. Effect vs Complement
#>      Group       LB trtEffDiff   UB
#> 1 AGE > 65 -0.07011     0.7598 1.59
#>
#>  AGE > 65 is the selected subgroup.
#>  It was selected in 50% of 200 bootstrap samples.
#>
#> Subgroup Models fitted with "coxph"
#> Effect estimates in terms of the log-hazard ratios

The bootstrap methods provides bias-adjusted estimates, which corrects for the bias that is introduced when selecting a subgroup. Therefore, it only makes sense to display the results of the selected subgroup. The user however may explore the results of the bootstrap samples, which are included in the output object.

The subgroup defined by AGE>65 is the most selected in the bootstrap samples. This allows to ‘correct’ the estimates for selection bias. We observe a similar pattern as with model averaging. After using bootstrap, the treatment effect is closer to that on the overall population and its confidence interval even covers it.