The Generalized Berk-Jones statistic was developed to perform set-based inference in genetic association studies. It is an alternative to tests such as the Sequence Kernel Association Test (SKAT), Generalized Higher Criticism (GHC), and Minimum p-value (minP).
GBJ is a generalization of the Berk-Jones (BJ) statistic, which offers - in a certain sense - asymptotic power guarantees for detection of rare and weak signals. GBJ modifies BJ to account for correlation between factors in a set. GBJ has been demonstrated to outperform other tests when signals are moderately sparse (more precisely, when the number of signals is between d1/4 and d1/2, where d is the number of factors in the set).
Other advantages include: 1. Analytic p-value calculation (no need for permutation inference). 2. Can be applied to individual-level genotype data or GWAS summary statistics. 3. No tuning parameters. Accepts standard inputs (similar to glm() function).
We show a simple example for testing the association between a set of 50 SNPs (which could be, for example, from the same gene or pathway) and a binary outcome.
library(GBJ) set.seed(1000) # Case-control study, 1000 subjects cancer_status <- c(rep(1,500), rep(0,500)) # We have 50 SNPs each with minor allele frequency of 0.3 in this example genotype_data <- matrix(data=rbinom(n=1000*50, size=2, prob=0.3), nrow=1000) age <- round( runif(n=1000, min=30, max=80) ) gender <- rbinom(n=1000, size=1, prob=0.5) # Fit the null model, calculate marginal score statistics for each SNP # (asymptotically equivalent to those calculated by, for example, PLINK) null_mod <- glm(cancer_status~age+gender, family=binomial(link="logit")) log_reg_stats <- calc_score_stats(null_model=null_mod, factor_matrix=genotype_data, link_function="logit") # Run the test GBJ(test_stats=log_reg_stats$test_stats, cor_mat=log_reg_stats$cor_mat) #> $GBJ #>  1.43984 #> #> $GBJ_pvalue #>  0.330911 #> #> $err_code #>  0
We may not have convinced you that GBJ is the best option for your application. If that is the case, then you may still be interested in trying the Berk-Jones (BJ), Generalized Higher Criticism (GHC), Higher Criticism (HC), or Minimum p-value (minP) tests, which can be run with the same inputs, i.e. GHC(test_stats=score_stats, cor_mat=cor_Z) to run the GHC. We also have developed an omnibus test which information from multiple different methods. Please see the vignette for more details.