Overview

HaploReg (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php) (Ward and Kellis 2011), RegulomeDB (http://www.regulomedb.org) (Boyle et al. 2012), and LDlink (https://analysistools.nci.nih.gov/LDlink/) (Machiela and Chanock 2015) are web-based tools that extract biological information (eQTLs, LD, LD matrices, motifs, etc.) from large genomic projects such as ENCODE, the 1000 Genomes Project, the Roadmap Epigenomics Project, and others. This is sometimes called “post stage GWAS” analysis.

The R package haploR was developed to query these tools (HaploReg, RegulomeDB, LDlink) directly from R in order to facilitate high-throughput genomic data analysis. Below we provide several examples that show how to work with this package.

Note: you must have a stable Internet connection to use this package.

Contact: ilya.zhbannikov@duke.edu with questions about using haploR or any other issues.

Motivation and typical analysis workflow

This package was inspired by the fact that many web-based post stage GWAS databases do not have an Application Programming Interface (API) and, therefore, do not allow users to query them remotely from the R environment. In our research we used the HaploReg and RegulomeDB web databases. These databases report linkage disequilibrium (LD) between query variants and the variants in LD with them, expression quantitative trait loci (eQTL), changed motifs, and other useful information. However, it is not easy to include this information in a streamlined workflow, since these tools do not offer an API.

We developed a custom analysis pipeline that prepares data, performs genome-wide association (GWA) analysis, and presents results in a user-friendly format. Results include a list of genetic variants (also known as ‘SNPs’, or single nucleotide polymorphisms), their corresponding p-values, the phenotypes (traits) tested, and other meta-information such as LD, alternative allele, minor allele frequency, motifs changed, etc. Of course, we could go through the list of SNPs with genome-wide significant p-values (1e-8) and submit each of them to the web-based tools manually, one by one, but this is a time-consuming process and it is not fully automatic, which defeats one of the pipeline’s design goals. It is especially difficult if the web site does not allow results to be downloaded.

Therefore, we developed haploR, a user-friendly R package that connects to these web tools from the R environment using HTTP POST/GET requests and downloads the results in a suitable R format. This package saved us significant time in developing the reporting system for our internal genomic analysis pipeline, and we would now like to present haploR to the research community.

An example of a typical analysis workflow is shown below.