Introduction to BioInstaller

Jianfeng Li

2018-01-24

Introduction

BioInstaller is a downloader and installer for bioinformatics software and databases. The project was inspired by convenient package managers such as pip for Python packages, install.packages for R packages, and biocLite for Bioconductor R packages.

Why don't we have an integrated package manager for bioinformatics databases and software?

In fact, some existing tools already cover part of this work:

Conda and BioConda have done a lot of work, and we can use them to conveniently install some bioinformatics software. But these package managers still have many problems, such as untimely version updates, incompatibility with some precompiled programs, and little support for databases and other non-software files.

Docker is another very promising tool for migrating an analysis environment. But it requires root authority, and it is difficult to always get root privileges.

Furthermore, learning how to install and compile bioinformatics software is still necessary, because these 'unpleasant' experiences will improve your ability to debug and modify programs.

As for me, when starting NGS analysis work on a new computer or operating system, I have to spend a lot of time and energy setting up a complete set of software and dependent files and writing the corresponding configuration files.

BioInstaller can help us download, install and manage a variety of bioinformatics tools and databases more easily and systematically.

What's more, BioInstaller provides a different way for others to download and install your files, software and databases. More detail can be found in another vignette, Examples of Templet Configuration File.

Feature:

Core function in BioInstaller

library(BioInstaller)
set.biosoftwares.db(tempfile())
# Show all available software/dependencies in the default inst/extdata/config/github/github.toml
# and inst/extdata/config/nongithub/nongithub.toml
install.bioinfo(show.all.names = TRUE)
#>   [1] "abyss"                            "arnapipe"                        
#>   [3] "asap"                             "backspin"                        
#>   [5] "bamtools"                         "bamutil"                         
#>   [7] "bcftools"                         "bearscc"                         
#>   [9] "bedtools"                         "bowtie"                          
#>  [11] "bowtie2"                          "breakdancer"                     
#>  [13] "brie"                             "bwa"                             
#>  [15] "chronqc"                          "cnvkit"                          
#>  [17] "cnvnator"                         "dart"                            
#>  [19] "delly"                            "facets"                          
#>  [21] "fastp"                            "fastq_tools"                     
#>  [23] "fastx_toolkit"                    "freebayes"                       
#>  [25] "fsclvm"                           "giggle"                          
#>  [27] "github_demo"                      "hisat2"                          
#>  [29] "htseq"                            "igraph"                          
#>  [31] "isop"                             "jvarkit"                         
#>  [33] "libgtextutils"                    "lofreq"                          
#>  [35] "macs"                             "mdseq"                           
#>  [37] "mimosca"                          "multiqc"                         
#>  [39] "oases"                            "olego"                           
#>  [41] "oncotator"                        "outrigger"                       
#>  [43] "picard"                           "pindel"                          
#>  [45] "pxz"                              "raceid"                          
#>  [47] "radia"                            "rca"                             
#>  [49] "resm"                             "rhat"                            
#>  [51] "rum"                              "samtools_old"                    
#>  [53] "sclvm"                            "scnorm"                          
#>  [55] "seqtk"                            "seurat"                          
#>  [57] "singlesplice"                     "sleuth"                          
#>  [59] "somaticsniper"                    "sparsehash"                      
#>  [61] "speedseq"                         "star"                            
#>  [63] "strawberry"                       "tmap"                            
#>  [65] "tophat2"                          "tracer"                          
#>  [67] "trimgalore"                       "trinityrnaseq"                   
#>  [69] "varscan2"                         "vcflib"                          
#>  [71] "vcftools"                         "vep"                             
#>  [73] "zifa"                             "absolute"                        
#>  [75] "annovar"                          "armadillo"                       
#>  [77] "atlas2"                           "bcl2fastq"                       
#>  [79] "beagle"                           "blast"                           
#>  [81] "blat"                             "bzip2"                           
#>  [83] "cesa"                             "cnvnator_samtools"               
#>  [85] "contest"                          "curl"                            
#>  [87] "demo_2"                           "edena"                           
#>  [89] "ensemble_grch37_reffa"            "ensemble_grch38_reffa"           
#>  [91] "fastqc"                           "fatotwobit"                      
#>  [93] "fusioncatcher"                    "fusioncatcher_reffa"             
#>  [95] "gatk"                             "gatk4"                           
#>  [97] "gatk_bundle"                      "gmap"                            
#>  [99] "gridss"                           "hapseg"                          
#> [101] "hisat2_reffa"                     "htslib"                          
#> [103] "igv"                              "imagej"                          
#> [105] "interproscan"                     "liftover"                        
#> [107] "lzo"                              "lzop"                            
#> [109] "mapsplice2"                       "marina"                          
#> [111] "meerkat"                          "miniconda2"                      
#> [113] "miniconda3"                       "mutect"                          
#> [115] "mutsig"                           "mutsig_dependence"               
#> [117] "mutsig_reffa"                     "ngs_qc_toolkit"                  
#> [119] "novoalign"                        "paradigm"                        
#> [121] "pcre"                             "pigz"                            
#> [123] "prada"                            "prinseq"                         
#> [125] "r"                                "reditools"                       
#> [127] "rmats"                            "rmats_reffa"                     
#> [129] "root"                             "samstat"                         
#> [131] "samtools"                         "snpeff"                          
#> [133] "solexaqa"                         "sqlite"                          
#> [135] "sratools"                         "srnanalyzer"                     
#> [137] "ssaha2"                           "strelka"                         
#> [139] "subread"                          "svtoolkit"                       
#> [141] "tvc"                              "ucsc_reffa"                      
#> [143] "ucsc_utils"                       "vadir"                           
#> [145] "vcfanno"                          "velvet"                          
#> [147] "xz"                               "zlib"                            
#> [149] "db_atcircdb"                      "db_biosystems"                   
#> [151] "db_cancer_hotspots"               "db_cgi"                          
#> [153] "db_circbase"                      "db_circnet"                      
#> [155] "db_circrnadb"                     "db_civic"                        
#> [157] "db_cscd"                          "db_denovo_db"                    
#> [159] "db_dgidb"                         "db_differentialnet"              
#> [161] "db_diseaseenhancer"               "db_disgenet"                     
#> [163] "db_docm"                          "db_drugbank"                     
#> [165] "db_ecodrug"                       "db_eggnog"                       
#> [167] "db_exorbase"                      "db_expression_atlas"             
#> [169] "db_exsnp"                         "db_fantom_cage_peaks"            
#> [171] "db_fantom_co_expression_clusters" "db_fantom_enhancers"             
#> [173] "db_fantom_motifs"                 "db_fantom_ontology"              
#> [175] "db_fantom_tss_classifier"         "db_funcoup"                      
#> [177] "db_gtex"                          "db_hgnc"                         
#> [179] "db_hpo"                           "db_inbiomap"                     
#> [181] "db_interpro"                      "db_intogen"                      
#> [183] "db_lncediting"                    "db_medreaders"                   
#> [185] "db_mndr"                          "db_msdd"                         
#> [187] "db_omim_open"                     "db_omim_private"                 
#> [189] "db_oncotator"                     "db_pancanqtl"                    
#> [191] "db_proteinatlas"                  "db_rbp_var"                      
#> [193] "db_rddpred"                       "db_remap"                        
#> [195] "db_remap2"                        "db_rsnp3"                        
#> [197] "db_rvarbase"                      "db_seecancer"                    
#> [199] "db_seeqtl"                        "db_snipa3"                       
#> [201] "db_srnanalyzer"                   "db_superdrug2"                   
#> [203] "db_tumorfusions"                  "db_varcards"                     
#> [205] "db_annovar_1000g"                 "db_annovar_1000g_sqlite"         
#> [207] "db_annovar_avsift"                "db_annovar_avsnp"                
#> [209] "db_annovar_avsnp_sqlite"          "db_annovar_brvar"                
#> [211] "db_annovar_cadd"                  "db_annovar_cadd_sqlite"          
#> [213] "db_annovar_cancer_hotspots"       "db_annovar_cg"                   
#> [215] "db_annovar_civic_gene_summaries"  "db_annovar_clinvar"              
#> [217] "db_annovar_clinvar_sqlite"        "db_annovar_cosmic"               
#> [219] "db_annovar_cosmic_sqlite"         "db_annovar_cscd"                 
#> [221] "db_annovar_darned_sqlite"         "db_annovar_dbnsfp"               
#> [223] "db_annovar_dbnsfp_sqlite"         "db_annovar_dbscsnv"              
#> [225] "db_annovar_dbscsnv_sqlite"        "db_annovar_dhs_gene_connectivity"
#> [227] "db_annovar_disgenet"              "db_annovar_docm"                 
#> [229] "db_annovar_eigen"                 "db_annovar_eigen_sqlite"         
#> [231] "db_annovar_ensgene"               "db_annovar_epi_genes"            
#> [233] "db_annovar_esp6500siv2"           "db_annovar_exac03"               
#> [235] "db_annovar_exac03_sqlite"         "db_annovar_fathmm"               
#> [237] "db_annovar_gdi_score"             "db_annovar_gerp"                 
#> [239] "db_annovar_gme"                   "db_annovar_gme_sqlite"           
#> [241] "db_annovar_gnomad"                "db_annovar_gnomad_sqlite"        
#> [243] "db_annovar_gtex_eqtl_egenes"      "db_annovar_gtex_eqtl_pairs"      
#> [245] "db_annovar_gwava"                 "db_annovar_gwava_sqlite"         
#> [247] "db_annovar_hgnc"                  "db_annovar_hrcr1"                
#> [249] "db_annovar_hrcr1_sqlite"          "db_annovar_icgc21"               
#> [251] "db_annovar_icgc_sqlite"           "db_annovar_intervar"             
#> [253] "db_annovar_intervar_sqlite"       "db_annovar_intogen"              
#> [255] "db_annovar_kaviar"                "db_annovar_knowngene"            
#> [257] "db_annovar_ljb26_all"             "db_annovar_lncediting_sqlite"    
#> [259] "db_annovar_loftool_scores"        "db_annovar_mcap"                 
#> [261] "db_annovar_mcap_sqlite"           "db_annovar_mitimpact"            
#> [263] "db_annovar_nci60"                 "db_annovar_nci60_sqlite"         
#> [265] "db_annovar_normal_pool"           "db_annovar_omim_genemap2"        
#> [267] "db_annovar_popfreq"               "db_annovar_popfreq_sqlite"       
#> [269] "db_annovar_radar_sqlite"          "db_annovar_rddpred_sqlite"       
#> [271] "db_annovar_rediportal_sqlite"     "db_annovar_refgene"              
#> [273] "db_annovar_regsnpintron"          "db_annovar_revel"                
#> [275] "db_annovar_revel_sqlite"          "db_annovar_rvis_esv_score"       
#> [277] "db_annovar_seeqtl"                "db_annovar_snp"                  
#> [279] "db_annovar_tall_somatic_genes"    "db_annovar_tmcsnpdb"             
#> [281] "db_annovar_varcards"              "db_annovar_varcards_sqlite"      
#> [283] "db_ucsc_cytoband"                 "db_ucsc_dnase_clustered"         
#> [285] "db_ucsc_ensgene"                  "db_ucsc_knowngene"               
#> [287] "db_ucsc_refgene"                  "db_ucsc_tfbs_clustered"          
#> [289] "db_blast_env_nr"                  "db_blast_est_human"              
#> [291] "db_blast_est_mouse"               "db_blast_est_others"             
#> [293] "db_blast_gss"                     "db_blast_htgs"                   
#> [295] "db_blast_human_genomic"           "db_blast_landmark"               
#> [297] "db_blast_mouse_genomic"           "db_blast_nr"                     
#> [299] "db_blast_nt"                      "db_blast_other_genomic"          
#> [301] "db_blast_pataa"                   "db_blast_patnt"                  
#> [303] "db_blast_pdbaa"                   "db_blast_pdbnt"                  
#> [305] "db_blast_ref_prok_rep_genomes"    "db_blast_ref_viroids_rep_genomes"
#> [307] "db_blast_ref_viruses_rep_genomes" "db_blast_refseq_genomic"         
#> [309] "db_blast_refseq_protein"          "db_blast_refseq_rna"             
#> [311] "db_blast_refseqgene"              "db_blast_sts"                    
#> [313] "db_blast_swissprot"               "db_blast_taxdb"                  
#> [315] "db_blast_tsa_nr"                  "db_blast_tsa_nt"                 
#> [317] "db_blast_vector"
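The name vector printed above mixes software and database entries; in this listing the database entries carry a `db_` prefix. As a minimal base R sketch of splitting such a vector, the block below works on a small sample copied by hand from the listing rather than on a live call (the variable names are illustrative only):

```r
# A few names copied from the listing above; in practice this vector would
# come from install.bioinfo(show.all.names = TRUE).
item_names <- c("bwa", "samtools", "gatk4", "db_civic",
                "db_annovar_clinvar", "db_blast_nt")

# In the listing, database entries start with "db_"
is_db <- startsWith(item_names, "db_")
db_names <- item_names[is_db]
software_names <- item_names[!is_db]

# Narrow further with a regular expression, e.g. only the ANNOVAR databases
annovar_dbs <- grep("^db_annovar_", item_names, value = TRUE)

print(software_names)
print(db_names)
print(annovar_dbs)
```

The same filtering should apply unchanged to the full vector, assuming the `db_` naming convention holds for every database entry.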

# Fetch all available versions of samtools
install.bioinfo('samtools', show.all.versions = TRUE)
#> INFO [2018-01-24 19:29:53] Fetching samtools versions....
#>  [1] "1.6"        "1.5"        "1.4.1"      "1.4"        "1.3.1"     
#>  [6] "1.3"        "1.2"        "1.1"        "1.0"        "0.2.0-rc12"
#> [11] "0.2.0-rc11" "0.2.0-rc10" "0.2.0-rc9"  "0.2.0-rc8"  "0.2.0-rc7" 
#> [16] "0.2.0-rc6"  "0.2.0-rc5"  "0.2.0-rc4"  "0.2.0-rc3"  "0.2.0-rc2" 
#> [21] "0.2.0-rc1"  "0.1.20"     "0.1.19"     "0.1.18"     "0.1.17"    
#> [26] "0.1.16"     "0.1.15"     "0.1.14"     "0.1.13"     "master"
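The version vector above contains plain release numbers alongside release candidates and a `master` branch entry. If only numbered releases are of interest, one possible approach (a sketch in base R, not something BioInstaller does for you) is to parse the strings with `numeric_version()` and drop anything that does not parse. The input below is a hand-copied, shuffled sample of the output above:

```r
# Sample of the version strings shown above, deliberately unsorted
versions <- c("1.4.1", "master", "1.6", "0.2.0-rc12", "1.5", "0.1.20")

# strict = FALSE turns strings that are not purely numeric versions
# (release candidates, branch names) into NA instead of raising an error
parsed <- numeric_version(versions, strict = FALSE)
ok <- !is.na(parsed)

# Keep the numbered releases and sort them from newest to oldest
releases <- versions[ok]
sorted_releases <- releases[order(parsed[ok], decreasing = TRUE)]
latest <- sorted_releases[1]

print(sorted_releases)
print(latest)
```

Comparing parsed versions rather than raw strings avoids lexical pitfalls such as "0.1.9" sorting after "0.1.20".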

# Install 'demo' with debug information
download.dir <- sprintf('%s/demo_2', tempdir())
install.bioinfo('demo', download.dir = download.dir, verbose = TRUE)
#> INFO [2018-01-24 19:29:55] Debug:name:demo
#> INFO [2018-01-24 19:29:55] Debug:destdir:
#> INFO [2018-01-24 19:29:55] Debug:db:/tmp/Rtmpf6U2g1/filed6087e1b32f2
#> INFO [2018-01-24 19:29:55] Debug:github.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/github/github.toml
#> INFO [2018-01-24 19:29:55] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/nongithub/nongithub.toml
#> INFO [2018-01-24 19:29:55] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_main.toml
#> INFO [2018-01-24 19:29:55] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_annovar.toml
#> INFO [2018-01-24 19:29:55] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_blast.toml
#> INFO [2018-01-24 19:29:55] Fetching demo versions....
#> INFO [2018-01-24 19:29:55] Install versions:GRCh37
#> INFO [2018-01-24 19:29:55] Now start to install demo in /tmp/Rtmpf6U2g1/demo_2.
#> INFO [2018-01-24 19:29:55] Running before install steps.
#> INFO [2018-01-24 19:29:55] Now start to download demo in /tmp/Rtmpf6U2g1/demo_2.
#> INFO [2018-01-24 19:29:56] Running install steps.
#> INFO [2018-01-24 19:29:56] Running after install successful steps.
#> INFO [2018-01-24 19:29:56] Running change.info for demo and be saved to /tmp/Rtmpf6U2g1/filed6087e1b32f2
#> INFO [2018-01-24 19:29:56] Debug:Install by Github configuration file: 
#> INFO [2018-01-24 19:29:56] Debug:Install by Non Github configuration file: demo
#> INFO [2018-01-24 19:29:56] Installed successful list: demo
#> $fail.list
#> [1] ""
#> 
#> $success.list
#> [1] "demo"

# Download demo source code
download.dir <- sprintf('%s/demo_3', tempdir())
install.bioinfo('demo', download.dir = download.dir,
  download.only = TRUE, verbose = TRUE)
#> INFO [2018-01-24 19:29:56] Debug:name:demo
#> INFO [2018-01-24 19:29:56] Debug:destdir:
#> INFO [2018-01-24 19:29:56] Debug:db:/tmp/Rtmpf6U2g1/filed6087e1b32f2
#> INFO [2018-01-24 19:29:56] Debug:github.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/github/github.toml
#> INFO [2018-01-24 19:29:56] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/nongithub/nongithub.toml
#> INFO [2018-01-24 19:29:56] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_main.toml
#> INFO [2018-01-24 19:29:56] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_annovar.toml
#> INFO [2018-01-24 19:29:56] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_blast.toml
#> INFO [2018-01-24 19:29:57] Fetching demo versions....
#> INFO [2018-01-24 19:29:57] Install versions:GRCh37
#> INFO [2018-01-24 19:29:57] Now start to download demo in /tmp/Rtmpf6U2g1/demo_3.
#> INFO [2018-01-24 19:29:57] demo be downloaded in /tmp/Rtmpf6U2g1/demo_3 successful
#> [1] TRUE

# Set download.dir and destdir (destdir is like /usr/local,
# containing bin, lib, include and others).
# destdir takes effect only if the install steps use {{destdir}}
download.dir <- sprintf('%s/demo_source', tempdir())
destdir <- sprintf('%s/demo', tempdir())
install.bioinfo('demo', download.dir = download.dir, destdir = destdir)
#> INFO [2018-01-24 19:29:58] Debug:name:demo
#> INFO [2018-01-24 19:29:58] Debug:destdir:/tmp/Rtmpf6U2g1/demo
#> INFO [2018-01-24 19:29:58] Debug:db:/tmp/Rtmpf6U2g1/filed6087e1b32f2
#> INFO [2018-01-24 19:29:58] Debug:github.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/github/github.toml
#> INFO [2018-01-24 19:29:58] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/nongithub/nongithub.toml
#> INFO [2018-01-24 19:29:58] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_main.toml
#> INFO [2018-01-24 19:29:58] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_annovar.toml
#> INFO [2018-01-24 19:29:58] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_blast.toml
#> INFO [2018-01-24 19:29:58] Fetching demo versions....
#> INFO [2018-01-24 19:29:58] Install versions:GRCh37
#> INFO [2018-01-24 19:29:58] Now start to install demo in /tmp/Rtmpf6U2g1/demo.
#> INFO [2018-01-24 19:29:58] Running before install steps.
#> INFO [2018-01-24 19:29:58] Now start to download demo in /tmp/Rtmpf6U2g1/demo_source.
#> INFO [2018-01-24 19:29:59] Running install steps.
#> INFO [2018-01-24 19:29:59] Running after install successful steps.
#> INFO [2018-01-24 19:29:59] Running change.info for demo and be saved to /tmp/Rtmpf6U2g1/filed6087e1b32f2
#> INFO [2018-01-24 19:29:59] Debug:Install by Github configuration file: 
#> INFO [2018-01-24 19:29:59] Debug:Install by Non Github configuration file: demo
#> INFO [2018-01-24 19:29:59] Installed successful list: demo
#> $fail.list
#> [1] ""
#> 
#> $success.list
#> [1] "demo"
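The comment above notes that `destdir` only matters when an install step refers to `{{destdir}}`. To illustrate the idea of such a placeholder, here is a base R sketch of a literal substitution. The install step string and the path are hypothetical, and this is not BioInstaller's actual implementation, only an illustration of how a placeholder of this shape can be expanded:

```r
# Hypothetical install step, as it might be written in a configuration file
install_step <- "make install PREFIX={{destdir}}"

# Illustrative destination directory (any prefix-style directory would do)
destdir <- "/opt/bioinfo/demo"

# Check whether the step refers to the placeholder at all;
# fixed = TRUE treats the braces literally instead of as a regular expression
uses_destdir <- grepl("{{destdir}}", install_step, fixed = TRUE)

# Replace the placeholder with the chosen directory
expanded <- gsub("{{destdir}}", destdir, install_step, fixed = TRUE)

print(uses_destdir)
print(expanded)
```

A step without the placeholder would pass through this substitution unchanged, which matches the note that `destdir` has no effect in that case.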

Storing useful information about databases and software

It takes time to track down the paths of software and databases after downloading and installing them. What's worse, you would be in really dire straits if you didn't save this information.

Fortunately, if you install them with BioInstaller, the version, path, source code path and update time are saved in the BIO_SOFWARES_DB_ACTIVE database, a YAML format file.

temp.db <- tempfile()
set.biosoftwares.db(temp.db)
is.biosoftwares.db.active(temp.db)
#> [1] TRUE

# Install 'demo' quietly
download.dir <- sprintf('%s/demo_1', tempdir())
install.bioinfo('demo', download.dir = download.dir, verbose = FALSE)
#> $fail.list
#> [1] ""
#> 
#> $success.list
#> [1] "demo"
config <- get.info('demo')
config
#> $installed
#> [1] TRUE
#> 
#> $source.dir
#> [1] "/tmp/Rtmpf6U2g1/demo_1"
#> 
#> $bin_dir
#> [1] "/tmp/Rtmpf6U2g1/demo_1"
#> 
#> $executable_files
#> [1] ""
#> 
#> $install.dir
#> [1] "/tmp/Rtmpf6U2g1/demo_1"
#> 
#> $version
#> [1] "GRCh37"
#> 
#> $last.update.time
#> [1] "2018-01-24 19:30:00"
#> 
#> attr(,"config")
#> [1] "demo"
#> attr(,"configtype")
#> [1] "yaml"
#> attr(,"file")
#> [1] "/tmp/Rtmpf6U2g1/filed608139ce4bc"

config <- configr::read.config(temp.db)
config$demo$comments <- 'This is a demo.'
params <- list(config.dat = config, file.path = temp.db)
do.call(configr::write.config, params)
#> [1] TRUE
get.info('demo')
#> $installed
#> [1] "TRUE"
#> 
#> $source.dir
#> [1] "/tmp/Rtmpf6U2g1/demo_1"
#> 
#> $bin_dir
#> [1] "/tmp/Rtmpf6U2g1/demo_1"
#> 
#> $executable_files
#> [1] ""
#> 
#> $install.dir
#> [1] "/tmp/Rtmpf6U2g1/demo_1"
#> 
#> $version
#> [1] "GRCh37"
#> 
#> $last.update.time
#> [1] "2018-01-24 19:30:00"
#> 
#> $comments
#> [1] "This is a demo."
#> 
#> attr(,"config")
#> [1] "demo"
#> attr(,"configtype")
#> [1] "ini"
#> attr(,"file")
#> [1] "/tmp/Rtmpf6U2g1/filed608139ce4bc"
del.info('demo')
#> [1] TRUE

Install software from local source

BioInstaller can also install software from a local source. Installing GitHub software requires a cloned directory, while non-GitHub software can be installed from a decompressed directory or a compressed archive.

download.dir <- sprintf('%s/github_demo_local', tempdir())
install.bioinfo('github_demo', download.dir = download.dir, download.only = TRUE, verbose = FALSE)
#> cloning into '/tmp/Rtmpf6U2g1/github_demo_local'...
#> Receiving objects:  16% (1/6),    0 kb
#> Receiving objects:  33% (2/6),    0 kb
#> Receiving objects:  50% (3/6),    0 kb
#> Receiving objects:  66% (4/6),    0 kb
#> Receiving objects:  83% (5/6),    0 kb
#> Receiving objects: 100% (6/6),    0 kb, done.
#> [1] TRUE
install.bioinfo('github_demo', local.source = download.dir)
#> INFO [2018-01-24 19:30:03] Debug:name:github_demo
#> INFO [2018-01-24 19:30:03] Debug:destdir:
#> INFO [2018-01-24 19:30:03] Debug:db:/tmp/Rtmpf6U2g1/filed608139ce4bc
#> INFO [2018-01-24 19:30:03] Debug:github.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/github/github.toml
#> INFO [2018-01-24 19:30:03] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/nongithub/nongithub.toml
#> INFO [2018-01-24 19:30:03] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_main.toml
#> INFO [2018-01-24 19:30:03] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_annovar.toml
#> INFO [2018-01-24 19:30:03] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_blast.toml
#> INFO [2018-01-24 19:30:04] Fetching github_demo versions....
#> INFO [2018-01-24 19:30:04] Install versions:master
#> INFO [2018-01-24 19:30:05] Now start to install github_demo in /tmp/Rtmpf6U2g1/github_demo.
#> INFO [2018-01-24 19:30:05] Running before install steps.
#> INFO [2018-01-24 19:30:05] Running install steps.
#> INFO [2018-01-24 19:30:05] Running after install successful steps.
#> INFO [2018-01-24 19:30:05] Running change.info for github_demo and be saved to /tmp/Rtmpf6U2g1/filed608139ce4bc
#> INFO [2018-01-24 19:30:05] Debug:Install by Github configuration file: github_demo
#> INFO [2018-01-24 19:30:05] Debug:Install by Non Github configuration file: 
#> INFO [2018-01-24 19:30:05] Installed successful list: github_demo
#> $fail.list
#> [1] ""
#> 
#> $success.list
#> [1] "github_demo"

download.dir <- sprintf('%s/demo_local', tempdir())
install.bioinfo('demo_2', download.dir = download.dir, download.only = TRUE, verbose = FALSE)
#> [1] TRUE
install.bioinfo('demo_2', download.dir = download.dir, local.source = sprintf('%s/GRCh37_MT_ensGene.txt.gz', download.dir), decompress = TRUE)
#> INFO [2018-01-24 19:30:06] Debug:name:demo_2
#> INFO [2018-01-24 19:30:06] Debug:destdir:
#> INFO [2018-01-24 19:30:06] Debug:db:/tmp/Rtmpf6U2g1/filed608139ce4bc
#> INFO [2018-01-24 19:30:06] Debug:github.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/github/github.toml
#> INFO [2018-01-24 19:30:06] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/nongithub/nongithub.toml
#> INFO [2018-01-24 19:30:06] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_main.toml
#> INFO [2018-01-24 19:30:06] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_annovar.toml
#> INFO [2018-01-24 19:30:06] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_blast.toml
#> INFO [2018-01-24 19:30:07] Fetching demo_2 versions....
#> INFO [2018-01-24 19:30:07] Install versions:GRCh37
#> INFO [2018-01-24 19:30:07] Now start to install demo_2 in /tmp/Rtmpf6U2g1/demo_local.
#> INFO [2018-01-24 19:30:07] Running before install steps.
#> INFO [2018-01-24 19:30:08] Running install steps.
#> INFO [2018-01-24 19:30:08] Running after install successful steps.
#> INFO [2018-01-24 19:30:08] Running change.info for demo_2 and be saved to /tmp/Rtmpf6U2g1/filed608139ce4bc
#> INFO [2018-01-24 19:30:08] Debug:Install by Github configuration file: 
#> INFO [2018-01-24 19:30:08] Debug:Install by Non Github configuration file: demo_2
#> INFO [2018-01-24 19:30:08] Installed successful list: demo_2
#> $fail.list
#> [1] ""
#> 
#> $success.list
#> [1] "demo_2"

Crawl all versions of software or databases

BioInstaller provides a craw.all.versions function that tries to download every available URL file in the nongithub part.

download.dir <- sprintf('%s/craw_all_versions', tempdir())
craw.all.versions('demo', download.dir = download.dir)
#> INFO [2018-01-24 19:30:08] Fetching demo versions....

Get meta information about software and databases

# Get all meta source files
meta_files <- get.meta.files()
meta_files
#> $db_meta_file
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_meta.toml"
#> 
#> $github_meta_file
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/github/github_meta.toml"
#> 
#> $nongithub_meta_file
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/nongithub/nongithub_meta.toml"
#> 
#> $web_meta_file
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/web/web_meta.toml"

# Get all of the meta information in BioInstaller
meta <- get.meta()
meta
#> $db_meta_file
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_meta.toml"
#> 
#> $github_meta_file
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/github/github_meta.toml"
#> 
#> $nongithub_meta_file
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/nongithub/nongithub_meta.toml"
#> 
#> $web_meta_file
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/web/web_meta.toml"
#> 
#> $db
#> $db$cfg_meta
#> $db$cfg_meta$avaliable_cfg
#> [1] "db_annovar.toml" "db_blast.toml"   "db_main.toml"   
#> 
#> $db$cfg_meta$cfg_dir
#> [1] "@>@system.file('extdata', 'config/db', package = 'BioInstaller')@<@"
#> 
#> $db$cfg_meta$prefix_url
#> [1] "https://raw.githubusercontent.com/JhuangLab/BioInstaller/master/inst/extdata/config/db/"
#> 
#> 
#> $db$item
#> $db$item$atcircdb
#> $db$item$atcircdb$description
#> [1] "Circular RNA not only functions as a potential competitive target for miRNA, but also regulates transcription and interacts with RNA-binding proteins. Because of the structural stability of the circular form, these molecules are promising candidates for intervening in a number of biological pathways, and may be a high value tool for pharmaceutical research in human and photosynthesis in plant.Based on our previous research, we systematically investigated 622 RNA-Seq samples from 87 indepedent studies hosted at NCBI SRA, and extracted all related circular RNAs. To improve the prediction accuracy, we not only applied a straightforward metric to screen and rank the circular RNA, but also incorporated exon boundaries as well as circular RNA candidates from previous studies into this resource to provide robust evidence for experimental biologists. In regards of the interaction between miRNA and circular RNAs, we utilized psRNAtarget and TAPIR to evaluate the statistical significance. Together, this database will host all predicted and validated Arabidopsis circular RNAs, and provide valuable and comprehensive information for studying this newly emerging non-coding RNA."
#> 
#> $db$item$atcircdb$publication
#> [1] "Ye J, Wang L, Li S, Zhang Q, Zhang Q, Tang W, Wang K, Song K, Sablok G, Sun X*, Zhao H*; AtCircDB: a tissue-specific database for Arabidopsis circular RNAs. Brief Bioinform 2017 bbx089. doi: 10.1093/bib/bbx089."                                                     
#> [2] "Sun X, Wang L, Ding J, Wang Y, Wang J, Zhang X, Che Y, Liu Z, Zhang X, Ye J, Wang J, Sablok G, Deng Z, Zhao H. Integrative analysis of Arabidopsis thaliana transcriptomics reveals intuitive splicing mechanism for circular RNA. FEBS Lett. 2016. 590(20):3510-3516. "
#> 
#> $db$item$atcircdb$url
#> [1] "http://genome.sdau.edu.cn/circRNA/index.php"
#> 
#> 
#> $db$item$biosystems
#> $db$item$biosystems$description
#> [1] "A biosystem, or biological system, is a group of molecules that interact in a biological system. One type of biosystem is a biological pathway, which can consist of interacting genes, proteins, and small molecules. Another type of biosystem is a disease, which can involve components such as genes, biomarkers, and drugs. A number of databases provide diagrams showing the components and products of biological pathways along with corresponding annotations and links to literature. The NCBI BioSystems Database was developed as a complementary project to (1) serve as a centralized repository of data; (2) connect the biosystem records with associated literature, molecular, and chemical data throughout the EntrezBI BioSystems record for arachidonic acid metabolism, for example, displays the name and description of the biosystem along with a thumbnail image of the pathway diagram that links to the full size illustration on the source database's web site. In addition, the BioSystems record lists and categorizes the genes, proteins, and small molecules involved in the biological system, along with related biosystems and citations, and allows instant retrieval of the those data sets through a wide range of Links. Integrating the data in this way makes it possible to search across all the pathways to answer broad questions such as the \\\"how to\\\" examples shown below. The companion FLink icon FLink tool, in turn, allows you to input a list of proteins, genes, or small molecules and retrieve a ranked list of biosystems. The NCBI BioSystems Database currently contains records from several source databases: KEGG, BioCyc (including its Tier 1 EcoCyc and MetaCyc databases, and its Tier 2 databases), Reactome, the National Cancer Institute's Pathway Interaction Database, WikiPathways, and Gene Ontology (GO). 
The BioSystems database includes several types of records such as pathways, structural complexes, and functional sets, and is designed to accommodate other record types, such as diseases, as data become available. Through these collaborations, the BioSystems database facilitates access to, and provides the ability to compute on, a wide range of biosystems data. Detailed diagrams and annotations for individual biosystems are then available on the web sites of the source databases."
#> 
#> $db$item$biosystems$publication
#> [1] "Geer L Y, Marchler-Bauer A, Geer R C, et al. The NCBI biosystems database[J]. Nucleic acids research, 2009, 38(suppl_1): D492-D496."
#> 
#> $db$item$biosystems$url
#> [1] "https://www.ncbi.nlm.nih.gov/biosystems"
#> 
#> 
#> $db$item$blast
#> $db$item$blast$description
#> [1] "All of blast required databases"
#> 
#> $db$item$blast$title
#> [1] "Basic Local Alignment Search Tool Databases"
#> 
#> $db$item$blast$url
#> [1] "ftp://ftp.ncbi.nih.gov/blast/db/"
#> 
#> 
#> $db$item$cancer_hotspots
#> $db$item$cancer_hotspots$description
#> [1] "This resource is maintained by the Kravis Center for Molecular Oncology at Memorial Sloan Kettering Cancer Center. It provides information about statistically significantly recurrent mutations identified in large scale cancer genomics data."
#> 
#> $db$item$cancer_hotspots$publication
#> [1] "Chang et al., Accelerating discovery of functional mutant alleles in cancer. Cancer Discovery, 10.1158/2159-8290.CD-17-0321 (2017)"                              
#> [2] "Chang et al., Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nature Biotechnology 34, 155–163 (2016)"
#> 
#> $db$item$cancer_hotspots$tag
#> [1] "NGS"      "database"
#> 
#> $db$item$cancer_hotspots$title
#> [1] "A RESOURCE FOR STATISTICALLY SIGNIFICANT MUTATIONS IN CANCER"
#> 
#> 
#> $db$item$cgi
#> $db$item$cgi$description
#> [1] "Cancer Genome Interpreter is designed to support the identification of tumor alterations that drive the disease and detect those that may be therapeutically actionable. CGI relies on existing knowledge collected from several resources and on computational methods that annotate the alterations in a tumor according to distinct levels of evidence.\\nWith a list of genomic alterations and the cancer type as input, the CGI identifies validated driver alterations and annotates and classifies the remaining variants of unknown significance. Then, alterations that are biomarkers of drug response or interact with existing chemical compounds are identified according to current knowledge."
#> 
#> $db$item$cgi$publication
#> [1] "Cancer Genome Interpreter Annotates The Biological And Clinical Relevance Of Tumor Alterations. bioRxiv 140475; doi: https://doi.org/10.1101/140475"
#> 
#> $db$item$cgi$tag
#> [1] "NGS"      "database"
#> 
#> $db$item$cgi$title
#> [1] "Cancer Genome Interpreter"
#> 
#> 
#> $db$item$circbase
#> $db$item$circbase$description
#> [1] "Recently, several laboratories have reported thousands of circular RNAs (circRNAs) in animals. Numerous circRNAs are highly stable and have specific spatiotemporal expression patterns. Even though a function for circRNAs is unknown, these features make circRNAs an interesting class of RNAs as possible biomarkers and for further research. We developed a database and website, “circBase,” where merged and unified data sets of circRNAs and the evidence supporting their expression can be accessed, downloaded, and browsed within the genomic context. circBase also provides scripts to identify known and novel circRNAs in sequencing data. The database is freely accessible through the web server at http://www.circbase.org/."
#> 
#> $db$item$circbase$publication
#> [1] "Glažar P, Papavasileiou P, Rajewsky N. circBase: a database for circular RNAs[J]. Rna, 2014, 20(11): 1666-1670."
#> 
#> $db$item$circbase$url
#> [1] "http://circrna.org/"
#> 
#> 
#> $db$item$circnet
#> $db$item$circnet$description
#> [1] "Circular RNAs (circRNAs) represent a new type of regulatory noncoding RNA that only recently has been identified and cataloged. Emerging evidence indicates that circRNAs exert a new layer of post-transcriptional regulation of gene expression. In this study, we utilized transcriptome sequencing datasets to systematically identify the expression of circRNAs (including known and newly identified ones by our pipeline) in 464 RNA-seq samples, and then constructed the CircNet database (http://circnet.mbc.nctu.edu.tw/) that provides the following resources: (i) novel circRNAs, (ii) integrated miRNA-target networks, (iii) expression profiles of circRNA isoforms, (iv) genomic annotations of circRNA isoforms (e.g., 282,948 exon positions), and (v) sequences of circRNA isoforms. The CircNet database is to our knowledge the first public database that provides tissue-specific circRNA expression profiles and circRNA-miRNA-gene regulatory networks. It not only extends the most up to date catalog of circRNAs but also provides a thorough expression analysis of both previously reported and novel circRNAs. Furthermore, it generates an integrated regulatory network that illustrates the regulation between circRNAs, miRNAs and genes."
#> 
#> $db$item$circnet$publication
#> [1] "Liu Y C, Li J R, Sun C H, et al. CircNet: a database of circular RNAs derived from transcriptome sequencing data[J]. Nucleic acids research, 2016, 44(D1): D209-D215."
#> 
#> $db$item$circnet$url
#> [1] "http://circnet.mbc.nctu.edu.tw/"
#> 
#> 
#> $db$item$circrnadb
#> $db$item$circrnadb$description
#> [1] "circRNADb (version 1.0.0), circular RNA (or circRNA) Database, is a comprehensive database for human circular RNAs with protein-coding annotations. It is freely available for non-commercial use. The latest version of this circRNA database contains 32,914 exonic circRNAs with 16,328 protein-coding annotations, of which 46 circRNAs from 37 genes were found to have their corresponding proteins expressed according to mass spectrometry. circRNADb can be a valuable resource for large-scale studies of circRNA in humans."
#> 
#> $db$item$circrnadb$publication
#> [1] "Chen X, Han P, Zhou T, et al. circRNADb: a comprehensive database for human circular RNAs with protein-coding annotations[J]. Scientific reports, 2016, 6."
#> 
#> $db$item$circrnadb$url
#> [1] "http://202.195.183.4:8000/circrnadb/circRNADb.php"
#> 
#> 
#> $db$item$civic
#> $db$item$civic$description
#> [1] "Realizing precision medicine will require this information to be centralized, debated and interpreted for application in the clinic. CIViC is an open access, open source, community-driven web resource for Clinical Interpretation of Variants in Cancer. Our goal is to enable precision medicine by providing an educational forum for dissemination of knowledge and active discussion of the clinical significance of cancer genome alterations. For more details refer to the 2017 CIViC publication in Nature Genetics."
#> 
#> $db$item$civic$publication
#> [1] "Griffith, Malachi, et al. \\\"CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer.\\\" Nature genetics 49.2 (2017): 170-174."
#> 
#> $db$item$civic$url
#> [1] "https://civic.genome.wustl.edu/home"
#> 
#> 
#> $db$item$cscd
#> $db$item$cscd$description
#> [1] "Circular RNA (circRNA) is a large group of RNA family extensively existed in cells and tissues. High-throughput sequencing provides a way to view circRNAs across different samples, especially in various diseases. However, there is still no comprehensive database for exploring the cancer-specific circRNAs. Researchers at Wuhan University collected 228 total RNA or polyA(-) RNA-seq samples from both cancer and normal cell lines, and identified 272 152 cancer-specific circRNAs. A total of 950 962 circRNAs were identified in normal samples only, and 170 909 circRNAs were identified in both tumor and normal samples, which could be further used as non-tumor background. The researchers constructed a cancer-specific circRNA database. To understand the functional effects of circRNAs, they predicted the microRNA response element sites and RNA binding protein sites for each circRNA. They further predicted potential open reading frames to highlight translatable circRNAs. To understand the association between the linear splicing and the back-splicing, the researchers also predicted the splicing events in linear transcripts of each circRNA. As the first comprehensive cancer-specific circRNA database, they believe CSCD could significantly contribute to the research for the function and regulation of cancer-associated circRNAs."
#> 
#> $db$item$cscd$publication
#> [1] "Xia S, Feng J, Chen K, Ma Y, Gong J, Cai FF, Jin Y, Gao Y, Xia L, Chang H, Wei L, Han L, He C. (2017) CSCD: a database for cancer-specific circular RNAs. Nucleic Acids Research"
#> 
#> $db$item$cscd$url
#> [1] "http://gb.whu.edu.cn/CSCD/"
#> 
#> 
#> $db$item$denovo_db
#> $db$item$denovo_db$description
#> [1] "denovo-db is a collection of germline de novo variants identified in the human genome. de novo variants are those present in children but not their parents (see figure to right). With the advancements in whole-exome and whole-genome sequencing we are now able to assess 1000s of these variants. To provide a landing place for de novo variation we created denovo-db, which has been assembled using the published literature. Many large exome and genome studies have focused on neurodevelopmental disorders and while we are very interested in these disorders we have not limited our database to only these phenotypes. The information types present in denovo-db have been refined to include what we think is highly relevant for genetic studies (for example basic functional annotation, CADD scores, and validation status). Our goal is to provide a compendium of all de novo variants to benefit the larger researcher community and to allow researchers to ask various scientific questions such as: 1. Which sites in the human genome have de novo mutations? 2. Which sites are highly mutable to de novo mutation? 3. What are features of de novo variants generally and in disease? 4. What kinds of phenotypes are represented by de novo variants?"
#> 
#> $db$item$denovo_db$publication
#> [1] "Turner T N, Yi Q, Krumm N, et al. denovo-db: a compendium of human de novo variants[J]. Nucleic acids research, 2017, 45(D1): D804-D811."
#> 
#> $db$item$denovo_db$url
#> [1] "http://denovo-db.gs.washington.edu/denovo-db"
#> 
#> 
#> $db$item$dgidb
#> $db$item$dgidb$description
#> [1] "The Drug-Gene Interaction database (DGIdb) mines existing resources that generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development. It provides an interface for searching lists of genes against a compendium of drug-gene interactions and potentially ‘druggable’ genes. DGIdb can be accessed at http://dgidb.org/."
#> 
#> $db$item$dgidb$publication
#> [1] "Griffith, M., et al. DGIdb: mining the druggable genome. Nat Methods 2013;10(12):1209-1210. "
#> 
#> $db$item$dgidb$url
#> [1] "http://dgidb.org/"
#> 
#> 
#> $db$item$diseaseenhancer
#> $db$item$diseaseenhancer$description
#> [1] "Genetic alterations/variants of enhancers make an essential contribution to disease progression. And more than 3 million of enhancers generated by international consortiums indicated that disease-associated enhancers will open a brand new view of pathophysiology. DiseaseEnhancer provides a comprehensive map of manually curated disease-associated enhancers, which includes 847 disease-associated enhancers in 143 human diseases, involving 896 unique enhancer-gene interactions. We also manually collected their dysregulated target genes and mechanistic-related information, such as the associated variant types (including single nucleotide variant, somatic mutation, indel and copy number alteration) and affected transcription factor bindings. Additional genome data were also integrated into DiseaseEnhancer to help characterize disease-associated enhancers."
#> 
#> $db$item$diseaseenhancer$publication
#> [1] "Zhang G, Shi J, Zhu S, et al. DiseaseEnhancer: a resource of human disease-associated enhancer catalog[J]. Nucleic Acids Research, 2017."
#> 
#> $db$item$diseaseenhancer$url
#> [1] "http://biocc.hrbmu.edu.cn/DiseaseEnhancer/"
#> 
#> 
#> $db$item$disgenet
#> $db$item$disgenet$description
#> [1] "DisGeNET is a discovery platform containing one of the largest publicly available collections of genes and variants associated to human diseases. DisGeNET integrates data from expert curated repositories, GWAS catalogues, animal models and the scientific literature. DisGeNET data are homogeneously annotated with controlled vocabularies and community-driven ontologies. Additionally, several original metrics are provided to assist the prioritization of genotype–phenotype relationships.\\nThe current version of DisGeNET (v5.0) contains 561,119 gene-disease associations (GDAs), between 17,074 genes and 20,370 diseases, disorders, traits, and clinical or abnormal human phenotypes, and 135,588 variant-disease associations (VDAs), between 83,002 SNPs and 9,169 diseases and phenotypes."
#> 
#> $db$item$disgenet$publication
#> [1] "DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants, Nucleic Acids Research, Volume 45, Issue D1, 4 January 2017, Pages D833–D839, https://doi.org/10.1093/nar/gkw943"
#> 
#> $db$item$disgenet$tag
#> [1] "NGS"      "database"
#> 
#> $db$item$disgenet$title
#> [1] "a database of gene-disease associations"
#> 
#> 
#> $db$item$docm
#> $db$item$docm$description
#> [1] "DoCM, the Database of Curated Mutations, is a highly curated database of known, disease-causing mutations that provides easily explorable variant lists with direct links to source citations for easy verification."
#> 
#> $db$item$docm$publication
#> [1] "A correspondence describing DoCM has been published in Nature Methods: DoCM: a database of curated mutations in cancer. Nature Methods (2016) doi:10.1038/nmeth.4000."
#> 
#> $db$item$docm$tag
#> [1] "NGS"      "database"
#> 
#> $db$item$docm$title
#> [1] "the Database of Curated Mutations"
#> 
#> 
#> $db$item$drugbank
#> $db$item$drugbank$description
#> [1] "The DrugBank database is a comprehensive, freely accessible, online database containing information on drugs and drug targets. As both a bioinformatics and a cheminformatics resource, DrugBank combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. Because of its broad scope, comprehensive referencing and unusually detailed data descriptions, DrugBank is more akin to a drug encyclopedia than a drug database. As a result, links to DrugBank are maintained for nearly all drugs listed in Wikipedia. DrugBank is widely used by the drug industry, medicinal chemists, pharmacists, physicians, students and the general public. Its extensive drug and drug-target data has enabled the discovery and repurposing of a number of existing drugs to treat rare and newly identified illnesses."
#> 
#> $db$item$drugbank$publication
#> [1] "Wishart D S, Knox C, Guo A C, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration[J]. Nucleic acids research, 2006, 34(suppl_1): D668-D672."                                                                                                                                                              
#> [2] "Wishart D S, Knox C, Guo A C, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets[J]. Nucleic acids research, 2007, 36(suppl_1): D901-D906."                                                                                                                                                                           
#> [3] "Knox C, Law V, Jewison T, et al. DrugBank 3.0: a comprehensive resource for ‘omics’ research on drugs[J]. Nucleic acids research, 2010, 39(suppl_1): D1035-D1041."                                                                                                                                                                           
#> [4] "Law V, Knox C, Djoumbou Y, et al. DrugBank 4.0: shedding new light on drug metabolism[J]. Nucleic acids research, 2013, 42(D1): D1091-D1097."                                                                                                                                                                                                
#> [5] "Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, Assempour N, Iynkkaran I, Liu Y, Maciejewski A, Gale N, Wilson A, Chin L, Cummings R, Le D, Pon A, Knox C, Wilson M. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2017 Nov 8. doi: 10.1093/nar/gkx1037."
#> 
#> $db$item$drugbank$url
#> [1] "https://www.drugbank.ca"
#> 
#> 
#> $db$item$ecodrug
#> $db$item$ecodrug$description
#> [1] "The ECOdrug database contains information on the Evolutionary Conservation Of human Drug targets in over 600 eukaryotic species. The interface allows users to identify human drug targets to 1000+ legacy drugs and explore integrated orthologue predictions for the drug targets, transparently showing the confidence in the predictions both across methods and taxonomic groups."
#> 
#> $db$item$ecodrug$publication
#> [1] "Verbruggen B, Gunnarsson L, Kristiansson E, et al. ECOdrug: a database connecting drugs and conservation of their targets across species[J]. Nucleic Acids Research, 2017."
#> 
#> $db$item$ecodrug$url
#> [1] "http://www.ecodrug.org/"
#> 
#> 
#> $db$item$eggnog
#> $db$item$eggnog$description
#> [1] "eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows for a better propagation of functional terms across nested OGs and led to the novel annotation of 95 890 previously uncharacterized OGs, increasing overall annotation coverage from 67% to 72%. The functional annotations of OGs have been expanded to also provide Gene Ontology terms, KEGG pathways and SMART/Pfam domains for each group. Moreover, eggNOG now provides pairwise orthology relationships within OGs based on analysis of phylogenetic trees. We have also incorporated a framework for quickly mapping novel sequences to OGs based on precomputed HMM profiles. Finally, eggNOG version 4.5 incorporates a novel data set spanning 2605 viral OGs, covering 5228 proteins from 352 viral proteomes. All data are accessible for bulk downloading, as a web-service, and through a completely redesigned web interface. The new access points provide faster searches and a number of new browsing and visualization capabilities, facilitating the needs of both experts and less experienced users. eggNOG v4.5 is available at http://eggnog.embl.de."
#> 
#> $db$item$eggnog$publication
#> [1] "eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Jaime Huerta-Cepas, Damian Szklarczyk, Kristoffer Forslund, Helen Cook, Davide Heller, Mathias C. Walter, Thomas Rattei, Daniel R. Mende, Shinichi Sunagawa, Michael Kuhn, Lars Juhl Jensen, Christian von Mering, and Peer Bork. Nucl. Acids Res. (04 January 2016) 44 (D1): D286-D293. doi: 10.1093/nar/gkv1248"
#> 
#> $db$item$eggnog$url
#> [1] "http://eggnogdb.embl.de/#/app/home"
#> 
#> 
#> $db$item$exorbase
#> $db$item$exorbase$description
#> [1] "exoRBase is a repository of circular RNA (circRNA), long non-coding RNA (lncRNA) and messenger RNA (mRNA) derived from RNA-seq data analyses of human blood exosomes. Experimental validations from published literature are also included. exoRBase features the integration and visualization of RNA expression profiles based on normalized RNA-seq data spanning both normal individuals and patients with different diseases.\\nexoRBase aims to collect and characterize all long RNA species in human blood exosomes. The annotation, expression level and possible original tissues are provided. exoRBase will aid researchers in identifying molecular signatures in blood exosomes and will trigger new circulating biomarker discovery and functional implication for human diseases."
#> 
#> $db$item$exorbase$publication
#> [1] "Li S, Li Y, Chen B, et al. exoRBase: a database of circRNA, lncRNA and mRNA in human blood exosomes[J]. Nucleic Acids Research, 2017."
#> 
#> $db$item$exorbase$url
#> [1] "http://www.exorbase.org/exoRBase/toIndex"
#> 
#> 
#> $db$item$expression_atlas
#> $db$item$expression_atlas$description
#> [1] "Expression Atlas is an open science resource that gives users a powerful way to find information about gene and protein expression across species and biological conditions such as different tissues, cell types, developmental stages and diseases among others. Expression Atlas aims to help answering questions such as ‘where is a certain gene expressed?’ or ‘how does its expression change in a disease?’"
#> 
#> $db$item$expression_atlas$publication
#> [1] "Papatheodorou, I., et al. Expression Atlas: gene and protein expression across multiple studies and organisms. Nucleic Acids Res 2017."
#> 
#> $db$item$expression_atlas$url
#> [1] "https://www.ebi.ac.uk/gxa/home/"
#> 
#> 
#> $db$item$exsnp
#> $db$item$exsnp$description
#> [1] "Genome-wide association studies (GWAS) of human complex disease have identified a large number of disease associated genetic loci, distinguished by an altered frequency of specific single nucleotide polymorphisms (SNPs) among individuals with a particular disease, compared to controls. However, most of these risk loci do not provide direct information on the biological basis of a disease or on the underlying mechanisms. Recent genome-wide expression quantitative trait loci (eQTLs) association studies have provided information on genetic factors, especially SNPs, associated with gene expression variation. These eQTLs likely contribute to phenotype diversity and disease susceptibility, but interpretation is handicapped by low reproducibility of the expression results. Our primary goal is to establish a gold-standard list of consensus eQTLs by integrating publicly available data for specific human populations and cell types, so as to efficiently prioritize functional SNPs. We used linkage disequilibrium data from Hapmap and the 1000 Genome Project to integrate the results of eQTL studies. Separate gold-standard sets for various populations allowed us to investigate eQTLs which contribute to population-specific expression variation. Additionally, tissue-specific eQTL associations were identified by comparing eQTL data from six cell types: LCLs, B cells, Monocytes, Brain, Liver, and Skin. Moreover, to discover the role of these eQTLs play in human common diseases, we have integrated the current gold standard data with SNPs in disease risk loci from GWA studies of seven common human diseases."
#> 
#> $db$item$exsnp$publication
#> [1] "Yu CH, Pal LR, & Moult J. (2016). Consensus Genome-Wide Expression Quantitative Trait Loci and Their Relationship with Human Complex Trait Disease. OMICS, 20(7):400-14. PMID: 27428252"
#> [2] "Pal LR, Yu CH, Mount SM, & Moult J. (2015). Insights from GWAS: emerging landscape of mechanisms underlying complex trait disease. BMC Genomics, 16 Suppl 8:S4 PMID: 26110739"          
#> 
#> $db$item$exsnp$url
#> [1] "http://www.exsnp.org"
#> 
#> 
#> $db$item$fantom
#> $db$item$fantom$description
#> [1] "FANTOM is an international research consortium established by Dr. Hayashizaki and his colleagues in 2000 to assign functional annotations to the full-length cDNAs that were collected during the Mouse Encyclopedia Project at RIKEN. FANTOM has since developed and expanded over time to encompass the fields of transcriptome analysis. The object of the project is moving steadily up the layers in the system of life, progressing thus from an understanding of the ‘elements’ - the transcripts - to an understanding of the ‘system’ - the transcriptional regulatory network, in other words the ‘system’ of an individual life form."
#> 
#> $db$item$fantom$publication
#> [1] "Andersson R, Gebhard C, Miguel-Escalada I, et al. An atlas of active enhancers across human cell types and tissues[J]. Nature, 2014, 507(7493): 455-461."
#> [2] "Fantom Consortium. A promoter-level mammalian expression atlas[J]. Nature, 2014, 507(7493): 462-470."
#> 
#> $db$item$fantom$url
#> [1] "http://fantom.gsc.riken.jp"
#> 
#> 
#> $db$item$funcoup
#> $db$item$funcoup$description
#> [1] "This release of the FunCoup database (http://funcoup.sbc.su.se) is the fourth generation of one of the most comprehensive databases for genome-wide functional association networks. These functional associations are inferred via integrating various data types using a naive Bayesian algorithm and orthology based information transfer across different species. This approach provides high coverage of the included genomes as well as high quality of inferred interactions. In this update of FunCoup we introduce four new eukaryotic species: Schizosaccharomyces pombe, Plasmodium falciparum, Bos taurus, Oryza sativa and open the database to the prokaryotic domain by including networks for Escherichia coli and Bacillus subtilis. The latter allows us to also introduce a new class of functional association between genes - co-occurrence in the same operon. We also supplemented the existing classes of functional association: metabolic, signaling, complex and physical protein interaction with up-to-date information. In this release we switched to InParanoid v8 as the source of orthology and base for calculation of phylogenetic profiles. While populating all other evidence types with new data we introduce a new evidence type based on quantitative mass spectrometry data. Finally, the new JavaScript based network viewer provides the user an intuitive and responsive platform to further evaluate the results."
#> 
#> $db$item$funcoup$publication
#> [1] "Ogris, C., et al. FunCoup 4: new species, data, and visualization. Nucleic Acids Res 2017."                                                                               
#> [2] "Schmitt, T., Ogris, C., & Sonnhammer, E. L. (2013). FunCoup 3.0: database of genome-wide functional coupling networks. Nucleic Acids Research, 42(Database issue), D380-8"
#> [3] "Alexeyenko, A., Schmitt, T., E. L. (2012). Comparative interactomics with Funcoup 2.0. Nucleic Acids Research, 40(Database issue), D821-8"                                
#> [4] "Alexeyenko, A., & Sonnhammer, E. L. (2009). Global networks of functional coupling in eukaryotes from comprehensive data integration. Genome Research, 19(6), 1107-1116"  
#> 
#> $db$item$funcoup$url
#> [1] "http://funcoup.sbc.su.se/search/"
#> 
#> 
#> $db$item$gtex
#> $db$item$gtex$description
#> [1] "Correlations between genotype and tissue-specific gene expression levels will help identify regions of the genome that influence whether and how much a gene is expressed. GTEx will help researchers to understand inherited susceptibility to disease and will be a resource database and tissue bank for many studies in the future. The Genotype-Tissue Expression (GTEx) project aims to provide to the scientific community a resource with which to study human gene expression and regulation and its relationship to genetic variation. This project will collect and analyze multiple human tissues from donors who are also densely genotyped, to assess genetic variation within their genomes. By analyzing global RNA expression within individual tissues and treating the expression levels of genes as quantitative traits, variations in gene expression that are highly correlated with genetic variation can be identified as expression quantitative trait loci, or eQTLs. Despite the rapid progress achieved using genome-wide association studies (GWAS; See: http://www.genome.gov/26525384 ) to identify genetic changes associated with common human diseases, such as heart disease, cancer, diabetes, asthma, and stroke, a large majority of these genetic changes lies outside of the protein-coding regions of genes and often even outside of the genes themselves, making it difficult to discern which genes are affected and by what mechanism. The comprehensive identification of human eQTLs will greatly help to identify genes whose expression is affected by genetic variation, and will provide a valuable basis on which to study the mechanism of that gene regulation. The project will also involve consultation and research into the ethical, legal and social issues raised by the research, support for statistical methods development, and creation of a database to house existing and GTEx-generated eQTL data . 
The database will allow users to view and download computed eQTL results and provide a controlled access system for de-identified individual-level genotype, expression, and clinical data. The associated tissue repository will also serve as a resource for many additional kinds of analyses."
#> 
#> $db$item$gtex$publication
#> [1] "Consortium G. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans[J]. Science, 2015, 348(6235):648-60."
#> [2] "Consortium G, Battle A, Brown C D, et al. Genetic effects on gene expression across human tissues[J]. Nature, 2017, 550(7675):204."                            
#> 
#> $db$item$gtex$url
#> [1] "https://www.gtexportal.org"
#> 
#> 
#> $db$item$hgnc
#> $db$item$hgnc$description
#> [1] "HGNC is responsible for approving unique symbols and names for human loci, including protein coding genes, ncRNA genes and pseudogenes, to allow unambiguous scientific communication. genenames.org is a curated online repository of HGNC-approved gene nomenclature, gene families and associated resources including links to genomic, proteomic and phenotypic information."
#> 
#> $db$item$hgnc$publication
#> [1] "Gray KA, Yates B, Seal RL, Wright MW, Bruford EA. genenames.org: the HGNC resources in 2015. Nucleic Acids Res. 2015 Jan;43(Database issue):D1079-85. doi: 10.1093/nar/gku1071. PMID:25361968"               
#> [2] "HGNC Database, HUGO Gene Nomenclature Committee (HGNC), EMBL Outstation - Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK www.genenames.org."
#> 
#> $db$item$hgnc$url
#> [1] "https://www.genenames.org/"
#> 
#> 
#> $db$item$hpo
#> $db$item$hpo$description
#> [1] "The Human Phenotype Ontology (HPO) aims to provide a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as atrial septal defect. The HPO is currently being developed using the medical literature, Orphanet, DECIPHER, and OMIM. HPO currently contains approximately 11,000 terms (still growing) and over 115,000 annotations to hereditary diseases. The HPO also provides a large set of HPO annotations to approximately 4000 common diseases."
#> 
#> $db$item$hpo$publication
#> [1] "Kohler, S., et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res 2014;42(Database issue):D966-974."
#> 
#> $db$item$hpo$url
#> [1] "http://human-phenotype-ontology.github.io"
#> 
#> 
#> $db$item$inbiomap
#> $db$item$inbiomap$description
#> [1] "InBio Map™ is a high coverage, high quality, convenient and transparent platform for investigating and visualizing protein-protein interactions. InBio Map™ and the corresponding InWeb_InBioMap PPI database are developed, owned and continuously maintained by Intomics A/S"
#> 
#> $db$item$inbiomap$publication
#> [1] "Li, T., et al. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat Methods 2017;14(1):61-64."
#> 
#> $db$item$inbiomap$url
#> [1] "https://www.intomics.com/inbio/map"
#> 
#> 
#> $db$item$interpro
#> $db$item$interpro$description
#> [1] "InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites. We combine protein signatures from a number of member databases into a single searchable resource, capitalising on their individual strengths to produce a powerful integrated database and diagnostic tool."
#> 
#> $db$item$interpro$publication
#> [1] "Apweiler R, Attwood T K, Bairoch A, et al. The InterPro database, an integrated documentation resource for protein families, domains and functional sites[J]. Nucleic acids research, 2001, 29(1): 37-40."
#> [2] "Mulder N, Apweiler R. InterPro and InterProScan: tools for protein sequence classification and comparison[J]. Comparative genomics, 2007: 59-70."                                                         
#> [3] "Jones P, Binns D, Chang H Y, et al. InterProScan 5: genome-scale protein function classification[J]. Bioinformatics, 2014, 30(9): 1236-1240."                                                             
#> 
#> $db$item$interpro$url
#> [1] "http://www.ebi.ac.uk/interpro"
#> 
#> 
#> $db$item$intogen
#> $db$item$intogen$description
#> [1] "IntOGen-mutations platform (http://www.intogen.org/mutations/) summarizes somatic mutations, genes and pathways involved in tumorigenesis. It identifies and visualizes cancer drivers, analyzing 4,623 exomes from 13 cancer sites. It provides support to cancer researchers, aids the identification of drivers across tumor cohorts and helps rank mutations for better clinical decision-making."
#> 
#> $db$item$intogen$publication
#> [1] "Rubio-Perez, C., Tamborero, D., Schroeder, MP., Antolín, AA., Deu-Pons,J., Perez-Llamas, C., Mestres, J., Gonzalez-Perez, A., Lopez-Bigas, N. In silico prescription of anticancer drugs to cohorts of 28 tumor types reveals novel targeting opportunities. Cancer Cell 27 (2015), pp. 382-396"
#> [2] "Gonzalez-Perez A, Perez-Llamas C, Deu-Pons J, Tamborero D, Schroeder MP, Jene-Sanz A, Santos A & Lopez-Bigas N IntOGen-mutations identifies cancer Nature Methods 2013; doi:10.1038/nmeth.2642"                                                                                                 
#> 
#> $db$item$intogen$tag
#> [1] "NGS"      "database"
#> 
#> $db$item$intogen$title
#> [1] "Integrative Onco Genomics"
#> 
#> 
#> $db$item$lncediting
#> $db$item$lncediting$description
#> [1] "RNA editing is a widespread post-transcriptional mechanism that can make discrete changes to specific nucleotide sequences within a RNA transcripts. RNA editing events can result in missense codon changes in mRNA, modulation of alternative splicing in mRNA, or modification of regulatory RNAs and their binding sites in small noncoding RNA, such as miRNA. Recent studies have developed computational methods to accurately detect more than 2 million A-to-I RNA editing from next-generation sequencing data in different species. However, the vast majority of these RNA sites are in noncoding regions of the genome and have unknown functional relevance. LNCediting provides a comprehensive resource for the functional prediction of RNA editing in long noncoding RNAs (lncRNAs)."
#> 
#> $db$item$lncediting$publication
#> [1] "Jing Gong†, Chunjie Liu†, Wei Liu, Yu Xiang, Lixia Diao, An-Yuan Guo∗ and Leng Han∗. Nucl. Acids Res. (2016). doi: 10.1093/nar/gkw835."
#> 
#> $db$item$lncediting$url
#> [1] "http://bioinfo.life.hust.edu.cn/LNCediting"
#> 
#> 
#> $db$item$medreaders
#> $db$item$medreaders$description
#> [1] "MeDReaders: A database for transcription factors that bind to methylated DNA"
#> 
#> $db$item$medreaders$publication
#> [1] "Wang G, Luo X, Wang J, et al. MeDReaders: a database for transcription factors that bind to methylated DNA[J]. Nucleic Acids Research, 2017."
#> 
#> $db$item$medreaders$url
#> [1] "http://medreader.org"
#> 
#> 
#> $db$item$mndr
#> $db$item$mndr$description
#> [1] "Accumulated evidences suggest diverse non-coding RNAs (ncRNAs) involved in a wide variety of diseases progression. Hence, we have updated the MNDR v2.0 database by integrating experimental and prediction diverse ncRNA-disease associations from manual literatures curation and other resources under one common framework. The new developments in MNDR v2.0 include (1) over 220-fold ncRNA-disease associations enhancement than previous version (including lncRNA, miRNA, piRNA, snoRNA and more than 1,400 diseases); (2) integrating experimental and prediction evidence from 14 resources and prediction algorithms for each ncRNA-disease association; (3) mapping disease name to the Disease Ontology and Medical Subject Headings (MeSH); (4) providing a confidence score for each ncRNA-disease association; and (5) an increase of species coverage to 6 mammals."
#> 
#> $db$item$mndr$publication
#> [1] "Cui T, Zhang L, Huang Y, et al. MNDR v2.0: an updated resource of ncRNA–disease associations in mammals[J]. Nucleic Acids Research, 2017."                     
#> [2] "Wang Y, Chen L, Chen B, et al. Mammalian ncRNA-disease repository: a global view of ncRNA-mediated disease network[J]. Cell death & disease, 2013, 4(8): e765."
#> 
#> $db$item$mndr$url
#> [1] "http://www.rna-society.org/mndr"
#> 
#> 
#> $db$item$msdd
#> $db$item$msdd$description
#> [1] "MSDD provides two maps that enable users to download data by clicking on the appropriate area. The left map classifies data according to the organ and the right map displays the hotspot data."
#> 
#> $db$item$msdd$publication
#> [1] "Yue M, Zhou D, Zhi H, et al. MSDD: a manually curated database of experimentally supported associations among miRNAs, SNPs and human diseases[J]. Nucleic Acids Research, 2017."
#> 
#> $db$item$msdd$url
#> [1] "http://www.bio-bigdata.com/msdd"
#> 
#> 
#> $db$item$omim
#> $db$item$omim$description
#> [1] "OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily. The full-text, referenced overviews in OMIM contain information on all known mendelian disorders and over 15,000 genes. OMIM focuses on the relationship between phenotype and genotype. It is updated daily, and the entries contain copious links to other genetics resources.\\nThis database was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog of mendelian traits and disorders, entitled Mendelian Inheritance in Man (MIM). Twelve book editions of MIM were published between 1966 and 1998. The online version, OMIM, was created in 1985 by a collaboration between the National Library of Medicine and the William H. Welch Medical Library at Johns Hopkins. It was made generally available on the internet starting in 1987. In 1995, OMIM was developed for the World Wide Web by NCBI, the National Center for Biotechnology Information.\\n\\nOMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh."
#> 
#> $db$item$omim$publication
#> [1] "Hamosh A, Scott A F, Amberger J S, et al. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders[J]. Nucleic acids research, 2005, 33(suppl_1): D514-D517."                   
#> [2] "Amberger J, Bocchini C A, Scott A F, et al. McKusick's online Mendelian inheritance in man (OMIM®)[J]. Nucleic acids research, 2008, 37(suppl_1): D793-D796."                                                           
#> [3] "Amberger J S, Bocchini C A, Schiettecatte F, et al. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders[J]. Nucleic acids research, 2014, 43(D1): D789-D798."
#> 
#> $db$item$omim$url
#> [1] "https://omim.org/"
#> 
#> 
#> $db$item$pancanqtl
#> $db$item$pancanqtl$description
#> [1] "Expression quantitative trait loci (eQTLs) are regions of the genome containing DNA sequence variants that influence the expression level of one or more genes. PancanQTL aims to comprehensively provide cis-eQTLs (SNPs affect local gene expression) and trans-eQTLs (SNPs affect distant gene expression) in 33 cancer types from The Cancer Genome Atlas (TCGA)."
#> 
#> $db$item$pancanqtl$publication
#> [1] "Gong J, Mei S, Liu C, et al. PancanQTL: systematic identification of cis-eQTLs and trans-eQTLs in 33 cancer types[J]. Nucleic Acids Research, 2017."
#> 
#> $db$item$pancanqtl$url
#> [1] "http://bioinfo.life.hust.edu.cn/PancanQTL"
#> 
#> 
#> $db$item$proteinatlas
#> $db$item$proteinatlas$description
#> [1] "The Human Protein Atlas (HPA) is a Swedish-based program started in 2003 with the aim to map of all the human proteins in cells, tissues and organs using integration of various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics and systems biology. All the data in the knowledge resource is open access to allow scientists both in academia and industry to freely access the data for exploration of the human proteome. The Human Protein Atlas consists of three separate parts, each focusing on a particular aspect of the genome-wide analysis of the human proteins; the Tissue Atlas showing the distribution of the proteins across all major tissues and organs in the human body, the Cell Atlas showing the subcellular localization of proteins in single cells, and finally the Pathology Atlas showing the impact of protein levels for survival of patients with cancer. The Human Protein Atlas program has already contributed to several thousands of publications in the field of human biology and disease and it is selected by the organization ELIXIR (www.elixir-europe.org) as a European core resource due to its fundamental importance for a wider life science community. The HPA consortium is funded by the Knut and Alice Wallenberg Foundation."
#> 
#> $db$item$proteinatlas$publication
#> [1] "Uhlen M et al, 2015. Tissue-based map of the human proteome. Science PubMed: 25613900 DOI: 10.1126/science.1260419"              
#> [2] "Thul PJ et al, 2017. A subcellular map of the human proteome. Science. PubMed: 28495876 DOI: 10.1126/science.aal3321"            
#> [3] "Uhlen M et al, 2017. A pathology atlas of the human cancer transcriptome. Science. PubMed: 28818916 DOI: 10.1126/science.aan2507"
#> 
#> $db$item$proteinatlas$url
#> [1] "https://www.proteinatlas.org/"
#> 
#> 
#> $db$item$rbp_var
#> $db$item$rbp_var$description
#> [1] "RBP-Var is a database for annotation of functional variants which potentially influence RNA-protein interactions by changing RNA structure in the H. sapiens genome. It contains dbSNPs and RNA editing events in RBP binding sites (rbSNVs), the change of RNA secondary structure induced by rbSNV, the rbSNV-induced gain/loss of binding sites of miRNA and potential functional rbSNVs which could impact RBP binding. In addition, RBP-Var also integrates GWAS data, eQTL data, ClinVar data, RNA expression and COSMIC data into selection of functional SNVs for genetic association studies."
#> 
#> $db$item$rbp_var$publication
#> [1] "Mao F, Xiao L, Li X, et al. RBP-Var: a database of functional variants involved in regulation mediated by RNA-binding proteins[J]. Nucleic Acids Research, 2016, 44(Database issue):D154-D163."
#> 
#> $db$item$rbp_var$tag
#> [1] "NGS"      "database"
#> 
#> $db$item$rbp_var$title
#> [1] "RBP-Var2: A platform for exploring functional variants involved in post-transcriptional regulation mediated by RNA-binding proteins"
#> 
#> 
#> $db$item$rddpred
#> $db$item$rddpred$description
#> [1] "RDDpred: A condition-specific RNA-editing prediction model from RNA-seq data 1) RDDpred deduces condition-specific training examples without any experimental validations to construct a predictor. \\n2) As far as we know, RDDpred is the very first machine-learning based automated pipeline for RNA-editing prediction. \\n3) RDDpred successfully reproduced the results of two previous studies (95%, 90%), \\nwith showing significant NPV (84%, 75%) and the prediction procedures are finished in reasonable time (18 hrs). \\n"
#> 
#> $db$item$rddpred$publication
#> [1] "Kim M, Hur B, Kim S. RDDpred: a condition-specific RNA-editing prediction model from RNA-seq data[J]. BMC genomics, 2016, 17(Suppl 1)."
#> 
#> $db$item$rddpred$url
#> [1] "http://epigenomics.snu.ac.kr/RDDpred/prior_data"
#> 
#> 
#> $db$item$remap2
#> $db$item$remap2$description
#> [1] "ReMap, an integrative analysis of transcriptional regulators ChIP-seq experiments from both Public and Encode datasets. The ReMap atlas consists of 80 million peaks from 485 transcription factors (TFs), transcription coactivators (TCAs) and chromatin-remodeling factors (CRFs). The atlas is available to browse or download either for a given TF or cell line, or for the entire dataset. "
#> 
#> $db$item$remap2$publication
#> [1] "Integrative analysis of public ChIP-seq experiments reveals a complex multi-cell regulatory landscape. Griffon, A., Barbier, Q., Dalino, J., van Helden, J., Spicuglia, S., Ballester, B. Nucleic Acids Research, Volume 43, Issue 4, 27 February 2015 "
#> [2] "ReMap 2018: An updated regulatory regions atlas from an integrative analysis of DNA-binding ChIP-seq experiments. Cheneby J., Gheorghe M., Artufel M., Mathelier A., Ballester, B. Nucleic Acids Research, gkx1092, https://doi.org/10.1093/nar/gkx1092"
#> 
#> $db$item$remap2$url
#> [1] "http://tagc.univ-mrs.fr/remap/"
#> 
#> 
#> $db$item$rsnp3
#> $db$item$rsnp3$description
#> [1] "SNP related regulatory elements, element-gene pairs & SNP-based regulatory network"
#> 
#> $db$item$rsnp3$publication
#> [1] "Guo L, Wang J. rSNPBase 3.0: an updated database of SNP-related regulatory elements, element-gene pairs and SNP-based gene regulatory networks[J]. Nucleic Acids Research, 2017."
#> 
#> $db$item$rsnp3$url
#> [1] "http://rsnp3.psych.ac.cn/index.do"
#> 
#> 
#> $db$item$rvarbase
#> $db$item$rvarbase$description
#> [1] "rVarBase annotates variant's regulatory feature in three fields: chromatin state of the region surrounding variant, regulatory elements overlapped with variant, and variant's potential target genes. It also provides optioned extended annotation for variants, including: LD-proxies of known SNP, SNP/CNV that is overlapped with or located in queried variant, traits (disease and expression quantitative trait) associated with variant. rVarBase is an updated version of the database rSNPBase, it is consistent with the old version in utilizing experimentally supported regulatory elements from ENCODE and other data resources to make relevant annotation (such as involved regulatory manner and potential target gene)."
#> 
#> $db$item$rvarbase$publication
#> [1] "Guo, L., Du, Y., Qu, S., & Wang, J. (2015). rVarBase: an updated database for regulatory features of human variants. Nucleic acids research, gkv1107 PMID:26503253"
#> 
#> $db$item$rvarbase$url
#> [1] "http://rv.psych.ac.cn"
#> 
#> 
#> $db$item$seecancer
#> $db$item$seecancer$description
#> [1] "Cancer is driven by accumulating somatic alterations which confer normal cells fitness advantage to evolve from a premalignant status to malignant tumor. The SEECancer database presents the comprehensive cancer evolutionary stage-specific somatic events (including early-specific, late-specific, relapse-specific, metastasis-specific, drug-resistant and drug-induced genomic events) and their temporal orders."
#> 
#> $db$item$seecancer$publication
#> [1] "(Zhang and Luo, 2017) SEECancer: a resource for somatic events in evolution of cancer genome. DOI: 10.1093/nar/gkx964"
#> 
#> $db$item$seecancer$url
#> [1] "http://biocc.hrbmu.edu.cn/SEECancer"
#> 
#> 
#> $db$item$seeqtl
#> $db$item$seeqtl$description
#> [1] "seeQTL is a comprehensive and versatile eQTL database, including various eQTL studies and a meta-analysis of HapMap eQTL information. The database presents eQTL association results in a convenient browser, using both segmented local-association plots and genome-wide Manhattan plots."
#> 
#> $db$item$seeqtl$publication
#> [1] "Xia K, Shabalin A A, Huang S, et al. seeQTL: a searchable database for human eQTLs[J]. Bioinformatics, 2011, 28(3): 451-452. PMID:22171328"
#> 
#> $db$item$seeqtl$url
#> [1] "http://www.bios.unc.edu/research/genomic_software/seeQTL/"
#> 
#> 
#> $db$item$snipa3
#> $db$item$snipa3$description
#> [1] "SNiPA offers both functional annotations and linkage disequilibrium information for bi-allelic genomic variants (SNPs and SNVs). SNiPA combines LD data based on the 1000 Genomes Project with various annotation layers, such as gene annotations, phenotypic trait associations, and expression-/metabolic quantitative trait loci. See the documentation for all data sources integrated into SNiPA. For information on updates and new releases, see the Release Notes."
#> 
#> $db$item$snipa3$publication
#> [1] "Arnold, M., Raffler, J., Pfeufer, A., Suhre, K., & Kastenmüller, G. (2014). SNiPA: an interactive, genetic variant-centered annotation browser. Bioinformatics, 31(8), 1334-1336."
#> 
#> $db$item$snipa3$url
#> [1] "http://snipa.helmholtz-muenchen.de/snipa3"
#> 
#> 
#> $db$item$srnanalyzer
#> $db$item$srnanalyzer$description
#> [1] "sRNAnalyzer is a flexible, modular pipeline for the analysis of small RNA sequencing data."
#> 
#> $db$item$srnanalyzer$publication
#> [1] "Wu X, Kim TK, Baxter D, Scherler K, Gordon A, Fong O, Etheridge A, Galas DJ, Wang K. (2017) sRNAnalyzer—a flexible and customizable small RNA sequencing data analysis pipeline. Nucleic Acids Research"
#> 
#> $db$item$srnanalyzer$url
#> [1] "http://srnanalyzer.systemsbiology.net/"
#> 
#> 
#> $db$item$superdrug2
#> $db$item$superdrug2$description
#> [1] "SuperDRUG2 database is a unique, one-stop resource for approved/marketed drugs, containing more than 4,500 active pharmaceutical ingredients. We annotated drugs with regulatory details, chemical structures (2D and 3D), dosage, biological targets, physicochemical properties, external identifiers, side-effects and pharmacokinetic data. Different search mechanisms allow navigation through the chemical space of approved drugs. A 2D chemical structure search is provided in addition to a 3D superposition feature that superposes a drug with ligands already known to be found in the experimentally determined protein-ligand complexes. For the first time, we introduced simulation of \\\"physiologically-based\\\" pharmacokinetics of drugs. Our interaction check feature not only identifies potential drug-drug interactions but also provides alternative recommendations for elderly patients."
#> 
#> $db$item$superdrug2$publication
#> [1] "Siramshetty V B, Eckert O A, Gohlke B O, et al. SuperDRUG2: a one stop resource for approved/marketed drugs[J]. Nucleic Acids Research, 2017."
#> 
#> $db$item$superdrug2$url
#> [1] "http://cheminfo.charite.de/superdrug2"
#> 
#> 
#> $db$item$tumorfusions
#> $db$item$tumorfusions$description
#> [1] "Gene fusion represents a class of molecular aberrations in cancer and has been exploited for therapeutic purposes. In this paper we describe TumorFusions, a data portal that catalogues 20 731 gene fusions detected in 9966 well characterized cancer samples and 648 normal specimens from The Cancer Genome Atlas (TCGA). The portal spans 33 cancer types in TCGA. Fusion transcripts were identified via a uniform pipeline, including filtering against a list of 3838 transcript fusions detected in a panel of 648 non-neoplastic samples. Fusions were mapped to somatic DNA rearrangements identified using whole genome sequencing data from 561 cancer samples as a means of validation. We observed that 65% of transcript fusions were associated with a chromosomal alteration, which is annotated in the portal. Other features of the portal include links to SNP array-based copy number levels and mutational patterns, exon and transcript level expressions of the partner genes, and a network-based centrality score for prioritizing functional fusions. Our portal aims to be a broadly applicable and user friendly resource for cancer gene annotation and is publicly available at http://www.tumorfusions.org."
#> 
#> $db$item$tumorfusions$publication
#> [1] "Hu X, Wang Q, Tang M, et al. TumorFusions: an integrative resource for cancer-associated transcript fusions[J]. Nucleic Acids Research, 2017."
#> 
#> $db$item$tumorfusions$url
#> [1] "http://www.tumorfusions.org"
#> 
#> 
#> $db$item$varcards
#> $db$item$varcards$description
#> [1] "VarCards: an integrated genetic and clinical database for coding variants in the human genome"
#> 
#> $db$item$varcards$publication
#> [1] "Li J, Shi L, Zhang K, et al. VarCards: an integrated genetic and clinical database for coding variants in the human genome[J]. Nucleic Acids Research, 2017."
#> 
#> $db$item$varcards$url
#> [1] "http://varcards.biols.ac.cn"
#> 
#> 
#> 
#> 
#> $github
#> $github$cfg_meta
#> $github$cfg_meta$avaliable_cfg
#> [1] "github.toml"
#> 
#> $github$cfg_meta$cfg_dir
#> [1] "@>@system.file('extdata', 'config/github', package = 'BioInstaller')@<@"
#> 
#> $github$cfg_meta$prefix_url
#> [1] "https://raw.githubusercontent.com/JhuangLab/BioInstaller/master/inst/extdata/config/github"
#> 
#> 
#> $github$item
#> $github$item$arnapipe
#> $github$item$arnapipe$description
#> [1] "The wide range of RNA-seq applications and their high-computational needs require the development of pipelines orchestrating the entire workflow and optimizing usage of available computational resources. We present aRNApipe, a project-oriented pipeline for processing of RNA-seq data in high-performance cluster environments. aRNApipe is highly modular and can be easily migrated to any high-performance computing (HPC) environment. The current applications included in aRNApipe combine the essential RNA-seq primary analyses, including quality control metrics, transcript alignment, count generation, transcript fusion identification, alternative splicing and sequence variant calling. aRNApipe is project-oriented and dynamic so users can easily update analyses to include or exclude samples or enable additional processing modules. Workflow parameters are easily set using a single configuration file that provides centralized tracking of all analytical processes. Finally, aRNApipe incorporates interactive web reports for sample tracking and a tool for managing the genome assemblies available to perform an analysis."
#> 
#> $github$item$arnapipe$publication
#> [1] "Alonso A, Lasseigne B N, Williams K, et al. aRNApipe: a balanced, efficient and distributed pipeline for processing RNA-seq data in high-performance computing environments[J]. Bioinformatics, 2017, 33(11): 1727-1729."
#> 
#> $github$item$arnapipe$title
#> [1] "a project-oriented pipeline for processing of RNA-seq data in high performance cluster environments"
#> 
#> 
#> $github$item$bwa
#> $github$item$bwa$description
#> [1] "BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the rest two for longer sequences ranged from 70bp to a few megabases. BWA-MEM and BWA-SW share similar features such as the support of long reads and chimeric alignment, but BWA-MEM, which is the latest, is generally recommended as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads."
#> 
#> $github$item$bwa$publication
#> [1] "Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]"
#> [2] "Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]"   
#> [3] "Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN]"                            
#> 
#> $github$item$bwa$title
#> [1] "Burrows-Wheeler Aligner for pairwise alignment between DNA sequences"
#> 
#> 
#> $github$item$chronqc
#> $github$item$chronqc$description
#> [1] "ChronQC is a quality control (QC) tracking system for clinical implementation of next-generation sequencing (NGS). ChronQC generates time series plots for various QC metrics to allow comparison of current runs to historical runs. ChronQC has multiple features for tracking QC data including Westgard rules for clinical validity, laboratory-defined thresholds, and historical observations within a specified time period. Users can record their notes and corrective actions directly onto the plots for long-term recordkeeping. ChronQC facilitates regular monitoring of clinical NGS to enable adherence to high quality clinical standards."
#> 
#> $github$item$chronqc$publication
#> [1] "Tawari N R, Seow J J W, Dharuman P, et al. ChronQC: A Quality Control Monitoring System for Clinical Next Generation Sequencing[J]. Bioinformatics, 2017."
#> 
#> $github$item$chronqc$title
#> [1] "ChronQC: A Quality Control Monitoring System for Clinical Next Generation Sequencing"
#> 
#> 
#> $github$item$dart
#> $github$item$dart$description
#> [1] "We proposed a novel RNA-seq de novo mapping algorithm, called DART, which adopts a partitioning strategy to avoid the extension step. The experiment results on synthetic datasets and real NGS datasets showed that DART is a highly efficient aligner that yields the highest or comparable sensitivity and accuracy compared to most state-of-the-art aligners, and more importantly, it spends the least amount of time among the selected aligners."
#> 
#> $github$item$dart$publication
#> [1] "Lin H N, Hsu W L. DART: a fast and accurate RNA-seq mapper with a partitioning strategy[J]. Bioinformatics, 2017, 34(2): 190-197."
#> 
#> $github$item$dart$title
#> [1] "DART: a fast and accurate RNA-seq mapper with a partitioning strategy"
#> 
#> 
#> $github$item$giggle
#> $github$item$giggle$description
#> [1] "GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation."
#> 
#> $github$item$giggle$publication
#> [1] "Layer, R.M. et al. GIGGLE: a search engine for large-scale integrated genome analysis. Nat Methods (2018)."
#> 
#> $github$item$giggle$title
#> [1] "GIGGLE: a search engine for large-scale integrated genome analysis"
#> 
#> 
#> $github$item$multiqc
#> $github$item$multiqc$description
#> [1] "MultiQC is a tool to create a single report with interactive plots for multiple bioinformatics analyses across many samples. MultiQC is written in Python (tested with v2.7, 3.4, 3.5 and 3.6). It is available on the Python Package Index and through conda using Bioconda.\\nReports are generated by scanning given directories for recognised log files. These are parsed and a single HTML report is generated summarising the statistics for all logs found. MultiQC reports can describe multiple analysis steps and large numbers of samples within a single plot, and multiple analysis tools making it ideal for routine fast quality control."
#> 
#> $github$item$multiqc$publication
#> [1] "Ewels P, Magnusson M, Lundin S, et al. MultiQC: summarize analysis results for multiple tools and samples in a single report[J]. Bioinformatics, 2016, 32(19): 3047-3048."
#> 
#> $github$item$multiqc$title
#> [1] "Aggregate results from bioinformatics analyses across many samples into a single report."
#> 
#> 
#> $github$item$ngs_qc_toolkit
#> $github$item$ngs_qc_toolkit$description
#> [1] "Next generation sequencing (NGS) technologies provide a high-throughput means to generate large amount of sequence data. However, quality control (QC) of sequence data generated from these technologies is extremely important for meaningful downstream analysis. Further, highly efficient and fast processing tools are required to handle the large volume of datasets. Here, we have developed an application, NGS QC Toolkit, for quality check and filtering of high-quality data. This toolkit is a standalone and open source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application have been implemented in Perl programming language. The toolkit is comprised of user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options have been provided to facilitate the QC at user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data to facilitate better downstream analysis."
#> 
#> $github$item$ngs_qc_toolkit$publication
#> [1] "Patel R K, Jain M. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data[J]. PloS one, 2012, 7(2): e30619."
#> 
#> $github$item$ngs_qc_toolkit$title
#> [1] "A toolkit for the quality control (QC) of next generation sequencing (NGS) data"
#> 
#> 
#> $github$item$olego
#> $github$item$olego$description
#> [1] "A crucial step in analyzing mRNA-Seq data is to accurately and efficiently map hundreds of millions of reads to the reference genome and exon junctions. Here we present OLego, an algorithm specifically designed for de novo mapping of spliced mRNA-Seq reads. OLego adopts a multiple-seed-and-extend scheme, and does not rely on a separate external aligner. It achieves high sensitivity of junction detection by strategic searches with small seeds (14 nt for mammalian genomes). To improve accuracy and resolve ambiguous mapping at junctions, OLego uses a built-in statistical model to score exon junctions by splice-site strength and intron size. Burrows–Wheeler transform is used in multiple steps of the algorithm to efficiently map seeds, locate junctions and identify small exons. OLego is implemented in C++ with fully multithreaded execution, and allows fast processing of large-scale data. We systematically evaluated the performance of OLego in comparison with published tools using both simulated and real data. OLego demonstrated better sensitivity, higher or comparable accuracy and substantially improved speed. OLego also identified hundreds of novel micro-exons (<30 nt) in the mouse transcriptome, many of which are phylogenetically conserved and can be validated experimentally in vivo. OLego is freely available at http://zhanglab.c2b2.columbia.edu/index.php/OLego."
#> 
#> $github$item$olego$publication
#> [1] "Wu J, et al. OLego: fast and sensitive mapping of spliced mRNA-Seq reads using small seeds[J]. Nucleic acids research, 2013, 41(10): 5149-5163."
#> 
#> $github$item$olego$title
#> [1] "OLego: fast and sensitive mapping of spliced mRNA-Seq reads using small seeds"
#> 
#> 
#> $github$item$radia
#> $github$item$radia$description
#> [1] "RADIA identifies RNA and DNA variants in BAM files. RADIA is typically run on 3 BAM files consisting of the Normal DNA, Tumor DNA and Tumor RNA. If no RNA is available from the tumor, then it is run on the normal/tumor pairs. For the normal DNA, RADIA outputs any differences compared to the reference which could be potential Germline mutations. For the tumor DNA, RADIA outputs any differences compared to the reference and the normal DNA which could be potential Somatic mutations. RADIA combines the tumor DNA and tumor RNA to augment the somatic mutation calls. It also uses the tumor RNA to identify potential RNA editing events.\\nThe DNA Only Method (DOM) uses just the tumor/normal pairs of DNA (ignoring the RNA), while the Triple BAM Method (TBM) uses all three datasets from the same patient to detect somatic mutations. The mutations from the TBM are further categorized into 2 sub-groups: RNA Confirmation and RNA Rescue calls. RNA Confirmation calls are those that are made by both the DOM and the TBM due to the strong read support in both the DNA and RNA. RNA Rescue calls are those that had very little DNA support, hence not called by the DOM, but strong RNA support, and thus called by the TBM. RNA Rescue calls are typically missed by traditional methods that only interrogate the DNA."
#> 
#> $github$item$radia$publication
#> [1] "Radenbaugh AJ, Ma S, Ewing A, Stuart JM, Collisson EA, Zhu J, Haussler D. (2014) RADIA: RNA and DNA Integrated Analysis for Somatic Mutation Detection. PLoS ONE 9(11): e111516. doi:10.1371/journal.pone.0111516"
#> 
#> $github$item$radia$title
#> [1] "RADIA: RNA and DNA Integrated Analysis for Somatic Mutation Detection"
#> 
#> 
#> $github$item$resm
#> $github$item$resm$description
#> [1] "RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides an user-friendly interface, supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation. In addition, it provides posterior mean and 95% credibility interval estimates for expression levels. For visualization, It can generate BAM and Wiggle files in both transcript-coordinate and genomic-coordinate. Genomic-coordinate files can be visualized by both UCSC Genome browser and Broad Institute's Integrative Genomics Viewer (IGV). Transcript-coordinate files can be visualized by IGV. RSEM also has its own scripts to generate transcript read depth plots in pdf format. The unique feature of RSEM is, the read depth plots can be stacked, with read depth contributed to unique reads shown in black and contributed to multi-reads shown in red. In addition, models learned from data can also be visualized. Last but not least, RSEM contains a simulator."
#> 
#> $github$item$resm$publication
#> [1] "Li B, Dewey C N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome[J]. BMC bioinformatics, 2011, 12(1): 323."
#> 
#> $github$item$resm$title
#> [1] "RSEM: accurate quantification of gene and isoform expression from RNA-Seq data"
#> 
#> 
#> $github$item$rhat
#> $github$item$rhat$description
#> [1] "MOTIVATION:Single Molecule Real-Time (SMRT) sequencing has been widely applied in cutting-edge genomic studies. However, it is still an expensive task to align the noisy long SMRT reads to reference genome by state-of-the-art aligners, which is becoming a bottleneck in applications with SMRT sequencing. Novel approach is on demand for improving the efficiency and effectiveness of SMRT read alignment.RESULTS:We propose Regional Hashing-based Alignment Tool (rHAT), a seed-and-extension-based read alignment approach specifically designed for noisy long reads. rHAT indexes reference genome by regional hash table (RHT), a hash table-based index which describes the short tokens within local windows of reference genome. In the seeding phase, rHAT utilizes RHT for efficiently calculating the occurrences of short token matches between partial read and local genomic windows to find highly possible candidate sites. In the extension phase, a sparse dynamic programming-based heuristic approach is used for reducing the cost of aligning read to the candidate sites. By benchmarking on the real and simulated datasets from various prokaryote and eukaryote genomes, we demonstrated that rHAT can effectively align SMRT reads with outstanding throughput."
#> 
#> $github$item$rhat$publication
#> [1] "Liu B, Guan D, Teng M, et al. rHAT: fast alignment of noisy long reads with regional hashing[J]. Bioinformatics, 2015, 32(11): 1625-1631."
#> 
#> $github$item$rhat$title
#> [1] "rHAT: fast alignment of noisy long reads with regional hashing."
#> 
#> 
#> $github$item$trimgalore
#> $github$item$trimgalore$title
#> [1] "A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data"
#> 
#> 
#> 
#> 
#> $nongithub
#> $nongithub$cfg_meta
#> $nongithub$cfg_meta$avaliable_cfg
#> [1] "nongithub.toml"
#> 
#> $nongithub$cfg_meta$cfg_dir
#> [1] "@>@system.file('extdata', 'config/nongithub', package = 'BioInstaller')@<@"
#> 
#> $nongithub$cfg_meta$prefix_url
#> [1] "https://raw.githubusercontent.com/JhuangLab/BioInstaller/master/inst/extdata/config/nongithub"
#> 
#> 
#> $nongithub$item
#> $nongithub$item$absolute
#> $nongithub$item$absolute$description
#> [1] "When DNA is extracted from an admixed population of cancer and normal cells, the information on absolute copy number per cancer cell is lost in the mixing.  The purpose of ABSOLUTE is to re-extract these data from the mixed DNA population.  This process begins by generation of segmented copy number data, which is input to the ABSOLUTE algorithm together with pre-computed models of recurrent cancer karyotypes and, optionally, allelic fraction values for somatic point mutations.  The output of ABSOLUTE then provides re-extracted information on the absolute cellular copy number of local DNA segments and, for point mutations, the number of mutated alleles."
#> 
#> $nongithub$item$absolute$publication
#> [1] "Carter S L, Cibulskis K, Helman E, et al. Absolute quantification of somatic DNA alterations in human cancer[J]. Nature biotechnology, 2012, 30(5): 413-421."
#> 
#> $nongithub$item$absolute$title
#> [1] "ABSOLUTE can estimate purity/ploidy, and from that compute absolute copy-number and mutation multiplicities."
#> 
#> 
#> $nongithub$item$atlas2
#> $nongithub$item$atlas2$description
#> [1] "Atlas2 is a next-generation sequencing suite of variant analysis tools specializing in the separation of true SNPs and insertions and deletions (indels) from sequencing and mapping errors in Whole Exome Capture Sequencing (WECS) data."
#> 
#> $nongithub$item$atlas2$publication
#> [1] "Challis D. etc. An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics 2012, 13:8 doi:10.1186/1471-2105-13-8"
#> 
#> $nongithub$item$atlas2$title
#> [1] "Atlas2, next-generation sequencing suite of variant analysis tools specializing in the separation of true SNPs and insertions and deletions (indels)"
#> 
#> 
#> $nongithub$item$beagle
#> $nongithub$item$beagle$description
#> [1] "Beagle version 4.1 has a more accurate genotype phasing algorithm and a very fast and accurate genotype imputation algorithm. Version 4.1 also has several changes to the command line arguments which are described in the release notes. The \\\"ped\\\" argument has no effect in version 4.1. If your data contains nuclear families and you want to model the parent-offspring relationships when phasing genotypes, please use version 4.0."
#> 
#> $nongithub$item$beagle$publication
#> [1] "S R Browning and B L Browning (2007) Rapid and accurate haplotype phasing and missing data inference for whole genome association studies by use of localized haplotype clustering. Am J Hum Genet 81:1084-1097. doi:10.1086/521987"
#> [2] "B L Browning and S R Browning (2013). Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459-71. doi:10.1534/genetics.113.150029"                                           
#> [3] "B L Browning and S R Browning (2016). Genotype imputation with millions of reference samples. Am J Hum Genet 98:116-126. doi:10.1016/j.ajhg.2015.11.020"                                                                            
#> 
#> $nongithub$item$beagle$title
#> [1] "Beagle, a software package that performs genotype calling, genotype phasing, imputation of ungenotyped markers, and identity-by-descent segment detection."
#> 
#> 
#> $nongithub$item$contest
#> $nongithub$item$contest$description
#> [1] "Here, we present ContEst, a tool for estimating the level of cross-individual contamination in next-generation sequencing data. We demonstrate the accuracy of ContEst across a range of contamination levels, sources and read depths using sequencing data mixed in silico at known concentrations. We applied our tool to published cancer sequencing datasets and report their estimated contamination levels."
#> 
#> $nongithub$item$contest$publication
#> [1] "Cibulskis K, Mckenna A, Fennell T, et al. ContEst: estimating cross-contamination of human samples in next-generation sequencing data[J]. Bioinformatics, 2011, 27(18):2601-2602."
#> 
#> $nongithub$item$contest$title
#> [1] "ContEst is a tool (and method) for estimating the amount of cross-sample contamination in next generation sequencing data.  Using a Bayesian framework, contamination levels are estimated from array based genotypes and sequencing reads."
#> 
#> 
#> $nongithub$item$gmap
#> $nongithub$item$gmap$description
#> [1] "The programs GMAP and GSNAP, for aligning RNA-Seq and DNA-Seq datasets to genomes, have evolved along with advances in biological methodology to handle longer reads, larger volumes of data, and new types of biological assays. The genomic representation has been improved to include linear genomes that can compare sequences using single-instruction multiple-data (SIMD) instructions, compressed genomic hash tables with fast access using SIMD instructions, handling of large genomes with more than four billion bp, and enhanced suffix arrays (ESAs) with novel data structures for fast access. Improvements to the algorithms have included a greedy match-and-extend algorithm using suffix arrays, segment chaining using genomic hash tables, diagonalization using segmental hash tables, and nucleotide-level dynamic programming procedures that use SIMD instructions and eliminate the need for F-loop calculations. Enhancements to the functionality of the programs include standardization of indel positions, handling of ambiguous splicing, clipping and merging of overlapping paired-end reads, and alignments to circular chromosomes and alternate scaffolds. The programs have been adapted for use in pipelines by integrating their usage into R/Bioconductor packages such as gmapR and HTSeqGenie, and these pipelines have facilitated the discovery of numerous biological phenomena."
#> 
#> $nongithub$item$gmap$publication
#> [1] "Wu T D, Watanabe C K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences[J]. Bioinformatics, 2005, 21(9): 1859-1875."                                                            
#> [2] "Wu T D, Reeder J, Lawrence M, et al. GMAP and GSNAP for genomic sequence alignment: enhancements to speed, accuracy, and functionality[J]. Statistical Genomics: Methods and Protocols, 2016: 283-334."
#> 
#> $nongithub$item$gmap$publication_date
#> [1] 2005 2016
#> 
#> $nongithub$item$gmap$publication_doi
#> [1] "10.1093/bioinformatics/bti310" "10.1007/978-1-4939-3578-9_15" 
#> 
#> $nongithub$item$gmap$tag
#> [1] "Genomics"                       "NGS"                           
#> [3] "Genomic alignment"              "DNA-seq"                       
#> [5] "RNA-seq"                        "mRNA"                          
#> [7] "Whole Transcriptome Sequencing" "EST"                           
#> 
#> $nongithub$item$gmap$title
#> [1] "GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences, and GSNAP: Genomic Short-read Nucleotide Alignment Program"
#> 
#> 
#> $nongithub$item$gridss
#> $nongithub$item$gridss$tag
#> [1] "NGS" "SV" 
#> 
#> $nongithub$item$gridss$title
#> [1] "GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly."
#> 
#> 
#> $nongithub$item$hapseg
#> $nongithub$item$hapseg$description
#> [1] "The HAPSEG module takes single nucleotide polymorphism (SNP) microarray data and outputs copy number data segmented by haplotype.  The output data is suitable for use as input data for the ABSOLUTE module. More detail see http://software.broadinstitute.org/cancer/software/genepattern/modules/docs/HAPSEG/1"
#> 
#> $nongithub$item$hapseg$publication
#> [1] "Carter SL, Meyerson M, Getz G. Accurate estimation of homologue-specific DNA concentration-ratios in cancer samples allows long-range haplotyping. Available from Nature Precedings; 2011."
#> 
#> $nongithub$item$hapseg$title
#> [1] "A probabilistic method to interpret bi-allelic marker data in cancer samples."
#> 
#> 
#> $nongithub$item$igv
#> $nongithub$item$igv$description
#> [1] "The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations."
#> 
#> $nongithub$item$igv$publication
#> [1] "Integrative Genomics Viewer. Nature Biotechnology 29, 24–26 (2011)"                                                                              
#> [2] "Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics 14, 178-192 (2013)."
#> 
#> $nongithub$item$igv$title
#> [1] "The Integrative Genomics Viewer (IGV)"
#> 
#> 
#> $nongithub$item$interproscan
#> $nongithub$item$interproscan$description
#> [1] "InterProScan is the software package that allows sequences (protein and nucleic) to be scanned against InterPro's signatures. Signatures are predictive models, provided by several different databases, that make up the InterPro consortium."
#> 
#> $nongithub$item$interproscan$tag
#> [1] "Protein"        "Classification"
#> 
#> $nongithub$item$interproscan$title
#> [1] "Protein sequence analysis & classification"
#> 
#> 
#> $nongithub$item$marina
#> $nongithub$item$marina$description
#> [1] "MARINA (Master Regulator Inference Algorithm) MAster Regulator INference algorithm (MARINa), designed to infer transcription factors (TFs) controlling the transition between the two phenotypes, A and B, and the maintenance of the latter phenotype. Expression at the mRNA level is often a poor predictor of a TF's regulatory activity and an even worst predictor of its biological relevance in regulating phenotype-specific programs. To obviate this problem, MARINa infers TF activity from the global transcriptional activation of its regulon (i.e. its activated and repressed targets) and its biological relevance by TF-regulon overlap with phenotype-specific programs."
#> 
#> $nongithub$item$marina$publication
#> [1] "Lefebvre C, Rajbhandari P, Alvarez MJ, Bandaru P, Lim WK, Sato M, Wang K, Sumazin P, Kustagi M, Bisikirska BC, Basso K, Beltrao P, Krogan N, Gautier J, Dalla-Favera R, Califano A. A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers. Mol Syst Biol. 2010 Jun 8;6:377."
#> 
#> $nongithub$item$marina$title
#> [1] "Master Regulator Inference Algorithm"
#> 
#> 
#> $nongithub$item$meerkat
#> $nongithub$item$meerkat$description
#> [1] "Identification of somatic rearrangements in cancer genomes has accelerated through analysis of high-throughput sequencing data. However, characterization of complex structural alterations and their underlying mechanisms remains inadequate. Here, applying an algorithm to predict structural variations from short reads, we report a comprehensive catalog of somatic structural variations and the mechanisms generating them, using high-coverage whole-genome sequencing data from 140 patients across ten tumor types. We characterize the relative contributions of different types of rearrangements and their mutational mechanisms, find that ∼20% of the somatic deletions are complex deletions formed by replication errors, and describe the differences between the mutational mechanisms in somatic and germline alterations. Importantly, we provide detailed reconstructions of the events responsible for loss of CDKN2A/B and gain of EGFR in glioblastoma, revealing that these alterations can result from multiple mechanisms even in a single genome and that both DNA double-strand breaks and replication errors drive somatic rearrangements."
#> 
#> $nongithub$item$meerkat$publication
#> [1] "Yang L, Luquette L J, Gehlenborg N, et al. Diverse Mechanisms of Somatic Structural Variations in Human Cancer Genomes[J]. Cell, 2013, 153(4):919-29."
#> 
#> $nongithub$item$meerkat$title
#> [1] "http://dx.doi.org/10.1016/j.cell.2013.04.010"
#> 
#> 
#> $nongithub$item$mutsig
#> $nongithub$item$mutsig$description
#> [1] "MutSig (for \\\"Mutation Significance\\\") is a package of tools for analyzing mutation data.  It operates on a cohort of patients and identifies mutations, genes, and other genomic elements predicted to be driver candidates.\\n"
#> 
#> $nongithub$item$mutsig$publication
#> [1] "Lawrence, M. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214-218 (2013) http://dx.doi.org/10.1038/nature12213"
#> 
#> $nongithub$item$mutsig$title
#> [1] "Mutational heterogeneity in cancer and the search for new cancer-associated genes"
#> 
#> 
#> $nongithub$item$paradigm
#> $nongithub$item$paradigm$description
#> [1] "High-dimensional ‘-omics’ profiling provides a detailed molecular view of individual cancers; however, understanding the mechanisms by which tumors evade cellular defenses requires deep knowledge of the underlying cellular pathways within each cancer sample. We extended the PARADIGM algorithm (Vaske et al., 2010, Bioinformatics, 26, i237–i245), a pathway analysis method for combining multiple ‘-omics’ data types, to learn the strength and direction of 9139 gene and protein interactions curated from the literature. Using genomic and mRNA expression data from 1936 samples in The Cancer Genome Atlas (TCGA) cohort, we learned interactions that provided support for and relative strength of 7138 (78%) of the curated links. Gene set enrichment found that genes involved in the strongest interactions were significantly enriched for transcriptional regulation, apoptosis, cell cycle regulation and response to tumor cells. Within the TCGA breast cancer cohort, we assessed different interaction strengths between breast cancer subtypes, and found interactions associated with the MYC pathway and the ER alpha network to be among the most differential between basal and luminal A subtypes. PARADIGM with the Naive Bayesian assumption produced gene activity predictions that, when clustered, found groups of patients with better separation in survival than both the original version of PARADIGM and a version without the assumption. We found that this Naive Bayes assumption was valid for the vast majority of co-regulators, indicating that most co-regulators act independently on their shared target."
#> 
#> $nongithub$item$paradigm$publication
#> [1] "Sedgewick A J, Benz S C, Rabizadeh S, et al. Learning subgroup-specific regulatory interactions and regulator independence with PARADIGM[J]. Bioinformatics, 2013, 29(13): i62-i70. https://doi.org/10.1093/bioinformatics/btt229"
#> 
#> $nongithub$item$paradigm$title
#> [1] "PAthway Representation and Analysis by Direct Inference on Graphical Models"
#> 
#> 
#> $nongithub$item$prada
#> $nongithub$item$prada$description
#> [1] "Massively parallel sequencing of cDNA reverse transcribed from RNA (RNASeq) provides an accurate estimate of the quantity and composition of mRNAs. To characterize the transcriptome through the analysis of RNA-seq data, we developed PRADA. PRADA focuses on the processing and analysis of gene expression estimates, supervised and unsupervised gene fusion identification, and supervised intragenic deletion identification. PRADA currently supports 7 modules to process and identify abnormalities from RNAseq data:\\npreprocess: Generates aligned and recalibrated BAM files.\\nexpression: Generates gene expression (RPKM) and quality metrics.\\nfusion: Identifies candidate gene fusions.\\nguess-ft: Supervised search for fusion transcripts.\\nguess-if: Supervised search for intragenic fusions.\\nhomology: Calculates homology between given two genes.\\nframe: Predicts functional consequence of fusion transcript"
#> 
#> $nongithub$item$prada$publication
#> [1] "PRADA: pipeline for RNA sequencing data analysis[J]. Bioinformatics, 2014, 30(15): 2224-2226. https://doi.org/10.1093/bioinformatics/btu169"
#> 
#> $nongithub$item$prada$title
#> [1] "PRADA : Pipeline for RNA-Sequencing Data Analysis"
#> 
#> 
#> $nongithub$item$rmats
#> $nongithub$item$rmats$description
#> [1] "MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data. The statistical model of MATS calculates the P-value and false discovery rate that the difference in the isoform ratio of a gene between two conditions exceeds a given user-defined threshold. From the RNA-Seq data, MATS can automatically detect and analyze alternative splicing events corresponding to all major types of alternative splicing patterns. MATS handles replicate RNA-Seq data from both paired and unpaired study design."
#> 
#> $nongithub$item$rmats$publication
#> [1] "Shen S., Park JW., Lu ZX., Lin L., Henry MD., Wu YN., Zhou Q., Xing Y. rMATS: Robust and Flexible Detection of Differential Alternative Splicing from Replicate RNA-Seq Data. PNAS, 111(51):E5593-601. doi: 10.1073/pnas.1419161111"                       
#> [2] "Park JW., Tokheim C., Shen S., Xing Y. Identifying differential alternative splicing events from RNA sequencing data using RNASeq-MATS. Methods in Molecular Biology: Deep Sequencing Data Analysis, 2013;1038:171-179 doi: 10.1007/978-1-62703-514-9_10"  
#> [3] "Shen S., Park JW., Huang J., Dittmar KA., Lu ZX., Zhou Q., Carstens RP., Xing Y. MATS: A Bayesian Framework for Flexible Detection of Differential Alternative Splicing from RNA-Seq Data. Nucleic Acids Research, 2012;40(8):e61 doi: 10.1093/nar/gkr1291"
#> 
#> $nongithub$item$rmats$title
#> [1] "Multivariate Analysis of Transcript Splicing (MATS)"
#> 
#> 
#> $nongithub$item$subread
#> $nongithub$item$subread$description
#> [1] "The Subread software package is a tool kit for processing next-gen sequencing data. It includes Subread aligner, Subjunc exon-exon junction detector and featureCounts read summarization program. Subread aligner can be used to align both gDNA-seq and RNA-seq reads. Subjunc aligner was specified designed for the detection of exon-exon junction. For the mapping of RNA-seq reads, Subread performs local alignments and Subjunc performs global alignments."
#> 
#> $nongithub$item$subread$publication
#> [1] "Yang Liao, Gordon K Smyth and Wei Shi. \\\"The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote\\\", Nucleic Acids Research, 2013, 41(10):e108"
#> 
#> $nongithub$item$subread$tag
#> [1] "NGS"     "aligner"
#> 
#> $nongithub$item$subread$title
#> [1] "High-performance read alignment, quantification and mutation discovery"
#> 
#> 
#> $nongithub$item$vadir
#> $nongithub$item$vadir$description
#> [1] "Advances in next-generation DNA sequencing technologies are now enabling detailedcharacterization of sequence variations in cancer genomes. With whole genome sequencing, variations in\\ncoding and non-coding sequences can be discovered. But the cost associated with it is currently limiting its\\ngeneral use in research. Whole exome sequencing is used to characterize sequence variations in coding regions,\\nbut the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional\\nlimitations include uncertainty in assigning the functional signi\fcance of the mutations when these mutations\\nare observed in the non-coding region or in genes that are not expressed in cancer tissue.\\nWe investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing\\ndatasets with a method called VaDiR: Variant Detection in RNA\\\" that integrate three variant callers, namely:\\nSNPiR, RVBoost and MuTect2. The combination of all three methods, which we called Tier1 variants,\\nproduced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA\\nlevel. We also found that the integration of Tier1 variants with those called by MuTect2 and SNPiR produced\\nthe highest recall with acceptable precision. Finally, we observed higher rate of mutation discovery in genes\\nthat are expressed at higher levels."
#> 
#> $nongithub$item$vadir$publication
#> [1] "Neums L, Suenaga S, Beyerlein P, et al. VaDiR: an integrated approach to Variant Detection in RNA[J]. GigaScience, 2017. https://doi.org/10.1093/gigascience/gix122"
#> 
#> $nongithub$item$vadir$title
#> [1] "VaDiR: an integrated approach to Variant Detection in RNA"
#> 
#> 
#> $nongithub$item$vcfanno
#> $nongithub$item$vcfanno$description
#> [1] "vcfanno allows you to quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files. It uses a simple conf file to allow the user to specify the source annotation files and fields and how they will be added to the info of the query VCF."
#> 
#> $nongithub$item$vcfanno$publication
#> [1] "Pedersen B S, Layer R M, Quinlan A R. Vcfanno: fast, flexible annotation of genetic variants[J]. Genome Biology, 2016, 17(1):1-9."
#> 
#> $nongithub$item$vcfanno$tag
#> [1] "NGS"        "annotation"
#> 
#> $nongithub$item$vcfanno$title
#> [1] "annotate a VCF with other VCFs/BEDs/tabixed files"
#> 
#> 
#> 
#> 
#> $title
#> [1] "A library of useful WEB URL resource."
#> 
#> $web
#> $web$item
#> $web$item$cbioportal
#> $web$item$cbioportal$url
#> [1] "http://www.cbioportal.org/index.do"
#> 
#> 
#> $web$item$ensembl
#> $web$item$ensembl$ftp
#> [1] "ftp://ftp.ensembl.org/pub/"
#> 
#> $web$item$ensembl$url
#> [1] "http://www.ensembl.org/"
#> 
#> 
#> $web$item$kegg
#> $web$item$kegg$ftp
#> [1] "ftp://ftp.genome.jp/pub"
#> 
#> $web$item$kegg$url
#> [1] "http://www.kegg.jp/"
#> 
#> 
#> $web$item$ncbi
#> $web$item$ncbi$ftp
#> [1] "ftp://ftp.ncbi.nih.gov/pub"
#> 
#> $web$item$ncbi$url
#> [1] "https://www.ncbi.nlm.nih.gov/"
#> 
#> 
#> $web$item$rsnp3
#> $web$item$rsnp3$ftp
#> [1] "ftp://rv.psych.ac.cn/pub/rsnp3/"
#> 
#> $web$item$rsnp3$url
#> [1] "http://rsnp3.psych.ac.cn/index.do"
#> 
#> 
#> $web$item$tcga_gdc
#> $web$item$tcga_gdc$url
#> [1] "https://portal.gdc.cancer.gov/search/s?facetTab=cases&filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.program.name%22,%22value%22:%5B%22TCGA%22%5D%7D%7D%5D%7D"
#> 
#> 
#> $web$item$uniprot
#> $web$item$uniprot$ftp
#> [1] "ftp://ftp.uniprot.org/pub/databases/uniprot"
#> 
#> $web$item$uniprot$url
#> [1] "http://www.uniprot.org/"
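
The listing above is a plain nested R list, so it can be searched with base R. The following is a minimal sketch, not part of BioInstaller: `meta` is a small hand-written stand-in that only mirrors the `$nongithub$item$<name>$tag` nesting shown above (tags and titles are abbreviated), and `items_with_tag` is a hypothetical helper name.

```r
# Toy stand-in for the list returned by get.meta(); only the nesting
# ($nongithub$item$<name>$tag / $title) mirrors the output above, and the
# tags and titles are abbreviated for illustration.
meta <- list(
  nongithub = list(
    item = list(
      gmap = list(
        tag = c("Genomics", "NGS", "RNA-seq"),
        title = "GMAP: A Genomic Mapping and Alignment Program"
      ),
      gridss = list(
        tag = c("NGS", "SV"),
        title = "GRIDSS: genomic rearrangement detection"
      ),
      igv = list(
        title = "The Integrative Genomics Viewer (IGV)"
      )
    )
  )
)

# Hypothetical helper: return the names of all items in one config group
# that carry a given tag. Items without a tag field are skipped because
# `%in%` against NULL yields FALSE.
items_with_tag <- function(meta, group, tag) {
  items <- meta[[group]][["item"]]
  has_tag <- vapply(items, function(x) tag %in% x[["tag"]], logical(1))
  names(items)[has_tag]
}

ngs_items <- items_with_tag(meta, "nongithub", "NGS")
sv_items <- items_with_tag(meta, "nongithub", "SV")
```

With the real result of get.meta() in place of the toy list, the same helper should work unchanged, provided the items keep the nesting shown in the output.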

# Examples of get.meta
db_cfg_meta <- get.meta(value = "cfg_meta", config = 'db')
db_cfg_meta
#> $avaliable_cfg
#> [1] "db_annovar.toml" "db_blast.toml"   "db_main.toml"   
#> 
#> $cfg_dir
#> [1] "@>@system.file('extdata', 'config/db', package = 'BioInstaller')@<@"
#> 
#> $prefix_url
#> [1] "https://raw.githubusercontent.com/JhuangLab/BioInstaller/master/inst/extdata/config/db/"

db_cfg_meta_parsed <- get.meta(value = 'cfg_meta', config = 'db', read.config.params = list(rcmd.parse = TRUE))
db_cfg_meta_parsed
#> $avaliable_cfg
#> [1] "db_annovar.toml" "db_blast.toml"   "db_main.toml"   
#> 
#> $cfg_dir
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db"
#> 
#> $prefix_url
#> [1] "https://raw.githubusercontent.com/JhuangLab/BioInstaller/master/inst/extdata/config/db/"
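
The two outputs above differ only in cfg_dir: without rcmd.parse the value is returned as the raw placeholder `@>@...@<@`, and with rcmd.parse = TRUE the R expression inside the placeholder has been evaluated. The sketch below imitates that visible behaviour in base R. It is an assumption-laden illustration, not the implementation used by BioInstaller or its config reader: `parse_rcmd` is a hypothetical name, it handles only the first placeholder in a string, and `file.path()` stands in for the `system.file()` call so the example runs without the package installed.

```r
# Hypothetical helper: evaluate the R code embedded between @>@ and @<@
# in a single string and splice the result back into the string.
# Only the first placeholder is handled.
parse_rcmd <- function(x) {
  pattern <- "@>@(.*?)@<@"
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]
  if (length(m) == 0) {
    return(x)  # no placeholder: return the string unchanged
  }
  # m[1] is the full placeholder, m[2] the R code inside it
  value <- eval(parse(text = m[2]))
  sub(m[1], as.character(value), x, fixed = TRUE)
}

# file.path() is a stand-in for the system.file() call in the real config
raw <- "@>@file.path('extdata', 'config', 'db')@<@"
parsed <- parse_rcmd(raw)
plain <- parse_rcmd("db_main.toml")
```

Because the embedded code is run through `eval(parse())`, this kind of parsing should only be applied to configuration files from a trusted source.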

github_items <- get.meta(config = 'github', value = 'item')
github_items$bwa
#> $description
#> [1] "BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the rest two for longer sequences ranged from 70bp to a few megabases. BWA-MEM and BWA-SW share similar features such as the support of long reads and chimeric alignment, but BWA-MEM, which is the latest, is generally recommended as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads."
#> 
#> $publication
#> [1] "Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]"
#> [2] "Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]"   
#> [3] "Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN]"                            
#> 
#> $title
#> [1] "Burrow-Wheeler Aligner for pairwise alignment between DNA sequences"

# Get the path to the database meta file
db_meta_file <- get.meta(config = 'db_meta_file')
db_meta_file
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_meta.toml"
# The same path can be read from meta_files (assumed to be defined earlier in this vignette)
db_meta_file <- meta_files[["db_meta_file"]]
db_meta_file
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_meta.toml"

Download databases

You can use install.bioinfo to download the supported databases directly (since v0.3.0), or pass one of the files in inst/extdata/config/db (excluding db_meta.toml) as the value of the nongithub.cfg parameter.

# Get the names of all supported databases
library(stringr)
x <- install.bioinfo(show.all.names = TRUE)
x <- x[str_detect(x, "^db_")]
x
#>   [1] "db_atcircdb"                      "db_biosystems"                   
#>   [3] "db_cancer_hotspots"               "db_cgi"                          
#>   [5] "db_circbase"                      "db_circnet"                      
#>   [7] "db_circrnadb"                     "db_civic"                        
#>   [9] "db_cscd"                          "db_denovo_db"                    
#>  [11] "db_dgidb"                         "db_differentialnet"              
#>  [13] "db_diseaseenhancer"               "db_disgenet"                     
#>  [15] "db_docm"                          "db_drugbank"                     
#>  [17] "db_ecodrug"                       "db_eggnog"                       
#>  [19] "db_exorbase"                      "db_expression_atlas"             
#>  [21] "db_exsnp"                         "db_fantom_cage_peaks"            
#>  [23] "db_fantom_co_expression_clusters" "db_fantom_enhancers"             
#>  [25] "db_fantom_motifs"                 "db_fantom_ontology"              
#>  [27] "db_fantom_tss_classifier"         "db_funcoup"                      
#>  [29] "db_gtex"                          "db_hgnc"                         
#>  [31] "db_hpo"                           "db_inbiomap"                     
#>  [33] "db_interpro"                      "db_intogen"                      
#>  [35] "db_lncediting"                    "db_medreaders"                   
#>  [37] "db_mndr"                          "db_msdd"                         
#>  [39] "db_omim_open"                     "db_omim_private"                 
#>  [41] "db_oncotator"                     "db_pancanqtl"                    
#>  [43] "db_proteinatlas"                  "db_rbp_var"                      
#>  [45] "db_rddpred"                       "db_remap"                        
#>  [47] "db_remap2"                        "db_rsnp3"                        
#>  [49] "db_rvarbase"                      "db_seecancer"                    
#>  [51] "db_seeqtl"                        "db_snipa3"                       
#>  [53] "db_srnanalyzer"                   "db_superdrug2"                   
#>  [55] "db_tumorfusions"                  "db_varcards"                     
#>  [57] "db_annovar_1000g"                 "db_annovar_1000g_sqlite"         
#>  [59] "db_annovar_avsift"                "db_annovar_avsnp"                
#>  [61] "db_annovar_avsnp_sqlite"          "db_annovar_brvar"                
#>  [63] "db_annovar_cadd"                  "db_annovar_cadd_sqlite"          
#>  [65] "db_annovar_cancer_hotspots"       "db_annovar_cg"                   
#>  [67] "db_annovar_civic_gene_summaries"  "db_annovar_clinvar"              
#>  [69] "db_annovar_clinvar_sqlite"        "db_annovar_cosmic"               
#>  [71] "db_annovar_cosmic_sqlite"         "db_annovar_cscd"                 
#>  [73] "db_annovar_darned_sqlite"         "db_annovar_dbnsfp"               
#>  [75] "db_annovar_dbnsfp_sqlite"         "db_annovar_dbscsnv"              
#>  [77] "db_annovar_dbscsnv_sqlite"        "db_annovar_dhs_gene_connectivity"
#>  [79] "db_annovar_disgenet"              "db_annovar_docm"                 
#>  [81] "db_annovar_eigen"                 "db_annovar_eigen_sqlite"         
#>  [83] "db_annovar_ensgene"               "db_annovar_epi_genes"            
#>  [85] "db_annovar_esp6500siv2"           "db_annovar_exac03"               
#>  [87] "db_annovar_exac03_sqlite"         "db_annovar_fathmm"               
#>  [89] "db_annovar_gdi_score"             "db_annovar_gerp"                 
#>  [91] "db_annovar_gme"                   "db_annovar_gme_sqlite"           
#>  [93] "db_annovar_gnomad"                "db_annovar_gnomad_sqlite"        
#>  [95] "db_annovar_gtex_eqtl_egenes"      "db_annovar_gtex_eqtl_pairs"      
#>  [97] "db_annovar_gwava"                 "db_annovar_gwava_sqlite"         
#>  [99] "db_annovar_hgnc"                  "db_annovar_hrcr1"                
#> [101] "db_annovar_hrcr1_sqlite"          "db_annovar_icgc21"               
#> [103] "db_annovar_icgc_sqlite"           "db_annovar_intervar"             
#> [105] "db_annovar_intervar_sqlite"       "db_annovar_intogen"              
#> [107] "db_annovar_kaviar"                "db_annovar_knowngene"            
#> [109] "db_annovar_ljb26_all"             "db_annovar_lncediting_sqlite"    
#> [111] "db_annovar_loftool_scores"        "db_annovar_mcap"                 
#> [113] "db_annovar_mcap_sqlite"           "db_annovar_mitimpact"            
#> [115] "db_annovar_nci60"                 "db_annovar_nci60_sqlite"         
#> [117] "db_annovar_normal_pool"           "db_annovar_omim_genemap2"        
#> [119] "db_annovar_popfreq"               "db_annovar_popfreq_sqlite"       
#> [121] "db_annovar_radar_sqlite"          "db_annovar_rddpred_sqlite"       
#> [123] "db_annovar_rediportal_sqlite"     "db_annovar_refgene"              
#> [125] "db_annovar_regsnpintron"          "db_annovar_revel"                
#> [127] "db_annovar_revel_sqlite"          "db_annovar_rvis_esv_score"       
#> [129] "db_annovar_seeqtl"                "db_annovar_snp"                  
#> [131] "db_annovar_tall_somatic_genes"    "db_annovar_tmcsnpdb"             
#> [133] "db_annovar_varcards"              "db_annovar_varcards_sqlite"      
#> [135] "db_ucsc_cytoband"                 "db_ucsc_dnase_clustered"         
#> [137] "db_ucsc_ensgene"                  "db_ucsc_knowngene"               
#> [139] "db_ucsc_refgene"                  "db_ucsc_tfbs_clustered"          
#> [141] "db_blast_env_nr"                  "db_blast_est_human"              
#> [143] "db_blast_est_mouse"               "db_blast_est_others"             
#> [145] "db_blast_gss"                     "db_blast_htgs"                   
#> [147] "db_blast_human_genomic"           "db_blast_landmark"               
#> [149] "db_blast_mouse_genomic"           "db_blast_nr"                     
#> [151] "db_blast_nt"                      "db_blast_other_genomic"          
#> [153] "db_blast_pataa"                   "db_blast_patnt"                  
#> [155] "db_blast_pdbaa"                   "db_blast_pdbnt"                  
#> [157] "db_blast_ref_prok_rep_genomes"    "db_blast_ref_viroids_rep_genomes"
#> [159] "db_blast_ref_viruses_rep_genomes" "db_blast_refseq_genomic"         
#> [161] "db_blast_refseq_protein"          "db_blast_refseq_rna"             
#> [163] "db_blast_refseqgene"              "db_blast_sts"                    
#> [165] "db_blast_swissprot"               "db_blast_taxdb"                  
#> [167] "db_blast_tsa_nr"                  "db_blast_tsa_nt"                 
#> [169] "db_blast_vector"

# Metadata for all database configuration files
db_cfg_meta <- get.meta(config = 'db', value = 'cfg_meta', 
                        read.config.params=list(rcmd.parse = TRUE))
cfg_dir <- db_cfg_meta$cfg_dir
cfg_dir
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db"
avaliable_cfg <- db_cfg_meta$avaliable_cfg
avaliable_cfg
#> [1] "db_annovar.toml" "db_blast.toml"   "db_main.toml"
sprintf("%s/%s", cfg_dir, avaliable_cfg)
#> [1] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_annovar.toml"
#> [2] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_blast.toml"  
#> [3] "/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_main.toml"

# Databases configured in db_annovar.toml
download.dir <- sprintf('%s/db_annovar', tempdir())
config.toml <- system.file("extdata", "config/db/db_annovar.toml", 
  package = "BioInstaller")
# Not run in this vignette: install the hg19 build of db_ucsc_refgene
# install.bioinfo('db_ucsc_refgene', download.dir = download.dir, 
#   nongithub.cfg = config.toml, extra.list = list(buildver = "hg19"))

# Databases configured in db_main.toml
download.dir <- sprintf('%s/db_main', tempdir())
config.toml <- system.file("extdata", "config/db/db_main.toml", 
  package = "BioInstaller")
install.bioinfo('db_diseaseenhancer', download.dir = download.dir, 
  nongithub.cfg = config.toml)
#> INFO [2018-01-24 19:30:10] Debug:name:db_diseaseenhancer
#> INFO [2018-01-24 19:30:10] Debug:destdir:
#> INFO [2018-01-24 19:30:10] Debug:db:/tmp/Rtmpf6U2g1/filed608139ce4bc
#> INFO [2018-01-24 19:30:10] Debug:github.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/github/github.toml
#> INFO [2018-01-24 19:30:10] Debug:nongithub.cfg:/tmp/RtmpZICqdL/Rinstd5f859170595/BioInstaller/extdata/config/db/db_main.toml
#> INFO [2018-01-24 19:30:10] Fetching db_diseaseenhancer versions....
#> INFO [2018-01-24 19:30:10] Install versions:diseaseEnh5.1
#> INFO [2018-01-24 19:30:10] Now start to install db_diseaseenhancer in /tmp/Rtmpf6U2g1/db_main.
#> INFO [2018-01-24 19:30:10] Running before install steps.
#> INFO [2018-01-24 19:30:10] Now start to download db_diseaseenhancer in /tmp/Rtmpf6U2g1/db_main.
#> INFO [2018-01-24 19:30:11] Running install steps.
#> INFO [2018-01-24 19:30:11] Running after install successful steps.
#> INFO [2018-01-24 19:30:11] Running change.info for db_diseaseenhancer and be saved to /tmp/Rtmpf6U2g1/filed608139ce4bc
#> INFO [2018-01-24 19:30:11] Debug:Install by Github configuration file: 
#> INFO [2018-01-24 19:30:11] Debug:Install by Non Github configuration file: db_diseaseenhancer
#> INFO [2018-01-24 19:30:11] Installed successful list: db_diseaseenhancer
#> $fail.list
#> [1] ""
#> 
#> $success.list
#> [1] "db_diseaseenhancer"
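
The list printed above has two elements, fail.list and success.list, so a script can test the outcome instead of reading the log. The following is a minimal sketch in base R. The object `res` is a hypothetical stand-in built by hand to match the printed output; in a real session it would be the value returned by install.bioinfo(). Treating an empty string as "no entries" is an assumption based only on the output shown here.

```r
# Hypothetical stand-in with the same shape as the list printed above;
# in a real session this would be the value returned by install.bioinfo().
res <- list(fail.list = "", success.list = "db_diseaseenhancer")

# Drop empty strings (assumed here to mean "no entries").
failed    <- res$fail.list[nzchar(res$fail.list)]
succeeded <- res$success.list[nzchar(res$success.list)]

if (length(failed) > 0) {
  # Report the failures without stopping the script.
  warning("Failed to install: ", paste(failed, collapse = ", "))
} else {
  message("Installed: ", paste(succeeded, collapse = ", "))
}
```

A check like this may be useful when several items are installed in one script and later steps depend on the downloaded files.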