This tutorial is a recommended starting place for learning how to use drake. It is the abridged version of the basic example vignette. See this section of the README for a high-level overview of the available documentation.

Get the code.

Write the code files to your workspace.

drake_example("basic")

The new basic folder now includes a file structure of a serious drake project, plus an interactive-tutorial.R to narrate the example. The code is also online here.

The motivation of the basic example

Is there an association between the weight and the fuel efficiency of cars? To find out, we use the mtcars dataset from the datasets package. The mtcars dataset originally came from the 1974 Motor Trend US magazine, and it contains design and performance data on 32 models of automobile.

# ?mtcars # more info
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Here, wt is weight in thousands of pounds, and mpg is fuel efficiency in miles per gallon. We want to know whether wt and mpg are associated. The mtcars dataset has only 32 rows, so we generate two larger bootstrapped datasets and analyze them with regression models. We then summarize the regression models to see if there is an association.
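
The approach described above can be sketched in base R. The helper names simulate(), reg1(), and reg2() match functions that appear later in this tutorial, but the bodies below are assumptions written for illustration and may differ from the definitions that ship with the example. The seed is arbitrary.

```r
# Assumed helper bodies, for illustration only.

# Draw n rows from mtcars with replacement (a bootstrap sample).
simulate <- function(n) {
  rows <- sample.int(n = nrow(mtcars), size = n, replace = TRUE)
  data.frame(x = mtcars$wt[rows], y = mtcars$mpg[rows])
}

# Regress fuel efficiency on weight.
reg1 <- function(d) {
  lm(y ~ x, data = d)
}

# Regress fuel efficiency on squared weight.
reg2 <- function(d) {
  d$x2 <- d$x ^ 2
  lm(y ~ x2, data = d)
}

set.seed(2018)  # arbitrary seed, only so the sketch is reproducible
small <- simulate(48)
fit <- reg2(small)

# Matrix of estimates, standard errors, t values, and p-values.
summary(fit)$coefficients
```

A small p-value in the x2 row of that matrix would suggest an association between weight and fuel efficiency in the resampled data.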

A taste of the basic example

Your workspace begins with a bunch of imports: functions, pre-loaded data objects, and saved files available before the real work begins.

load_basic_example(verbose = FALSE) # Get the code with drake_example("basic").

# Drake looks for data objects and functions in your R session environment
ls()
##  [1] "AES"               "AESdecryptECB"     "AESencryptECB"    
##  [4] "AESinit"           "attr_sha1"         "avoid_this"       
##  [7] "b"                 "bad_plan"          "cache"            
## [10] "command"           "config"            "cranlogs_plan"    
## [13] "debug_plan"        "digest"            "digest_impl"      
## [16] "envir"             "error"             "example_class"    
## [19] "example_object"    "f"                 "g"                
## [22] "get_logs"          "good_plan"         "hard_plan"        
## [25] "hmac"              "little_b"          "logs"             
## [28] "makeRaw"           "makeRaw.character" "makeRaw.default"  
## [31] "makeRaw.digest"    "makeRaw.raw"       "modes"            
## [34] "my_plan"           "my_variable"       "myplan"           
## [37] "new_objects"       "num2hex"           "padWithZeros"     
## [40] "plan"              "print.AES"         "query"            
## [43] "random_rows"       "reg1"              "reg2"             
## [46] "rules_grid"        "sha1"              "sha1.Date"        
## [49] "sha1.NULL"         "sha1.POSIXct"      "sha1.POSIXlt"     
## [52] "sha1.anova"        "sha1.array"        "sha1.call"        
## [55] "sha1.character"    "sha1.complex"      "sha1.data.frame"  
## [58] "sha1.default"      "sha1.factor"       "sha1.function"    
## [61] "sha1.integer"      "sha1.list"         "sha1.logical"     
## [64] "sha1.matrix"       "sha1.name"         "sha1.numeric"     
## [67] "sha1.pairlist"     "sha1.raw"          "simulate"         
## [70] "timestamp"         "tmp"               "totally_okay"     
## [73] "url"               "x"

# and saved files in your file system.
list.files()
##  [1] "best-practices.R"     "best-practices.Rmd"   "best-practices.html" 
##  [4] "best-practices.md"    "caution.R"            "caution.Rmd"         
##  [7] "caution.html"         "caution.md"           "debug.R"             
## [10] "debug.Rmd"            "debug.html"           "debug.md"            
## [13] "drake.R"              "drake.Rmd"            "example-basic.Rmd"   
## [16] "example-gsp.Rmd"      "example-packages.Rmd" "faq.Rmd"             
## [19] "graph.Rmd"            "parallelism.Rmd"      "report.R"            
## [22] "report.Rmd"           "storage.Rmd"          "timing.Rmd"

Your real work is outlined in a data frame of data analysis steps called “targets”. The targets depend on the imports, and drake will figure out how they are all connected.

my_plan
## # A tibble: 15 x 2
##    target                 command                                         
##    <chr>                  <chr>                                           
##  1 ""                     "knit(knitr_in(\"report.Rmd\"), file_out(\"repo…
##  2 small                  simulate(48)                                    
##  3 large                  simulate(64)                                    
##  4 regression1_small      reg1(small)                                     
##  5 regression1_large      reg1(large)                                     
##  6 regression2_small      reg2(small)                                     
##  7 regression2_large      reg2(large)                                     
##  8 summ_regression1_small suppressWarnings(summary(regression1_small$resi…
##  9 summ_regression1_large suppressWarnings(summary(regression1_large$resi…
## 10 summ_regression2_small suppressWarnings(summary(regression2_small$resi…
## 11 summ_regression2_large suppressWarnings(summary(regression2_large$resi…
## 12 coef_regression1_small suppressWarnings(summary(regression1_small))$co…
## 13 coef_regression1_large suppressWarnings(summary(regression1_large))$co…
## 14 coef_regression2_small suppressWarnings(summary(regression2_small))$co…
## 15 coef_regression2_large suppressWarnings(summary(regression2_large))$co…

Wildcard templating generates these data frames at scale.

library(magrittr)
dataset_plan <- drake_plan(
  small = simulate(5),
  large = simulate(50)
)
dataset_plan
## # A tibble: 2 x 2
##   target command     
##   <chr>  <chr>       
## 1 small  simulate(5) 
## 2 large  simulate(50)

analysis_methods <- drake_plan(
  regression = regNUMBER(dataset__) # nolint
) %>%
  evaluate_plan(wildcard = "NUMBER", values = 1:2)
analysis_methods
## # A tibble: 2 x 2
##   target       command        
##   <chr>        <chr>          
## 1 regression_1 reg1(dataset__)
## 2 regression_2 reg2(dataset__)

analysis_plan <- plan_analyses(
  plan = analysis_methods,
  datasets = dataset_plan
)
analysis_plan
## # A tibble: 4 x 2
##   target             command    
##   <chr>              <chr>      
## 1 regression_1_small reg1(small)
## 2 regression_1_large reg1(large)
## 3 regression_2_small reg2(small)
## 4 regression_2_large reg2(large)

whole_plan <- rbind(dataset_plan, analysis_plan)
whole_plan
## # A tibble: 6 x 2
##   target             command     
##   <chr>              <chr>       
## 1 small              simulate(5) 
## 2 large              simulate(50)
## 3 regression_1_small reg1(small) 
## 4 regression_1_large reg1(large) 
## 5 regression_2_small reg2(small) 
## 6 regression_2_large reg2(large)
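
To make the expansion step less of a black box, here is a toy base R version of what happens when the dataset__ wildcard is filled in. The function expand_over_datasets() is hypothetical and is not part of drake; it only mimics the target and command columns shown above.

```r
# Toy re-implementation of the expansion step; not drake's own code.
expand_over_datasets <- function(method_plan, datasets, wildcard = "dataset__") {
  # One row per (method, dataset) pair; datasets vary fastest.
  grid <- expand.grid(
    dataset = datasets,
    method = seq_len(nrow(method_plan)),
    stringsAsFactors = FALSE
  )
  # Substitute the dataset name for the wildcard in each command.
  commands <- mapply(
    function(command, dataset) gsub(wildcard, dataset, command, fixed = TRUE),
    method_plan$command[grid$method],
    grid$dataset,
    USE.NAMES = FALSE
  )
  data.frame(
    target = paste(method_plan$target[grid$method], grid$dataset, sep = "_"),
    command = commands,
    stringsAsFactors = FALSE
  )
}

method_plan <- data.frame(
  target = c("regression_1", "regression_2"),
  command = c("reg1(dataset__)", "reg2(dataset__)"),
  stringsAsFactors = FALSE
)
expanded <- expand_over_datasets(method_plan, datasets = c("small", "large"))
expanded
```

The result has one row per combination of method and dataset, which is why two methods and two datasets yield four analysis targets.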

For the commands you pass in with the free-form ... argument, drake_plan() uses tidy evaluation. For example, it supports quasiquotation with the !! operator. Use tidy_evaluation = FALSE or the list argument to suppress this behavior.

my_variable <- 5

drake_plan(
  a = !!my_variable,
  b = !!my_variable + 1,
  list = c(d = "!!my_variable")
)
## # A tibble: 3 x 2
##   target command      
##   <chr>  <chr>        
## 1 a      5            
## 2 b      5 + 1        
## 3 d      !!my_variable

drake_plan(
  a = !!my_variable,
  b = !!my_variable + 1,
  list = c(d = "!!my_variable"),
  tidy_evaluation = FALSE
)
## # A tibble: 3 x 2
##   target command            
##   <chr>  <chr>              
## 1 a      !(!my_variable)    
## 2 b      !(!my_variable + 1)
## 3 d      !!my_variable

For instances of !! that remain in the workflow plan, make() will run these commands in tidy fashion, evaluating the !! operator using the environment you provided.
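
If quasiquotation is unfamiliar, base R's bquote() offers a rough analogue: .() plays a role similar to !!. The sketch below is only an analogy for the difference between substituting a value immediately and leaving a symbol to be looked up later. It does not show how drake or tidy evaluation work internally.

```r
my_variable <- 5

# .() inside bquote() is replaced by the current value, much like !!.
unquoted <- bquote(.(my_variable) + 1)
deparse(unquoted)  # "5 + 1"

# quote() keeps the symbol, so the value is looked up at evaluation time.
deferred <- quote(my_variable + 1)
deparse(deferred)  # "my_variable + 1"

# Only the deferred expression sees a later change to my_variable.
my_variable <- 10
eval(unquoted)  # 6
eval(deferred)  # 11
```

This mirrors the plans above: a command built with !! stores the value 5, while a command that keeps !!my_variable as text is resolved only when it runs.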

Using static code analysis, drake detects the dependencies of all your targets. The result is an interactive network diagram.

vis_drake_graph(my_plan)
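
As a rough illustration of static code analysis, the base R sketch below parses each command without running it and keeps the symbols that are also target names. The three-row plan is a shortened, hypothetical stand-in (fit_summary is an invented target), and drake's real analysis covers more than this, such as imported functions and files.

```r
# Toy dependency detection; drake's real analysis is more thorough.
toy_plan <- data.frame(
  target = c("small", "regression1_small", "fit_summary"),
  command = c(
    "simulate(48)",
    "reg1(small)",
    "summary(regression1_small)"
  ),
  stringsAsFactors = FALSE
)

# Every symbol a command mentions, including function names.
symbols_in <- function(command) {
  all.names(parse(text = command), unique = TRUE)
}

# Keep only the symbols that are themselves targets in the plan.
target_deps <- lapply(toy_plan$command, function(command) {
  intersect(symbols_in(command), toy_plan$target)
})
names(target_deps) <- toy_plan$target
target_deps
```

Each list element names the upstream targets of one target, which is the edge information a network diagram needs.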

At this point, all your targets are out of date because the project is new.

config <- drake_config(my_plan, verbose = FALSE) # Master configuration list
outdated(config)
##  [1] "\"report.md\""          "coef_regression1_large"
##  [3] "coef_regression1_small" "coef_regression2_large"
##  [5] "coef_regression2_small" "large"                 
##  [7] "regression1_large"      "regression1_small"     
##  [9] "regression2_large"      "regression2_small"     
## [11] "small"                  "summ_regression1_large"
## [13] "summ_regression1_small" "summ_regression2_large"
## [15] "summ_regression2_small"

The make() function traverses the network and builds the targets that require updates.

make(my_plan)
## target large
## target small
## target regression1_large
## target regression1_small
## target regression2_large
## target regression2_small
## target coef_regression1_large
## target coef_regression1_small
## target coef_regression2_large
## target coef_regression2_small
## target summ_regression1_large
## target summ_regression1_small
## target summ_regression2_large
## target summ_regression2_small
## target file "report.md"

For the reg2() model on the small dataset, the p-value on x2 is so small that there may be an association between weight and fuel efficiency after all.

readd(coef_regression2_small)
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 27.504915 1.02496426 26.835000 9.676340e-30
## x2          -0.708536 0.08285938 -8.551066 4.617125e-11

The project is currently up to date, so the next make() does nothing.

make(my_plan)
## Unloading targets from environment:
##   coef_regression2_small
##   large
##   small
## All targets are already up to date.

But a nontrivial change in reg2() triggers updates to all the affected downstream targets.

reg2 <- function(d){
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}

make(my_plan)
## target regression2_large
## target regression2_small
## target coef_regression2_large
## target coef_regression2_small
## target summ_regression2_large
## target summ_regression2_small
## target file "report.md"
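
One way to picture the difference between a trivial and a nontrivial change is to compare functions by their parsed code rather than by their source text. The fingerprint() helper below is hypothetical and is only a toy: it is an assumption about the general idea, not a description of how drake tracks changes.

```r
# Toy fingerprint: the deparsed body of a function.
# Comments and spacing are not part of the parsed code.
fingerprint <- function(f) {
  paste(deparse(body(f)), collapse = "\n")
}

reg2_original <- function(d) {
  d$x2 <- d$x ^ 2
  lm(y ~ x2, data = d)
}

# Same code with an extra comment and different spacing: a trivial change.
reg2_commented <- function(d) {
  # Fit a quadratic term.
  d$x2 <- d$x^2
  lm(y ~ x2,   data = d)
}

# A different model: a nontrivial change.
reg2_cubic <- function(d) {
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}

identical(fingerprint(reg2_original), fingerprint(reg2_commented))  # TRUE
identical(fingerprint(reg2_original), fingerprint(reg2_cubic))      # FALSE
```

Under this toy scheme, only the cubic version would count as changed, so only targets downstream of reg2() would need rebuilding.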

Built-in example projects

Drake has built-in example projects. You can generate the code files for an example with drake_example(), and you can list the available examples with drake_examples(). For instance, drake_example("gsp") generates the R script and R Markdown report for the built-in econometrics data analysis project. See below for the currently supported examples.

Learn how to use drake.

High-performance computing

Regarding the high-performance computing examples, there is no one-size-fits-all *.tmpl configuration file for any job scheduler, so we cannot guarantee that these examples will work for you out of the box. To configure the files to suit your needs, make sure you understand how to use both your job scheduler and batchtools.