Quickstart

quickstart example for drake

William Michael Landau

2017-11-05

1 Quick examples

Inspect and run your project.

library(drake)
load_basic_example() # Also (over)writes report.Rmd.
plot_graph(my_plan) # Hover, click, drag, zoom, pan. See args 'from' and 'to'.
make(my_plan) # Run the workflow.
outdated(my_plan) # Check that everything is already up to date.

Debug errors.

failed()                 # Targets that failed in the most recent `make()`
diagnose()               # Targets that failed in any previous `make()`
error <- diagnose(large) # Most recent verbose error log of `large`
str(error)               # Object of class "error"
error$calls              # Call stack / traceback

Dive deeper into the built-in examples.

example_drake("basic") # Write the code files.
examples_drake() # List the other examples.
vignette("quickstart") # This vignette

2 Setting up the basic example

Let’s establish the building blocks of a data analysis workflow.

library(knitr)
library(drake)

First, we will generate a few datasets.

simulate <- function(n){
  data.frame(
    x = stats::rnorm(n),
    y = rpois(n, 1)
  )
}

Then, we will analyze each dataset with multiple analysis methods.

reg1 <- function(d){
  lm(y ~ x, data = d)
}

reg2 <- function(d){
  d$x2 <- d$x ^ 2
  lm(y ~ x2, data = d)
}
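
As a quick sanity check outside drake, the functions above can be called directly. This is only a sketch: the seed and the sample size of 20 are arbitrary choices, not part of the original example.

```r
# Definitions repeated from above so that this sketch runs on its own.
simulate <- function(n){
  data.frame(
    x = stats::rnorm(n),
    y = rpois(n, 1)
  )
}
reg1 <- function(d){
  lm(y ~ x, data = d)
}
reg2 <- function(d){
  d$x2 <- d$x ^ 2
  lm(y ~ x2, data = d)
}

set.seed(1)        # Arbitrary seed, only for repeatability.
d <- simulate(20)  # Arbitrary sample size.
fit1 <- reg1(d)    # Linear term.
fit2 <- reg2(d)    # Quadratic term.
names(coefficients(fit1)) # "(Intercept)" "x"
names(coefficients(fit2)) # "(Intercept)" "x2"
```

Both fits are ordinary lm objects, so anything you would normally do with a linear model works on them.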

At the end, we will need the source file report.Rmd.

lines <- c(
  "---",
  "title: Example Report",
  "author: You",
  "output: html_document",
  "---",
  "",
  "Look how I read outputs from the drake cache.",
  "Drake notices that `small`, `coef_regression2_small`,",
  "and `large` are dependencies of the",
  "future compiled output report file target, `report.md`.",
  "Just be sure that the workflow plan command for the target `'report.md'`",
  "has an explicit call to `knit()`, something like `knit('report.Rmd')` or",
  "`knitr::knit(input = 'report.Rmd', quiet = TRUE)`.",
  "",
  "```{r example_chunk}",
  "library(drake)",
  "readd(small)",
  "readd(coef_regression2_small)",
  "loadd(large)",
  "head(large)",
  "```")
writeLines(lines, "report.Rmd")

3 Workflow plan

The workflow plan lists the intermediate steps of your project.

load_basic_example()
my_plan
##                    target                                      command
## 1             'report.md'             knit('report.Rmd', quiet = TRUE)
## 2                   small                                  simulate(5)
## 3                   large                                 simulate(50)
## 4       regression1_small                                  reg1(small)
## 5       regression1_large                                  reg1(large)
## 6       regression2_small                                  reg2(small)
## 7       regression2_large                                  reg2(large)
## 8  summ_regression1_small suppressWarnings(summary(regression1_small))
## 9  summ_regression1_large suppressWarnings(summary(regression1_large))
## 10 summ_regression2_small suppressWarnings(summary(regression2_small))
## 11 summ_regression2_large suppressWarnings(summary(regression2_large))
## 12 coef_regression1_small              coefficients(regression1_small)
## 13 coef_regression1_large              coefficients(regression1_large)
## 14 coef_regression2_small              coefficients(regression2_small)
## 15 coef_regression2_large              coefficients(regression2_large)

Each row is an intermediate step, and each command generates a target. A target is an output R object (cached when generated) or an output file (specified with single quotes), and a command is just an ordinary piece of R code (not necessarily a single function call). As input, commands may take objects imported from your workspace, targets generated by other commands, or initial input files. These dependencies give your project an underlying network.

# Hover, click, drag, zoom, and pan.
plot_graph(my_plan, width = "100%", height = "500px")

See also dataframes_graph(), render_graph(), and config() for faster and more customized regraphing.

You can also check the dependencies of individual targets.

deps(reg2)
## [1] "lm"
deps(my_plan$command[1]) # Files like report.Rmd are single-quoted.
## [1] "'report.Rmd'"           "coef_regression2_small"
## [3] "knit"                   "large"                 
## [5] "small"
deps(my_plan$command[nrow(my_plan)])
## [1] "coefficients"      "regression2_large"
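
To get a feel for how dependencies can be read off a command, here is a rough sketch in base R. It is not drake's implementation: the helper name command_deps is hypothetical, and it simply lists every symbol in the parsed command, so it would also report operators such as + and knows nothing about files or knitr reports.

```r
# Hypothetical helper: list the names mentioned in a command string.
command_deps <- function(command) {
  expr <- parse(text = command, keep.source = FALSE)[[1]]
  sort(unique(all.names(expr)))
}

command_deps("reg1(small)")
command_deps("coefficients(regression2_large)")
command_deps("suppressWarnings(summary(regression1_small))")
```

For the last row of the plan, this toy version finds the same two names that deps() reports above.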

List all the reproducibly-tracked objects and files, including imports and targets.

tracked(my_plan, targets = "small")
## Unloading targets from environment:
##   small
##   large
##   coef_regression2_small
## connect 15 imports: f, reg1, reg2, my_plan, myplan, small_plan, simulate, dat...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
## [1] "small"        "simulate"     "data.frame"   "rpois"       
## [5] "stats::rnorm"
tracked(my_plan)
## connect 13 imports: f, reg1, reg2, my_plan, myplan, small_plan, simulate, dat...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
##  [1] "'report.md'"            "small"                 
##  [3] "large"                  "regression1_small"     
##  [5] "regression1_large"      "regression2_small"     
##  [7] "regression2_large"      "summ_regression1_small"
##  [9] "summ_regression1_large" "summ_regression2_small"
## [11] "summ_regression2_large" "coef_regression1_small"
## [13] "coef_regression1_large" "coef_regression2_small"
## [15] "coef_regression2_large" "reg1"                  
## [17] "reg2"                   "simulate"              
## [19] "'report.Rmd'"           "knit"                  
## [21] "summary"                "suppressWarnings"      
## [23] "coefficients"           "lm"                    
## [25] "data.frame"             "rpois"                 
## [27] "stats::rnorm"

Check for cycles, missing input files, and other pitfalls.

check(my_plan)
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
## connect 13 imports: f, reg1, reg2, my_plan, myplan, small_plan, simulate, dat...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...

4 Generate the workflow plan

The data frame my_plan would be a pain to write by hand, so drake has functions to help you.

my_datasets <- workplan(
  small = simulate(5),
  large = simulate(50))
my_datasets
##   target      command
## 1  small  simulate(5)
## 2  large simulate(50)
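
The printed output suggests that a workflow plan is an ordinary data frame with character columns target and command. Under that assumption, the same two rows could be typed by hand, which shows what workplan() saves you from. The name by_hand is hypothetical.

```r
# A hand-written equivalent of my_datasets, assuming a plan is a plain
# data frame with character columns `target` and `command`.
by_hand <- data.frame(
  target = c("small", "large"),
  command = c("simulate(5)", "simulate(50)"),
  stringsAsFactors = FALSE
)
by_hand
```

Writing commands as strings gets tedious and error-prone quickly, which is why the helper functions are the better choice for anything larger.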

For multiple replicates:

expand(my_datasets, values = c("rep1", "rep2"))
##       target      command
## 1 small_rep1  simulate(5)
## 2 small_rep2  simulate(5)
## 3 large_rep1 simulate(50)
## 4 large_rep2 simulate(50)

Each dataset is analyzed multiple ways.

methods <- workplan(
  regression1 = reg1(..dataset..), # nolint
  regression2 = reg2(..dataset..)) # nolint
methods
##        target           command
## 1 regression1 reg1(..dataset..)
## 2 regression2 reg2(..dataset..)

We evaluate the ..dataset.. wildcard.

my_analyses <- analyses(methods, data = my_datasets)
my_analyses
##              target     command
## 1 regression1_small reg1(small)
## 2 regression1_large reg1(large)
## 3 regression2_small reg2(small)
## 4 regression2_large reg2(large)
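
Conceptually, evaluating the wildcard amounts to a string substitution over every pairing of method and dataset. The following base R sketch reproduces the table above under that assumption. It is not how analyses() is implemented, and methods_df, dataset_names, and sketch are hypothetical names.

```r
# Hypothetical stand-ins for the `methods` and `my_datasets` plans.
methods_df <- data.frame(
  target = c("regression1", "regression2"),
  command = c("reg1(..dataset..)", "reg2(..dataset..)"),
  stringsAsFactors = FALSE
)
dataset_names <- c("small", "large")

# One row per (method, dataset) pair; the dataset varies fastest.
grid <- expand.grid(
  dataset = dataset_names,
  method = seq_len(nrow(methods_df)),
  stringsAsFactors = FALSE
)

# Substitute the wildcard literally (fixed = TRUE avoids regex dots).
sketch <- data.frame(
  target = paste(methods_df$target[grid$method], grid$dataset, sep = "_"),
  command = mapply(
    function(cmd, ds) gsub("..dataset..", ds, cmd, fixed = TRUE),
    methods_df$command[grid$method],
    grid$dataset,
    USE.NAMES = FALSE
  ),
  stringsAsFactors = FALSE
)
sketch
```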

Next, we summarize each analysis of each dataset using summary statistics and regression coefficients.

summary_types <- workplan(
  summ = suppressWarnings(summary(..analysis..)), # nolint
  coef = coefficients(..analysis..)) # nolint
summary_types
##   target                                 command
## 1   summ suppressWarnings(summary(..analysis..))
## 2   coef              coefficients(..analysis..)
results <- summaries(summary_types, analyses = my_analyses,
  datasets = my_datasets, gather = NULL)
results
##                   target                                      command
## 1 summ_regression1_small suppressWarnings(summary(regression1_small))
## 2 summ_regression1_large suppressWarnings(summary(regression1_large))
## 3 summ_regression2_small suppressWarnings(summary(regression2_small))
## 4 summ_regression2_large suppressWarnings(summary(regression2_large))
## 5 coef_regression1_small              coefficients(regression1_small)
## 6 coef_regression1_large              coefficients(regression1_large)
## 7 coef_regression2_small              coefficients(regression2_small)
## 8 coef_regression2_large              coefficients(regression2_large)

The gather feature groups summaries into a smaller number of more manageable targets. I shut it off here (gather = NULL) to make the data frames more readable.

For the dynamic report, we have to make sure the files are single-quoted. Single quotes denote file targets and file imports, and double quotes denote literal strings that should not be treated as dependencies where they are mentioned. Also, knit() needs to be somewhere visible in the workflow plan command so that drake knows to dig into the active code chunks of 'report.Rmd' and look for dependencies mentioned in calls to loadd() and readd().

report <- workplan(
  report.md = knit('report.Rmd', quiet = TRUE), # nolint
  file_targets = TRUE, strings_in_dots = "filenames")
report
##        target                          command
## 1 'report.md' knit('report.Rmd', quiet = TRUE)
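
The quoting convention can be illustrated with a small regular expression that pulls single-quoted strings out of a command and ignores double-quoted ones. This is only an illustration of the convention, not drake's parser, and the helper name single_quoted is hypothetical.

```r
# Hypothetical helper: extract single-quoted strings from a command.
single_quoted <- function(command) {
  unlist(regmatches(command, gregexpr("'[^']*'", command)))
}

single_quoted("knit('report.Rmd', quiet = TRUE)") # "'report.Rmd'"
single_quoted('paste("report.Rmd", small)')       # character(0)
```

In the first command the file is picked up as a dependency candidate. In the second, the double-quoted string is treated as a plain literal.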

Finally, gather your workflow together with rbind(). Row order does not matter.

my_plan <- rbind(report, my_datasets, my_analyses, results)
my_plan
##                    target                                      command
## 1             'report.md'             knit('report.Rmd', quiet = TRUE)
## 2                   small                                  simulate(5)
## 3                   large                                 simulate(50)
## 4       regression1_small                                  reg1(small)
## 5       regression1_large                                  reg1(large)
## 6       regression2_small                                  reg2(small)
## 7       regression2_large                                  reg2(large)
## 8  summ_regression1_small suppressWarnings(summary(regression1_small))
## 9  summ_regression1_large suppressWarnings(summary(regression1_large))
## 10 summ_regression2_small suppressWarnings(summary(regression2_small))
## 11 summ_regression2_large suppressWarnings(summary(regression2_large))
## 12 coef_regression1_small              coefficients(regression1_small)
## 13 coef_regression1_large              coefficients(regression1_large)
## 14 coef_regression2_small              coefficients(regression2_small)
## 15 coef_regression2_large              coefficients(regression2_large)

5 Flexible helpers to make workflow plans

If your workflow does not fit the rigid datasets/analyses/summaries framework, check out functions expand(), evaluate(), and gather().

df <- workplan(data = simulate(center = MU, scale = SIGMA))
df
##   target                              command
## 1   data simulate(center = MU, scale = SIGMA)
df <- expand(df, values = c("rep1", "rep2"))
df
##      target                              command
## 1 data_rep1 simulate(center = MU, scale = SIGMA)
## 2 data_rep2 simulate(center = MU, scale = SIGMA)
evaluate(df, wildcard = "MU", values = 1:2)
##        target                             command
## 1 data_rep1_1 simulate(center = 1, scale = SIGMA)
## 2 data_rep1_2 simulate(center = 2, scale = SIGMA)
## 3 data_rep2_1 simulate(center = 1, scale = SIGMA)
## 4 data_rep2_2 simulate(center = 2, scale = SIGMA)
evaluate(df, wildcard = "MU", values = 1:2, expand = FALSE)
##      target                             command
## 1 data_rep1 simulate(center = 1, scale = SIGMA)
## 2 data_rep2 simulate(center = 2, scale = SIGMA)
evaluate(df, rules = list(MU = 1:2, SIGMA = c(0.1, 1)), expand = FALSE)
##      target                           command
## 1 data_rep1 simulate(center = 1, scale = 0.1)
## 2 data_rep2   simulate(center = 2, scale = 1)
evaluate(df, rules = list(MU = 1:2, SIGMA = c(0.1, 1, 10)))
##             target                           command
## 1  data_rep1_1_0.1 simulate(center = 1, scale = 0.1)
## 2    data_rep1_1_1   simulate(center = 1, scale = 1)
## 3   data_rep1_1_10  simulate(center = 1, scale = 10)
## 4  data_rep1_2_0.1 simulate(center = 2, scale = 0.1)
## 5    data_rep1_2_1   simulate(center = 2, scale = 1)
## 6   data_rep1_2_10  simulate(center = 2, scale = 10)
## 7  data_rep2_1_0.1 simulate(center = 1, scale = 0.1)
## 8    data_rep2_1_1   simulate(center = 1, scale = 1)
## 9   data_rep2_1_10  simulate(center = 1, scale = 10)
## 10 data_rep2_2_0.1 simulate(center = 2, scale = 0.1)
## 11   data_rep2_2_1   simulate(center = 2, scale = 1)
## 12  data_rep2_2_10  simulate(center = 2, scale = 10)
gather(df)
##   target                                            command
## 1 target list(data_rep1 = data_rep1, data_rep2 = data_rep2)
gather(df, target = "my_summaries", gather = "rbind")
##         target                                             command
## 1 my_summaries rbind(data_rep1 = data_rep1, data_rep2 = data_rep2)
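
The commands that gather() produces above follow a simple pattern: each target is passed as a named argument to the gathering function. A minimal base R sketch of that pattern, with the hypothetical helper name gather_command, could look like this. It only builds the command string and does not create a plan.

```r
# Hypothetical helper: build a gathering command from target names.
gather_command <- function(targets, gather = "list") {
  args <- paste(targets, targets, sep = " = ", collapse = ", ")
  paste0(gather, "(", args, ")")
}

targets <- c("data_rep1", "data_rep2")
gather_command(targets)
# "list(data_rep1 = data_rep1, data_rep2 = data_rep2)"
gather_command(targets, gather = "rbind")
# "rbind(data_rep1 = data_rep1, data_rep2 = data_rep2)"
```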

6 Run the workflow

You may want to check for outdated or missing targets/imports first.

outdated(my_plan, verbose = FALSE) # Targets that need to be (re)built.
##  [1] "'report.md'"            "coef_regression1_large"
##  [3] "coef_regression1_small" "coef_regression2_large"
##  [5] "coef_regression2_small" "large"                 
##  [7] "regression1_large"      "regression1_small"     
##  [9] "regression2_large"      "regression2_small"     
## [11] "small"                  "summ_regression1_large"
## [13] "summ_regression1_small" "summ_regression2_large"
## [15] "summ_regression2_small"
missed(my_plan, verbose = FALSE) # Checks your workspace.
## character(0)

Then just make(my_plan).

make(my_plan)
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
## connect 20 imports: f, reg1, df, reg2, my_plan, myplan, results, small_plan, ...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
## check 9 items: 'report.Rmd', knit, summary, suppressWarnings, coefficients, l...
## import 'report.Rmd'
## import knit
## import summary
## import suppressWarnings
## import coefficients
## import lm
## import data.frame
## import rpois
## import stats::rnorm
## check 3 items: reg1, reg2, simulate
## import reg1
## import reg2
## import simulate
## check 2 items: small, large
## target small
## target large
## check 4 items: regression1_small, regression1_large, regression2_small, regre...
## target regression1_small
## target regression1_large
## target regression2_small
## target regression2_large
## check 8 items: summ_regression1_small, summ_regression1_large, summ_regressio...
## target summ_regression1_small
## target summ_regression1_large
## target summ_regression2_small
## target summ_regression2_large
## target coef_regression1_small
## target coef_regression1_large
## target coef_regression2_small
## target coef_regression2_large
## check 1 item: 'report.md'
## unload 11 items: regression1_small, regression1_large, regression2_small, reg...
## target 'report.md'

The non-file dependencies of your last target are already loaded in your workspace.

ls()
##  [1] "coef_regression2_small" "command"               
##  [3] "datasets"               "df"                    
##  [5] "envir"                  "f"                     
##  [7] "files"                  "large"                 
##  [9] "lines"                  "methods"               
## [11] "my_analyses"            "my_datasets"           
## [13] "my_plan"                "myplan"                
## [15] "reg1"                   "reg2"                  
## [17] "report"                 "results"               
## [19] "rules"                  "simulate"              
## [21] "small"                  "small_plan"            
## [23] "summary_types"
outdated(my_plan, verbose = FALSE) # Everything is up to date.
## character(0)
build_times(digits = 4) # How long did it take to make each target?
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
##                      item   type elapsed  user system
## 1            'report.Rmd' import      0s    0s     0s
## 2             'report.md' target   0.09s 0.03s  0.06s
## 3  coef_regression1_large target   0.02s    0s  0.01s
## 4  coef_regression1_small target      0s    0s     0s
## 5  coef_regression2_large target      0s    0s     0s
## 6  coef_regression2_small target   0.01s    0s  0.02s
## 7            coefficients import      0s    0s     0s
## 8              data.frame import   0.02s    0s  0.01s
## 9                    knit import      0s    0s     0s
## 10                  large target   0.02s    0s  0.01s
## 11                     lm import   0.01s    0s  0.01s
## 12                   reg1 import      0s    0s     0s
## 13                   reg2 import      0s    0s     0s
## 14      regression1_large target   0.02s 0.02s     0s
## 15      regression1_small target   0.01s    0s  0.01s
## 16      regression2_large target   0.02s 0.01s     0s
## 17      regression2_small target      0s    0s     0s
## 18                  rpois import      0s    0s     0s
## 19               simulate import   0.02s    0s  0.02s
## 20                  small target   0.04s 0.03s     0s
## 21           stats::rnorm import   0.02s    0s  0.02s
## 22 summ_regression1_large target      0s    0s     0s
## 23 summ_regression1_small target   0.02s 0.02s     0s
## 24 summ_regression2_large target   0.02s 0.01s     0s
## 25 summ_regression2_small target   0.01s    0s  0.02s
## 26                summary import   0.02s    0s  0.02s
## 27       suppressWarnings import      0s    0s     0s

See also predict_runtime() and rate_limiting_times().

In the new graph, the red nodes from before are now green.

# Hover, click, drag, zoom, and pan.
plot_graph(my_plan, width = "100%", height = "500px")

Optionally, get visNetwork nodes and edges so you can make your own plot with visNetwork or render_graph().

dataframes_graph(my_plan)

Use readd() and loadd() to load more targets. (They are cached in the hidden .drake/ folder using storr.) Other functions let you interact with and view the cache.
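
The difference between the two is that readd() returns a value, while loadd() assigns the target into your workspace. The toy sketch below mimics that behavior with an in-memory environment instead of a storr cache. The names toy_cache, toy_readd, and toy_loadd are hypothetical, and unlike the real functions they take target names as strings.

```r
# A toy in-memory "cache" holding one target.
toy_cache <- new.env()
assign("small", data.frame(x = 1:3), envir = toy_cache)

# Return the cached value without touching the workspace.
toy_readd <- function(name, cache = toy_cache) {
  get(name, envir = cache)
}

# Assign the cached value into the caller's environment.
toy_loadd <- function(name, cache = toy_cache, envir = parent.frame()) {
  assign(name, get(name, envir = cache), envir = envir)
  invisible(NULL)
}

value <- toy_readd("small") # The value is returned for you to name.
toy_loadd("small")          # An object called `small` now exists.
identical(value, small)     # TRUE
```

The real functions take unquoted target names, as in the calls that follow.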

readd(coef_regression2_large)
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
## (Intercept)          x2 
##   1.1378831  -0.0494711
loadd(small)
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
head(small)
##            x y
## 1  1.2557721 1
## 2  1.4661077 1
## 3  1.4598818 3
## 4  0.9595286 0
## 5 -0.1949191 0
rm(small)
cached(small, large)
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
## small large 
##  TRUE  TRUE
cached()
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
##  [1] "'report.Rmd'"           "'report.md'"           
##  [3] "coef_regression1_large" "coef_regression1_small"
##  [5] "coef_regression2_large" "coef_regression2_small"
##  [7] "coefficients"           "data.frame"            
##  [9] "knit"                   "large"                 
## [11] "lm"                     "reg1"                  
## [13] "reg2"                   "regression1_large"     
## [15] "regression1_small"      "regression2_large"     
## [17] "regression2_small"      "rpois"                 
## [19] "simulate"               "small"                 
## [21] "stats::rnorm"           "summ_regression1_large"
## [23] "summ_regression1_small" "summ_regression2_large"
## [25] "summ_regression2_small" "summary"               
## [27] "suppressWarnings"
built()
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
##  [1] "'report.md'"            "coef_regression1_large"
##  [3] "coef_regression1_small" "coef_regression2_large"
##  [5] "coef_regression2_small" "large"                 
##  [7] "regression1_large"      "regression1_small"     
##  [9] "regression2_large"      "regression2_small"     
## [11] "small"                  "summ_regression1_large"
## [13] "summ_regression1_small" "summ_regression2_large"
## [15] "summ_regression2_small"
imported()
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
##  [1] "'report.Rmd'"     "coefficients"     "data.frame"      
##  [4] "knit"             "lm"               "reg1"            
##  [7] "reg2"             "rpois"            "simulate"        
## [10] "stats::rnorm"     "summary"          "suppressWarnings"
head(read_plan())
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
##              target                          command
## 1       'report.md' knit('report.Rmd', quiet = TRUE)
## 2             small                      simulate(5)
## 3             large                     simulate(50)
## 4 regression1_small                      reg1(small)
## 5 regression1_large                      reg1(large)
## 6 regression2_small                      reg2(small)
head(progress()) # See also in_progress()
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
##           'report.Rmd'            'report.md' coef_regression1_large 
##             "finished"             "finished"             "finished" 
## coef_regression1_small coef_regression2_large coef_regression2_small 
##             "finished"             "finished"             "finished"
progress(large)
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
##      large 
## "finished"
session() # of the last call to make()
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=C                          
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] future_1.6.2 knitr_1.17   magrittr_1.5 drake_4.4.0 
## 
## loaded via a namespace (and not attached):
##  [1] igraph_1.1.2      Rcpp_0.12.12      R6_2.2.2         
##  [4] storr_1.1.2       plyr_1.8.4        stringr_1.2.0    
##  [7] visNetwork_2.0.1  globals_0.10.3    tools_3.4.1      
## [10] parallel_3.4.1    R.oo_1.21.0       eply_0.1.0       
## [13] withr_2.0.0       htmltools_0.3.6   yaml_2.1.14      
## [16] rprojroot_1.2     digest_0.6.12     crayon_1.3.4     
## [19] htmlwidgets_0.9   R.utils_2.5.0     codetools_0.2-15 
## [22] testthat_1.0.2    evaluate_0.10.1   rmarkdown_1.6    
## [25] stringi_1.1.5     compiler_3.4.1    backports_1.1.0  
## [28] R.methodsS3_1.7.1 jsonlite_1.5      lubridate_1.6.0  
## [31] listenv_0.6.0     pkgconfig_2.0.1

The next time you run make(my_plan), nothing will be built because drake knows everything is up to date.

make(my_plan)
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
## connect 20 imports: summary_types, files, command, simulate, small_plan, reg1...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
## check 9 items: 'report.Rmd', knit, summary, suppressWarnings, coefficients, d...
## import 'report.Rmd'
## import knit
## import summary
## import suppressWarnings
## import coefficients
## import data.frame
## import rpois
## import stats::rnorm
## import lm
## check 3 items: simulate, reg1, reg2
## import simulate
## import reg1
## import reg2
## check 2 items: small, large
## check 4 items: regression1_small, regression1_large, regression2_small, regre...
## check 8 items: summ_regression1_small, summ_regression1_large, summ_regressio...
## check 1 item: 'report.md'
## All targets are already up to date.

But if you change one of your functions, commands, or other dependencies, drake will update the affected parts of the workflow. Let’s say we want to change the quadratic term to a cubic term in our reg2() function.

reg2 <- function(d) {
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}

The targets depending on reg2() need to be rebuilt and everything else is left alone.

outdated(my_plan, verbose = FALSE)
## [1] "'report.md'"            "coef_regression2_large"
## [3] "coef_regression2_small" "regression2_large"     
## [5] "regression2_small"      "summ_regression2_large"
## [7] "summ_regression2_small"
# Hover, click, drag, zoom, and pan.
plot_graph(my_plan, width = "100%", height = "500px")
make(my_plan)
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
## connect 20 imports: summary_types, files, command, simulate, small_plan, reg1...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
## check 9 items: 'report.Rmd', knit, summary, suppressWarnings, coefficients, d...
## import 'report.Rmd'
## import knit
## import summary
## import suppressWarnings
## import coefficients
## import data.frame
## import rpois
## import stats::rnorm
## import lm
## check 3 items: simulate, reg1, reg2
## import simulate
## import reg1
## import reg2
## check 2 items: small, large
## check 4 items: regression1_small, regression1_large, regression2_small, regre...
## load 2 items: large, small
## target regression2_small
## target regression2_large
## check 8 items: summ_regression1_small, summ_regression1_large, summ_regressio...
## target summ_regression2_small
## target summ_regression2_large
## target coef_regression2_small
## target coef_regression2_large
## check 1 item: 'report.md'
## unload 5 items: regression2_small, regression2_large, summ_regression2_small,...
## target 'report.md'

But trivial changes to whitespace and comments are totally ignored in your functions and in my_plan$command.

reg2 <- function(d) {
  d$x3 <- d$x ^ 3
    lm(y ~ x3, data = d) # I indented here.
}
outdated(my_plan, verbose = FALSE) # Everything is up to date.
## character(0)
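
One way to see why such edits can be ignored: once code is parsed, whitespace and comments are gone, so two differently formatted versions deparse to the same text. The sketch below shows the idea in base R. It is an assumption about the general approach, not a description of drake's internals, and normalize is a hypothetical helper.

```r
# Hypothetical helper: parse code and deparse it into a canonical string.
normalize <- function(code) {
  expr <- parse(text = code, keep.source = FALSE)[[1]]
  paste(deparse(expr), collapse = "\n")
}

original    <- "lm(y ~ x3, data = d)"
reformatted <- "lm( y~x3,\n     data = d )  # I indented here."
changed     <- "lm(y ~ x2, data = d)"

identical(normalize(original), normalize(reformatted)) # TRUE
identical(normalize(original), normalize(changed))     # FALSE
```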

Need to add new work on the fly? Just append rows to the workflow plan. If the rest of your workflow is up to date, only the new work is run.

new_simulation <- function(n){
  data.frame(x = rnorm(n), y = rnorm(n))
}

additions <- workplan(
  new_data = new_simulation(36) + sqrt(10))
additions
##     target                       command
## 1 new_data new_simulation(36) + sqrt(10)
my_plan <- rbind(my_plan, additions)
my_plan
##                    target                                      command
## 1             'report.md'             knit('report.Rmd', quiet = TRUE)
## 2                   small                                  simulate(5)
## 3                   large                                 simulate(50)
## 4       regression1_small                                  reg1(small)
## 5       regression1_large                                  reg1(large)
## 6       regression2_small                                  reg2(small)
## 7       regression2_large                                  reg2(large)
## 8  summ_regression1_small suppressWarnings(summary(regression1_small))
## 9  summ_regression1_large suppressWarnings(summary(regression1_large))
## 10 summ_regression2_small suppressWarnings(summary(regression2_small))
## 11 summ_regression2_large suppressWarnings(summary(regression2_large))
## 12 coef_regression1_small              coefficients(regression1_small)
## 13 coef_regression1_large              coefficients(regression1_large)
## 14 coef_regression2_small              coefficients(regression2_small)
## 15 coef_regression2_large              coefficients(regression2_large)
## 16               new_data                new_simulation(36) + sqrt(10)
make(my_plan)
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
## connect 22 imports: summary_types, new_simulation, files, command, simulate, ...
## connect 16 targets: 'report.md', small, large, regression1_small, regression1...
## check 11 items: 'report.Rmd', knit, summary, suppressWarnings, coefficients, ...
## import 'report.Rmd'
## import knit
## import summary
## import suppressWarnings
## import coefficients
## import sqrt
## import data.frame
## import rnorm
## import rpois
## import stats::rnorm
## import lm
## check 4 items: new_simulation, simulate, reg1, reg2
## import new_simulation
## import simulate
## import reg1
## import reg2
## check 3 items: small, large, new_data
## target new_data
## check 4 items: regression1_small, regression1_large, regression2_small, regre...
## check 8 items: summ_regression1_small, summ_regression1_large, summ_regressio...
## check 1 item: 'report.md'

If you ever need to erase your work, use clean(). Any targets removed from the cache will have to be rebuilt on the next call to make(), so be careful.

clean(small, reg1) # uncaches individual targets and imported objects
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
clean() # cleans all targets out of the cache
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
clean(destroy = TRUE) # removes the cache entirely
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...

7 Automatic watching for changed dependencies

As you have seen with reg2(), drake reacts to changes. In other words, make() notices when your dependencies are different from last time, rebuilds any affected targets, and continues downstream. In particular, drake watches for nontrivial changes to the following:

  1. Imported functions, whether user-defined or from packages.
  2. For imported functions from your environment, any nested functions also in your environment or from packages.
  3. Commands in your workflow plan data frame.
  4. Global variables mentioned in the commands or imported functions.
  5. Upstream targets.
  6. For dynamic knitr reports (with knit('your_report.Rmd') as a command in your workflow plan data frame), targets and imports mentioned in calls to readd() and loadd() in the code chunks to be evaluated. Drake treats these targets and imports as dependencies of the compiled output target (say, ‘report.md’).

To explore the dependencies, please refer to the deps() and tracked() functions. Better yet, generate interactive graphs with plot_graph() as shown above. Hover over the nodes in the graph to see the content that drake watches.
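
The idea of rebuilding affected targets and continuing downstream can be sketched as a walk over the dependency network. The example below hard-codes the direct dependencies of the basic example's targets, as read from the workflow plan and report.Rmd, and collects everything downstream of a changed import. It is a simplified illustration, not drake's algorithm, and deps_of and downstream are hypothetical names.

```r
# Direct dependencies of each target (only the user-defined ones).
deps_of <- list(
  small = "simulate",
  large = "simulate",
  regression1_small = c("reg1", "small"),
  regression1_large = c("reg1", "large"),
  regression2_small = c("reg2", "small"),
  regression2_large = c("reg2", "large"),
  summ_regression1_small = "regression1_small",
  summ_regression1_large = "regression1_large",
  summ_regression2_small = "regression2_small",
  summ_regression2_large = "regression2_large",
  coef_regression1_small = "regression1_small",
  coef_regression1_large = "regression1_large",
  coef_regression2_small = "regression2_small",
  coef_regression2_large = "regression2_large",
  "'report.md'" = c("small", "large", "coef_regression2_small")
)

# Repeatedly collect targets that depend on anything already affected.
downstream <- function(changed, deps_of) {
  affected <- character(0)
  frontier <- changed
  while (length(frontier) > 0) {
    hits <- names(deps_of)[
      vapply(deps_of, function(d) any(d %in% frontier), logical(1))
    ]
    frontier <- setdiff(hits, affected)
    affected <- union(affected, frontier)
  }
  affected
}

affected <- downstream("reg2", deps_of)
affected
```

For a change to reg2(), this yields the same seven targets that outdated() listed after the quadratic term was replaced with a cubic one.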

There is more to reproducibility than just using drake to watch for dependencies. Packrat creates a tightly-controlled local library of packages to extend the shelf life of your project. And with Docker, you can execute your project inside a container to ensure platform independence. Together, packrat and Docker can help others reproduce your work even if they have different software and hardware.

8 Need more speed?

Drake has extensive high-performance computing support, from local multicore computing on your laptop to serious supercomputing across multiple nodes of a large cluster. See the parallelism vignette for the full details.