This vignette is a guide to debugging and testing drake projects. Please also see the “caution” vignette, which addresses drake's known edge cases, pitfalls, and weaknesses that may or may not be fixed in future releases. For the most up-to-date information on unhandled edge cases, please visit the issue tracker, where you can submit your own bug reports as well. Be sure to search the closed issues too, especially if you are not using the most up-to-date development version.

The configuration list

Most of drake's functions rely on a central config list. An understanding of config will help you grasp the internals. make() and drake_config() both return the config list. Unlike make(), drake_config()'s return value is visible, and its only purpose is to construct your config.

load_basic_example() # Get the code with drake_example("basic").
config <- drake_config(my_plan)

sort(names(config))
##  [1] "args"               "cache"              "cache_log_file"    
##  [4] "cache_path"         "caching"            "command"           
##  [7] "cpu"                "elapsed"            "envir"             
## [10] "evaluator"          "fetch_cache"        "graph"             
## [13] "hook"               "imports_only"       "jobs"              
## [16] "keep_going"         "lazy_load"          "log_progress"      
## [19] "long_hash_algo"     "parallelism"        "plan"              
## [22] "prepend"            "prework"            "recipe_command"    
## [25] "retries"            "seed"               "session"           
## [28] "session_info"       "short_hash_algo"    "skip_imports"      
## [31] "skip_safety_checks" "targets"            "timeout"           
## [34] "trigger"            "verbose"

The fields of config are mostly arguments to make() and are documented there.

Early in make(), the config list is stored in the cache. You can retrieve it with

read_drake_config()

and you can access parts of it with some companion functions.

read_drake_graph()
read_drake_plan()

Plan your work.

Workflow plan data frames

The workflow plan data frame is your responsibility, and it takes effort and care. Fortunately, functions in drake can help. You can check the plan for formatting issues, missing input files, etc. with the check_plan() function.

load_basic_example() # Get the code with drake_example("basic").
my_plan
## # A tibble: 15 x 2
##    target                 command                                         
##    <chr>                  <chr>                                           
##  1 ""                     "knit(knitr_in(\"report.Rmd\"), file_out(\"repo…
##  2 small                  simulate(48)                                    
##  3 large                  simulate(64)                                    
##  4 regression1_small      reg1(small)                                     
##  5 regression1_large      reg1(large)                                     
##  6 regression2_small      reg2(small)                                     
##  7 regression2_large      reg2(large)                                     
##  8 summ_regression1_small suppressWarnings(summary(regression1_small$resi…
##  9 summ_regression1_large suppressWarnings(summary(regression1_large$resi…
## 10 summ_regression2_small suppressWarnings(summary(regression2_small$resi…
## 11 summ_regression2_large suppressWarnings(summary(regression2_large$resi…
## 12 coef_regression1_small suppressWarnings(summary(regression1_small))$co…
## 13 coef_regression1_large suppressWarnings(summary(regression1_large))$co…
## 14 coef_regression2_small suppressWarnings(summary(regression2_small))$co…
## 15 coef_regression2_large suppressWarnings(summary(regression2_large))$co…

check_plan(my_plan) # No issues.

Visualize your workflow.

After quality-checking your plan, you should check that you understand how the steps of your workflow are interconnected. The web of dependencies affects which targets are built and which ones are skipped during make().

# Hover, click, drag, zoom, and pan. See args 'from' and 'to'.
config <- drake_config(my_plan)
vis_drake_graph(config, width = "100%", height = "500px")

See the rendered graph vignette to learn more about how graphing can help (for example, how to visualize small subgraphs). If you want to take control of your own visNetwork graph, use the dataframes_graph() function to get data frames of nodes, edges, and legend nodes.

Check dependency relationships.

Programmatically, several functions can help you check immediate dependencies.

deps(reg2)
## [1] "lm"

# knitr_in() makes sure your target depends on `report.Rmd`
# and any dependencies loaded with loadd() and readd()
# in the report's active code chunks.
deps(my_plan$command[1])
## [1] "\"report.Rmd\""         "\"report.md\""         
## [3] "coef_regression2_small" "knit"                  
## [5] "large"                  "small"

deps(my_plan$command[nrow(my_plan)])
## [1] "regression2_large" "summary"           "suppressWarnings"

Drake takes special precautions so that a target/import does not depend on itself. For example, deps(f) might return "f" if f() is a recursive function, but make() just ignores this conflict and runs as expected. In other words, make() automatically removes all self-referential loops in the dependency network.
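As a sketch of that behavior, consider a recursive function. It mentions itself in its own body, so it appears in its own dependency list, but make() drops the self-loop. (The deps() output here is illustrative, not guaranteed.)

```r
# A recursive function: fact() calls itself.
fact <- function(n) {
  if (n <= 1) 1 else n * fact(n - 1)
}
deps(fact)  # may include "fact" itself; make() ignores the self-loop
```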

List all the reproducibly-tracked objects and files, including imports and targets.

tracked(my_plan, targets = "small")
## [1] "nrow"        "sample.int"  "data.frame"  "mtcars"      "random_rows"
## [6] "small"       "simulate"

tracked(my_plan)
##  [1] "nrow"                   "sample.int"            
##  [3] "data.frame"             "mtcars"                
##  [5] "random_rows"            "\"report.Rmd\""        
##  [7] "lm"                     "coef_regression2_small"
##  [9] "knit"                   "large"                 
## [11] "small"                  "simulate"              
## [13] "reg1"                   "reg2"                  
## [15] "regression1_small"      "summary"               
## [17] "suppressWarnings"       "regression1_large"     
## [19] "regression2_small"      "regression2_large"     
## [21] "\"report.md\""          "summ_regression1_small"
## [23] "summ_regression1_large" "summ_regression2_small"
## [25] "summ_regression2_large" "coef_regression1_small"
## [27] "coef_regression1_large" "coef_regression2_large"

Outdated, up to date, and missing items

missed() reports import dependencies missing from your environment.

config <- drake_config(my_plan, verbose = FALSE)
missed(config) # Nothing is missing right now.
## character(0)

outdated() reports any targets that are outdated, plus any downstream targets that depend on them.

outdated(config)
##  [1] "\"report.md\""          "coef_regression1_large"
##  [3] "coef_regression1_small" "coef_regression2_large"
##  [5] "coef_regression2_small" "large"                 
##  [7] "regression1_large"      "regression1_small"     
##  [9] "regression2_large"      "regression2_small"     
## [11] "small"                  "summ_regression1_large"
## [13] "summ_regression1_small" "summ_regression2_large"
## [15] "summ_regression2_small"

To find out why a target is out of date, you can load the storr-based cache and compare the appropriate hash keys to the output of dependency_profile(). To use dependency_profile(), be sure to supply the master configuration list as the config argument. The same is true for drake_meta(), another alternative.

load_basic_example() # Get the code with drake_example("basic").
config <- make(my_plan, verbose = FALSE)
# Change a dependency.
reg2 <- function(d) {
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}
outdated(config)
## [1] "\"report.md\""          "coef_regression2_large"
## [3] "coef_regression2_small" "regression2_large"     
## [5] "regression2_small"      "summ_regression2_large"
## [7] "summ_regression2_small"

dependency_profile(target = "regression2_small", config = config)
## $cached_command
## [1] "{\n reg2(small) \n}"
## 
## $current_command
## [1] "{\n reg2(small) \n}"
## 
## $cached_file_modification_time
## NULL
## 
## $cached_dependency_hash
## [1] "bff91683ab896912a57d3010b489e68a50294f7c46dc5c8bc80797a3a616194b"
## 
## $current_dependency_hash
## [1] "8685dbd7c688d9ceca90b9c7cdde2e151e39a4882af35b18c9697a04c24e9d63"
## 
## $hashes_of_dependencies
##               reg2              small 
## "d47109544c89ca7a" "40fb781de184c741"

drake_meta(target = "regression2_small", config = config)
## $target
## [1] "regression2_small"
## 
## $imported
## [1] FALSE
## 
## $foreign
## [1] TRUE
## 
## $missing
## [1] FALSE
## 
## $seed
## [1] 1034257256
## 
## $command
## [1] "{\n reg2(small) \n}"
## 
## $depends
## [1] "8685dbd7c688d9ceca90b9c7cdde2e151e39a4882af35b18c9697a04c24e9d63"
## 
## $file
## [1] NA

config$cache$get_hash(key = "small", namespace = "kernels") # same
## [1] "40fb781de184c741"

config$cache$get_hash(key = "small") # same
## [1] "40fb781de184c741"

config$cache$get_hash(key = "reg2", namespace = "kernels") # same
## [1] "d47109544c89ca7a"

config$cache$get_hash(key = "reg2") # different
## [1] "89c33700774643ff"

In drake, the “kernel” of a target or import is the piece of the output that is reproducibly tracked. For ordinary R objects, the kernel is just the object itself. For custom external files, it is a separate hash. But for functions, the kernel is the deparsed body of the function, together with the dependency hash if the function is imported (see drake:::store_function()).
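To see why cosmetic edits to a function do not invalidate targets, note that deparsing a function body normalizes whitespace and discards comments. Here is a base-R sketch of the idea (not drake's actual hashing code):

```r
f1 <- function(x) {
  # a helpful comment
  x +   1
}
f2 <- function(x) {
  x + 1
}
# Deparsing the body normalizes spacing and drops comments,
# so both functions reduce to the same text for hashing.
identical(deparse(body(f1)), deparse(body(f2)))  # typically TRUE
```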

The internal functions drake:::meta() and drake:::meta_list() compute the metadata on each target that drake uses to decide which targets to build and which to skip (via drake:::should_build_target()). Then, after the target/import is processed, drake:::finish_meta() updates the metadata (except for the $missing element) before it is cached. See diagnose() to read available metadata, along with any errors, warnings, and messages generated during the build.

str(diagnose(small))
## List of 11
##  $ target      : chr "small"
##  $ imported    : logi FALSE
##  $ foreign     : logi TRUE
##  $ missing     : logi TRUE
##  $ seed        : num 1.95e+09
##  $ command     : chr "{\n simulate(48) \n}"
##  $ depends     : chr "5aa9da170b33b159cfd382e15229d7efe6d6a5777d1c69e71b0c6a1188ee5116"
##  $ file        : chr NA
##  $ start       :Class 'proc_time'  Named num [1:5] 13.905 0.22 14.745 0.024 0.018
##   .. ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
##  $ time_command:'data.frame':    1 obs. of  5 variables:
##   ..$ item   : chr "small"
##   ..$ type   : chr "target"
##   ..$ elapsed: num 0
##   ..$ user   : num 0.001
##   ..$ system : num 0
##  $ time_build  :'data.frame':    1 obs. of  5 variables:
##   ..$ item   : chr "small"
##   ..$ type   : chr "target"
##   ..$ elapsed: num 0.003
##   ..$ user   : num 0.003
##   ..$ system : num 0

str(diagnose("\"report.md\""))
## List of 12
##  $ target      : chr "\"report.md\""
##  $ imported    : logi FALSE
##  $ foreign     : logi TRUE
##  $ missing     : logi TRUE
##  $ seed        : num 1.85e+09
##  $ command     : chr "{\n knit(knitr_in(\"report.Rmd\"), file_out(\"report.md\"), quiet = TRUE) \n}"
##  $ depends     : chr "5f72f8f2a06b7a515e17a3ea3f57b1dd3522d7143e0cbd0650a3c0985683f81d"
##  $ file        : chr "ed35f108e2ccd75a904273f1e8559d5a0acb9c2700531276a7acdcfba09decc6"
##  $ start       :Class 'proc_time'  Named num [1:5] 14.001 0.224 14.845 0.024 0.018
##   .. ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
##  $ time_command:'data.frame':    1 obs. of  5 variables:
##   ..$ item   : chr "\"report.md\""
##   ..$ type   : chr "target"
##   ..$ elapsed: num 0.032
##   ..$ user   : num 0.031
##   ..$ system : num 0
##  $ mtime       : POSIXct[1:1], format: "2018-04-09 23:50:07"
##  $ time_build  :'data.frame':    1 obs. of  5 variables:
##   ..$ item   : chr "\"report.md\""
##   ..$ type   : chr "target"
##   ..$ elapsed: num 0.034
##   ..$ user   : num 0.035
##   ..$ system : num 0

If your target's last build succeeded, then diagnose(your_target) has the most current information from that build. But if your target failed, then only diagnose(your_target)$error, diagnose(your_target)$warnings, and diagnose(your_target)$messages correspond to the failure, and all the other metadata correspond to the last build that completed without an error.

Test with triggers.

To track dependencies and decide what needs building, make() stores the fingerprint, or hash, of each target. Hashing is great for detecting changes to targets, but if all you want to do is test and debug a workflow, the full rigor can be time-consuming.

Fortunately, you can change the triggers that tell drake when to (re)build each target. Below, drake disregards outdatedness and just builds the targets that are missing.

clean(verbose = FALSE) # Start from scratch
config <- make(my_plan, trigger = "missing")
## Unloading targets from environment:
##   coef_regression2_small
##   large
##   small
## target large: trigger "missing"
## target small: trigger "missing"
## target regression1_large: trigger "missing"
## target regression1_small: trigger "missing"
## target regression2_large: trigger "missing"
## target regression2_small: trigger "missing"
## target coef_regression1_large: trigger "missing"
## target coef_regression1_small: trigger "missing"
## target coef_regression2_large: trigger "missing"
## target coef_regression2_small: trigger "missing"
## target summ_regression1_large: trigger "missing"
## target summ_regression1_small: trigger "missing"
## target summ_regression2_large: trigger "missing"
## target summ_regression2_small: trigger "missing"
## target file "report.md": trigger "missing"
## Used non-default triggers. Some targets may not be up to date.

You can choose from any of the following triggers, either for all targets at once or for each target individually: "always", "any" (the default), "command", "depends", "file", and "missing".

To select triggers for individual targets, create an optional trigger column in the workflow plan data frame. Entries in this column override the trigger argument to make().

my_plan$trigger <- "command"
my_plan$trigger[1] <- "file"
my_plan
## # A tibble: 15 x 3
##    target                 command                                  trigger
##    <chr>                  <chr>                                    <chr>  
##  1 ""                     "knit(knitr_in(\"report.Rmd\"), file_ou… file   
##  2 small                  simulate(48)                             command
##  3 large                  simulate(64)                             command
##  4 regression1_small      reg1(small)                              command
##  5 regression1_large      reg1(large)                              command
##  6 regression2_small      reg2(small)                              command
##  7 regression2_large      reg2(large)                              command
##  8 summ_regression1_small suppressWarnings(summary(regression1_sm… command
##  9 summ_regression1_large suppressWarnings(summary(regression1_la… command
## 10 summ_regression2_small suppressWarnings(summary(regression2_sm… command
## 11 summ_regression2_large suppressWarnings(summary(regression2_la… command
## 12 coef_regression1_small suppressWarnings(summary(regression1_sm… command
## 13 coef_regression1_large suppressWarnings(summary(regression1_la… command
## 14 coef_regression2_small suppressWarnings(summary(regression2_sm… command
## 15 coef_regression2_large suppressWarnings(summary(regression2_la… command

# Change an imported dependency:
reg2
## function(d) {
##   d$x3 <- d$x ^ 3
##   lm(y ~ x3, data = d)
## }

reg2 <- function(d) {
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}
make(my_plan, trigger = "any") # Nothing changes!
## Unloading targets from environment:
##   coef_regression2_small
##   large
##   small
## Used non-default triggers. Some targets may not be up to date.

The outdated() function responds to triggers. For example, even if outdated(config) shows all targets up to date, calling outdated() on a config created with trigger = "always" will claim that all the targets are outdated.
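A quick sketch of that behavior, assuming the basic example was just built and is fully up to date:

```r
config <- drake_config(my_plan, verbose = FALSE)
outdated(config)  # character(0) when everything is up to date

# The "always" trigger makes every target look outdated.
config_always <- drake_config(my_plan, trigger = "always", verbose = FALSE)
outdated(config_always)  # lists every target in the plan
```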

Skipping imports

Similar to triggers, you can also skip the processing of imported objects and files. However, you should only use this for testing purposes. If some of your imports are not already cached and up to date, any built targets will be out of sync. In other words, outdated() is more likely to be wrong, and your project may no longer be reproducible.

clean(verbose = FALSE)
my_plan$trigger <- NULL

make(my_plan, skip_imports = TRUE)
## target large
## target small
## target regression1_large
## target regression1_small
## target regression2_large
## target regression2_small
## target coef_regression1_large
## target coef_regression1_small
## target coef_regression2_large
## target coef_regression2_small
## target summ_regression1_large
## target summ_regression1_small
## target summ_regression2_large
## target summ_regression2_small
## target file "report.md"
## Skipped the imports. If some imports are not already cached, targets could be out of date.

Impose timeouts and retries

See the timeout, cpu, elapsed, and retries arguments to make().

clean(verbose = FALSE)
f <- function(...){
  Sys.sleep(1)
}
debug_plan <- drake_plan(x = 1, y = f(x))
debug_plan
## # A tibble: 2 x 2
##   target command
##   <chr>  <chr>  
## 1 x      1      
## 2 y      f(x)

withr::with_message_sink(
  stdout(),
  make(debug_plan, timeout = 1e-3, retries = 2)
)
## Unloading targets from environment:
##   x
## target x
## target y
## retry y: 1 of 2
## retry y: 2 of 2
## [2018-04-09 23:50:14] TimeoutException: reached CPU time limit [cpu=0.001s,
## elapsed=0.001s]
## Warning: No message sink to remove.

To tailor these settings to each individual target, create new timeout, cpu, elapsed, or retries columns in your workflow plan. These columns override the analogous arguments to make().

clean(verbose = FALSE)
debug_plan$timeout <- c(1e-3, 2e-3)
debug_plan$retries <- 1:2

debug_plan
## # A tibble: 2 x 4
##   target command timeout retries
##   <chr>  <chr>     <dbl>   <int>
## 1 x      1       0.00100       1
## 2 y      f(x)    0.00200       2

withr::with_message_sink(
  new = stdout(),
  make(debug_plan, timeout = Inf, retries = 0)
)
## Unloading targets from environment:
##   x
## target x
## target y
## fail y
## Error: Target `y` failed. Call `diagnose(y)` for details. Error message:
##   reached elapsed time limit
## Warning: No message sink to remove.

Diagnose failures.

Drake records diagnostic metadata on all your targets, including the latest errors, warnings, messages, and other bits of context.

diagnose(verbose = FALSE) # Targets with available metadata.
## [1] "Sys.sleep" "f"         "x"         "y"

f <- function(x){
  if (x < 0){
    stop("`x` cannot be negative.")
  }
  x
}
bad_plan <- drake_plan(
  a = 12,
  b = -a,
  my_target = f(b)
)

bad_plan
## # A tibble: 3 x 2
##   target    command
##   <chr>     <chr>  
## 1 a         12     
## 2 b         -a     
## 3 my_target f(b)

withr::with_message_sink(
  new = stdout(),
  make(bad_plan)
)
## target a
## target b
## target my_target
## fail my_target
## Error: Target `my_target` failed. Call `diagnose(my_target)` for details. Error message:
##   `x` cannot be negative.
## Warning: No message sink to remove.

failed(verbose = FALSE) # from the last make() only
## [1] "my_target" "y"

# See also warnings and messages.
error <- diagnose(my_target, verbose = FALSE)$error

error$message
## [1] "`x` cannot be negative."

error$call
## f(b)

error$calls # View the traceback.
## [[1]]
## local({
##     f(b)
## })
## 
## [[2]]
## eval.parent(substitute(eval(quote(expr), envir)))
## 
## [[3]]
## eval(expr, p)
## 
## [[4]]
## eval(expr, p)
## 
## [[5]]
## eval(quote({
##     f(b)
## }), new.env())
## 
## [[6]]
## eval(quote({
##     f(b)
## }), new.env())
## 
## [[7]]
## f(b)
## 
## [[8]]
## stop("`x` cannot be negative.")

To figure out what went wrong, you could try to build the failed target interactively with drake_build(). This function first calls loadd(deps = TRUE) to load any missing dependencies (see the replace argument to loadd()) and then builds your target.

# Pretend we just opened a new R session.
library(drake)

# Unloads target `b`.
config <- drake_config(plan = bad_plan)
## Unloading targets from environment:
##   b

# my_target depends on b.
"b" %in% ls()
## [1] FALSE

# Try to build my_target until the error is fixed.
# Skip all that pesky work checking dependencies.
drake_build(my_target, config = config)
## target my_target
## fail my_target
## Error: Target `my_target` failed. Call `diagnose(my_target)` for details. Error message:
##   `x` cannot be negative.

# The target failed, but the dependency was loaded.
"b" %in% ls()
## [1] TRUE

# What was `b` again?
b
## [1] -12

# How was `b` used?
diagnose(my_target)$message
## NULL

diagnose(my_target)$call
## NULL

f
## function(x){
##   if (x < 0){
##     stop("`x` cannot be negative.")
##   }
##   x
## }

# Aha! The error was in f(). Let's fix it and try again.
f <- function(x){
  x <- abs(x)
  if (x < 0){
    stop("`x` cannot be negative.")
  }
  x
}

# Now it works!
# Since you called make() previously, `config` is read from the cache
# if you do not supply it.
drake_build(my_target)
## target my_target

readd(my_target)
## [1] 12

Tidy evaluation: a caveat to diagnosing interactively

Running commands in your R console is not always exactly like running them with make(). That's because make() uses tidy evaluation as implemented in the rlang package.

# This workflow plan uses rlang's quasiquotation operator `!!`.
my_plan <- drake_plan(list = c(
  little_b = "\"b\"",
  letter = "!!little_b"
))
my_plan
## # A tibble: 2 x 2
##   target   command   
##   <chr>    <chr>     
## 1 little_b "\"b\""   
## 2 letter   !!little_b
make(my_plan)
## Unloading targets from environment:
##   little_b
##   letter
## target little_b
## target letter
readd(letter)
## [1] "b"

Debrief a build session.

After your project is at least somewhat built, you can inspect and read your results from the cache.

make(my_plan, verbose = FALSE)

# drake_session(verbose = FALSE) # Prints the sessionInfo() of the last make(). # nolint

cached(verbose = FALSE)
## [1] "Sys.sleep" "a"         "b"         "f"         "letter"    "little_b" 
## [7] "my_target" "stop"      "x"

built(verbose = FALSE)
## [1] "a"         "b"         "letter"    "little_b"  "my_target" "x"

imported(verbose = FALSE)
## [1] "Sys.sleep" "f"         "stop"

loadd(little_b, verbose = FALSE)

little_b
## [1] "b"

readd(letter, verbose = FALSE)
## [1] "b"

progress(verbose = FALSE)
## Error in progress(verbose = FALSE): unused argument (verbose = FALSE)

in_progress(verbose = FALSE) # Unfinished targets
## character(0)

There are functions to help you locate the project's cache.

# find_project() # nolint
# find_cache()   # nolint

For more information on the cache, see the storage vignette.

Start tinkering.

The load_basic_example() function loads the basic example from drake_example("basic") right into your workspace. The workflow plan data frame, workspace, and import files are set up for you. Only make(my_plan) is left to you.

Drake has many more built-in examples. To see your choices, use

drake_examples()
## [1] "Docker-psock"     "Makefile-cluster" "basic"           
## [4] "gsp"              "packages"         "sge"             
## [7] "slurm"            "torque"

To write the files for an example, use drake_example().

drake_example("basic")
drake_example("slurm")