Drake tries to track everything reproducibly, but it has limitations. For the most up-to-date information on unhandled edge cases, please visit the issue tracker, where you can also submit your own bug reports. Here, I address some of the main issues to keep in mind when writing safe, reproducible workflows.

Commands are NOT perfectly flexible.

In your workflow plan data frame (produced by plan() and accepted by make()), your commands can usually be flexible R expressions.

plan(target1 = 1 + 1 - sqrt(sqrt(3)), 
     target2 = my_function(web_scraped_data) %>% my_tidy)
##    target                                   command
## 1 target1                     1 + 1 - sqrt(sqrt(3))
## 2 target2 my_function(web_scraped_data) %>% my_tidy

However, please try to avoid formulas and function definitions in your commands. You may be able to get away with plan(f = function(x){x + 1}) or plan(f = y ~ x) in some use cases, but be careful: this could cause problems in general. Rather than using commands for this, it is better to define functions and formulas in your workspace before calling make(). (Alternatively, use the envir argument to make() for more control.) Use the check() function to help screen and quality-control your workflow plan data frame.

Minimize the side effects of your commands.

Consider the workflow plan data frame below.

plan(list = c(a = "x <- 1; return(x)"))
##   target           command
## 1      a x <- 1; return(x)

Here, x is a mere side effect of the command, and it will not be reproducibly tracked. If you later extend the workflow to include a proper target called x, the results of your analysis may be incorrect. Side effects of commands can be unpredictable, so please try to minimize them.
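The leak described above can be seen in plain R, without drake. The following is a minimal sketch, not drake's own evaluation code: run_command() is a hypothetical helper that simply evaluates a command string in a given environment, and the command is written as "x <- 1; x" rather than with return(). Wrapping the work in local() is one possible way to keep the intermediate variable contained.

```r
# Hypothetical stand-in for running a command: evaluate its text in an
# environment. This is NOT how drake evaluates commands internally.
run_command <- function(command, envir) {
  eval(parse(text = command), envir = envir)
}

workspace <- new.env()

# The command assigns x as a side effect before producing its value.
value_a <- run_command("x <- 1; x", workspace)
leaked <- exists("x", envir = workspace, inherits = FALSE)

# Wrapping the same work in local() keeps x out of the shared environment.
clean_workspace <- new.env()
value_b <- run_command("local({x <- 1; x})", clean_workspace)
contained <- !exists("x", envir = clean_workspace, inherits = FALSE)

print(c(leaked = leaked, contained = contained))
```

Both commands produce the same value; only the first leaves x behind in the environment where it ran.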

Do not change your working directory.

While a drake workflow is running, please do not change your working directory (with setwd(), for example). At the very least, if you do change your working directory during a command in your workflow plan, please return to the original working directory before the command completes. Drake relies on a hidden cache (the .drake/ folder) at the root of your project, so navigating to a different folder will confuse drake.
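If a function called from a command really must change directories, one common base R pattern is to restore the original directory with on.exit(). The sketch below is only an illustration: list_files_in() and the scratch directory are hypothetical names, not part of drake.

```r
# Hypothetical helper: do some work inside `dir`, then always return to the
# starting directory, even if the work throws an error.
list_files_in <- function(dir) {
  old_wd <- setwd(dir)                # setwd() returns the previous directory
  on.exit(setwd(old_wd), add = TRUE)  # restore it when the function exits
  list.files()
}

start_wd <- getwd()

# Set up a throwaway directory with one file in it.
scratch <- tempfile("scratch_dir")
dir.create(scratch)
file.create(file.path(scratch, "example.txt"))

files <- list_files_in(scratch)
end_wd <- getwd()

print(files)
print(identical(start_wd, end_wd))
```

Because the restore is registered with on.exit(), the working directory is reset even when the body of the function fails partway through.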

Directories (folders) are not reproducibly tracked.

Yes, you can declare a file target or input file by enclosing it in single quotes in your workflow plan data frame. But entire directories (i.e. folders) cannot yet be tracked this way. This is a trickier problem to solve, and lots of individual edge cases need to be ironed out before I can deliver a clean, reliable implementation. Please see issue 12 for updates and a discussion.

Dependencies are not tracked in some edge cases.

First of all, if you are ever unsure about what exactly is reproducibly tracked, use the tracked() function to list the names of all reproducibly tracked objects, functions, targets, files, etc. Alternatively, use build_graph() to obtain an igraph object of the dependency structure of your workflow, and use plot_graph() to make a plot of the graph. And again, use the check() function to help screen and quality-control your project.

Drake uses codetools::findGlobals() in the backend to look for dependencies, which can be fooled. For example, suppose you have a custom function f in your workspace.

f <- function(){
  assign("a", 1)
  b <- get("x", envir = globalenv())
  digest::digest('my_file.txt')
}

When drake looks for the dependencies of f, it will fail to recognize the objects a and x, the function digest(), and the file 'my_file.txt'. Objects a and x are referenced with quoted strings, not symbols, which tricks drake. The function digest() is referenced with the scoping rule ::, so codetools::findGlobals() does not detect it. Lastly, because 'my_file.txt' is inside a function and not a command in your workflow plan data frame, drake will not reproducibly track it.
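To see what codetools reports for a function like this, you can call findGlobals() on it directly. This is a sketch under assumptions: the last line of f below is my guess at a body that matches the description above (a ::-qualified call to digest() on 'my_file.txt'), and the output is what codetools sees, which is not necessarily identical to drake's final dependency list. The function is never called, so the digest package does not need to be installed.

```r
library(codetools)

f <- function(){
  assign("a", 1)                      # "a" is only a string here
  b <- get("x", envir = globalenv())  # "x" is only a string here
  digest::digest('my_file.txt')       # assumed line: ::-qualified call, quoted file name
}

# All global functions and variables that codetools detects in f.
globals <- findGlobals(f, merge = TRUE)
print(globals)

# Check which of the names discussed above were picked up.
detected <- c("a", "x", "digest") %in% globals
names(detected) <- c("a", "x", "digest")
print(detected)
```

Functions such as assign() and get() show up because they appear as symbols, while the names passed to them as strings do not.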

When it comes to commands in your workflow plan data frame, there are similar issues. It is possible to use double-quoted strings and the scoping operator :: to trick drake into overlooking objects, functions, and files that should be dependencies. Use the check() function to scan the workflow plan for double-quoted strings and print out messages telling you where they occur.
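For a rough idea of what such a scan involves, here is a sketch in base R that flags commands containing double-quoted strings. It is not the implementation of check(), and the commands vector is a made-up example.

```r
# Hypothetical commands, keyed by target name.
commands <- c(
  a = "x <- 1; return(x)",
  b = "readRDS(\"data.rds\")",  # double-quoted string: not treated as a file
  c = "readRDS('data.rds')"     # single-quoted string: declares a file
)

# Flag every command that contains a double-quote character.
has_double_quotes <- grepl("\"", commands, fixed = TRUE)
flagged <- names(commands)[has_double_quotes]

print(flagged)
```

Any flagged command deserves a second look to decide whether the quoted string should really be a single-quoted file dependency.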

Proper Makefiles are not standalone.

The Makefile generated by make(plan, parallelism = "Makefile") is not standalone. Do not run it outside of drake::make(). Drake uses dummy timestamp files to tell the Makefile what to do, and running make in the terminal will most likely give incorrect results.