This vignette describes general best practices for creating, configuring, and running drake projects.
It is best to write your code as a set of functions. You can save those functions in R scripts and then source() them before doing anything else.
```r
# Load functions get_data(), analyze_data(), and summarize_results().
source("my_functions.R")
```
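For concreteness, here is a minimal sketch of what my_functions.R might contain. The function bodies, the assumed numeric columns x and y, and the use of lm() are illustrative assumptions only; your own get_data(), analyze_data(), and summarize_results() will differ. The last few lines chain the functions together by hand, outside drake, using a throwaway CSV file.

```r
# Hypothetical contents of my_functions.R (illustrative sketch only).

# Read the raw data from a CSV file (assumed to have numeric columns x and y).
get_data <- function(file) {
  read.csv(file)
}

# Fit a simple linear model of y on x (an assumed analysis).
analyze_data <- function(data) {
  lm(y ~ x, data = data)
}

# Collect a few summary quantities from the data and the fitted model.
summarize_results <- function(data, analysis) {
  list(
    n_rows = nrow(data),
    coefficients = coef(analysis)
  )
}

# Usage outside drake: write a small CSV and run the steps in order by hand.
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:10, y = 2 * (1:10) + 1), csv_file, row.names = FALSE)
my_data <- get_data(csv_file)
my_analysis <- analyze_data(my_data)
my_summaries <- summarize_results(my_data, my_analysis)
my_summaries$n_rows
```

Keeping each step in a named function is what lets a workflow plan refer to the steps as short, readable commands.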
Then, set up your workflow plan data frame.
```r
good_plan <- drake_plan(
  my_data = get_data('data.csv'), # External files need to be in commands explicitly. # nolint
  my_analysis = analyze_data(my_data),
  my_summaries = summarize_results(my_data, my_analysis)
)
good_plan
##         target                                 command
## 1      my_data                    get_data('data.csv')
## 2  my_analysis                   analyze_data(my_data)
## 3 my_summaries summarize_results(my_data, my_analysis)
```
Drake knows that my_analysis depends on my_data because my_data is an argument to analyze_data(), which is part of the command for my_analysis.
```r
config <- drake_config(good_plan)
vis_drake_graph(config)
```
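The dependency detection described above can be approximated with base R's all.vars(), which lists the variable names an expression refers to (function names in call position are excluded). This is a simplified sketch of the idea, not drake's actual implementation: the commands are copied from good_plan, and the helper command_deps() is hypothetical.

```r
# Commands from good_plan, stored as unevaluated R expressions.
commands <- list(
  my_data = quote(get_data("data.csv")),
  my_analysis = quote(analyze_data(my_data)),
  my_summaries = quote(summarize_results(my_data, my_analysis))
)

# Hypothetical helper: which other targets does a command mention by name?
command_deps <- function(command, targets) {
  intersect(all.vars(command), targets)
}

deps <- lapply(commands, command_deps, targets = names(commands))
deps
```

Because my_data appears as an argument in the command for my_analysis, it shows up as a dependency; my_data itself depends on no other target.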
Now, you can call make() to build the targets.
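Conceptually, make() builds each target only after the targets it depends on. The sketch below mimics that ordering with a small topological sort over hypothetical stand-in functions. It is a toy model for intuition only (toy_make() is a made-up name); it does not reproduce how drake schedules, caches, or stores targets.

```r
# Toy stand-ins for the user's functions (assumptions for illustration).
get_data <- function(file) data.frame(x = 1:5, y = c(2, 4, 6, 8, 10)) # ignores file
analyze_data <- function(data) coef(lm(y ~ x, data = data))
summarize_results <- function(data, analysis) list(n = nrow(data), slope = analysis[["x"]])

# Plan commands, deliberately listed out of dependency order.
commands <- list(
  my_summaries = quote(summarize_results(my_data, my_analysis)),
  my_analysis = quote(analyze_data(my_data)),
  my_data = quote(get_data("data.csv"))
)

# Build targets in dependency order, evaluating each command in a shared environment.
toy_make <- function(commands) {
  targets <- names(commands)
  env <- new.env()
  built <- character(0)
  while (length(built) < length(targets)) {
    progress <- FALSE
    for (target in setdiff(targets, built)) {
      deps <- intersect(all.vars(commands[[target]]), targets)
      if (all(deps %in% built)) {
        assign(target, eval(commands[[target]], envir = env), envir = env)
        built <- c(built, target)
        progress <- TRUE
      }
    }
    if (!progress) stop("circular dependency among targets")
  }
  list(order = built, values = mget(targets, envir = env))
}

result <- toy_make(commands)
result$order
```

Even though the commands are listed backwards, the dependencies force my_data to be built first and my_summaries last.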
If your commands are really long, just put them in larger functions.
Drake analyzes imported functions for non-file dependencies.
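To see why moving a long command into a function loses nothing, note that a function's body is itself an R expression that can be scanned. The sketch below uses base R's all.names() on a hypothetical analyze_data() to find which helper functions it calls. The helpers clean_data() and fit_model() are made up, and drake's own analysis is more thorough, so treat this only as an illustration of the idea.

```r
env <- environment()

# Hypothetical helpers defined in this session.
clean_data <- function(data) na.omit(data)
fit_model <- function(data) lm(y ~ x, data = data)

# Hypothetical long analysis moved out of the plan and into a function.
analyze_data <- function(data) {
  cleaned <- clean_data(data)
  model <- fit_model(cleaned)
  summary(model)
}

# Every name referenced in the body: calls, operators, and variables.
referenced <- unique(all.names(body(analyze_data)))

# Keep only names bound to functions defined directly in this environment.
helpers <- Filter(
  function(name) exists(name, envir = env, mode = "function", inherits = FALSE),
  referenced
)
helpers
```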
Some people are accustomed to dividing their work into R scripts and then calling source() to run each step of the analysis. For example, you might have the files get_data.R, analyze_data.R, and summarize_data.R.
If you migrate to drake, you may be tempted to set up a workflow plan like this.
```r
bad_plan <- drake_plan(
  my_data = source('get_data.R'),           # nolint
  my_analysis = source('analyze_data.R'),   # nolint
  my_summaries = source('summarize_data.R') # nolint
)
bad_plan
##         target                    command
## 1      my_data       source('get_data.R')
## 2  my_analysis   source('analyze_data.R')
## 3 my_summaries source('summarize_data.R')
```
But now, the dependency structure of your work is broken. Your R script files are dependencies, but since my_data is not mentioned in a function or command, drake does not know that my_analysis depends on it.
```r
config <- drake_config(bad_plan)
vis_drake_graph(config)
```
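The same all.vars() check (again a simplified stand-in for drake's code analysis, not its real implementation) shows the problem with bad_plan: no command mentions another target by name, so there is nothing from which to infer that my_analysis needs my_data.

```r
bad_commands <- list(
  my_data = quote(source("get_data.R")),
  my_analysis = quote(source("analyze_data.R")),
  my_summaries = quote(source("summarize_data.R"))
)

# For each command, list the other targets it refers to by name.
bad_deps <- lapply(bad_commands, function(command) {
  intersect(all.vars(command), names(bad_commands))
})
bad_deps
```

Every element is empty: the only thing each command mentions is a string, and the relationships between the scripts are invisible at the level of the plan.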
This plan has several problems.

1. If you run make(bad_plan, jobs = 2), drake will try to build my_data and my_analysis at the same time, even though my_data must finish before my_analysis begins.
2. Drake is oblivious to data.csv because it is not explicitly mentioned in a workflow plan command. So when data.csv changes, make(bad_plan) will not rebuild my_data.
3. my_analysis will not update when my_data changes.
4. The return value of source() is formatted counter-intuitively. If source('get_data.R') is the command for my_data, then my_data will always be a list with elements "value" and "visible". In other words, source('get_data.R')$value is really what you would want.
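The return value of source() is easy to check in base R. The sketch below writes a throwaway script to a temporary file (the script contents are made up for illustration) and inspects what source() hands back.

```r
# A throwaway script whose last expression creates a small data frame.
script <- tempfile(fileext = ".R")
writeLines("data.frame(x = 1:3, y = c(2, 4, 6))", script)

# source() returns a list with elements "value" and "visible".
result <- source(script, local = TRUE)
names(result)

# The object you actually wanted is stored in $value.
my_data <- result$value
```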
In addition, this source()-based approach is simply inconvenient. Drake rebuilds my_data every time get_data.R changes, even when those changes are just extra comments or blank lines. On the other hand, in the previous plan that uses my_data = get_data(), drake does not trigger rebuilds when comments or whitespace in get_data() are modified.
Drake is R-focused, not file-focused. If you embrace this viewpoint, your work will be easier.
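The difference between file-focused and R-focused change detection can be illustrated in base R. Below, two versions of a hypothetical get_data() differ only in comments and a blank line. Hashing the files with tools::md5sum() sees a change, while comparing the code parsed with keep.source = FALSE does not. This mimics the idea only; it is not drake's actual hashing scheme.

```r
old_code <- c(
  "get_data <- function(file) {",
  "  read.csv(file)",
  "}"
)
new_code <- c(
  "# Read the raw data.",
  "get_data <- function(file) {",
  "",
  "  read.csv(file)  # a harmless comment",
  "}"
)

old_file <- tempfile(fileext = ".R")
new_file <- tempfile(fileext = ".R")
writeLines(old_code, old_file)
writeLines(new_code, new_file)

# File-focused: any edit to the file, even a comment, changes its hash.
file_changed <- unname(tools::md5sum(old_file) != tools::md5sum(new_file))

# R-focused: parsing without source references drops comments and layout.
normalize <- function(lines) {
  deparse(parse(text = lines, keep.source = FALSE)[[1]])
}
code_changed <- !identical(normalize(old_code), normalize(new_code))

file_changed  # TRUE
code_changed  # FALSE
```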
Drake makes special exceptions for R Markdown reports and other knitr reports such as *.Rnw files. Not every drake project needs them, but it is good practice to use them to summarize the final results of a project once all the other targets have already been built. The basic example, for instance, has an R Markdown report: report.Rmd is knitted to build report.md, which summarizes the final results.
```r
# Load all the functions and the workflow plan data frame, my_plan.
load_basic_example() # Get the code with drake_example("basic").
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## connect 7 imports: tmp, simulate, reg1, my_plan, reg2, bad_plan, good_plan
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
```
To see where report.md will be built, look to the right of the workflow graph.
```r
config <- drake_config(my_plan)
vis_drake_graph(config)
```
Drake treats knitr reports as special cases. Whenever drake sees render() (rmarkdown) mentioned in a command, it dives into the source file to look for dependencies. Consider report.Rmd from the basic example (get the code with drake_example("basic")). When drake sees readd(small) in an active code chunk, it knows report.Rmd depends on the target called small, and it draws the appropriate arrow in the workflow graph above. And if small ever changes, make(my_plan) will re-process report.Rmd to produce the target file report.md.
Knitr reports are the only kind of file that drake analyzes for dependencies. R scripts do not get the same special treatment.
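The report scanning described above can be approximated in a few lines of base R: pull the code out of the R code chunks, parse it, and look for readd() calls. Everything here is a hypothetical sketch for intuition (the report text, the helper find_readd_targets(), and the simple chunk-matching regular expressions are assumptions); drake's real analysis is more thorough.

```r
# A tiny, made-up report with one active code chunk.
report_lines <- c(
  "# Results",
  "```{r}",
  "readd(small)",
  "```",
  "Some prose that mentions readd(large) outside any chunk."
)

# Hypothetical helper: targets passed to readd() inside R code chunks.
find_readd_targets <- function(lines) {
  starts <- grep("^```[{]r", lines)
  ends <- grep("^```[[:space:]]*$", lines)
  in_chunk <- logical(length(lines))
  for (start in starts) {
    end <- min(ends[ends > start])
    if (end > start + 1) in_chunk[(start + 1):(end - 1)] <- TRUE
  }
  exprs <- parse(text = lines[in_chunk], keep.source = FALSE)
  found <- character(0)
  # Walk each parsed expression and record the first argument of readd() calls.
  walk <- function(expr) {
    if (is.call(expr)) {
      if (identical(expr[[1]], as.name("readd"))) {
        found <<- c(found, as.character(expr[[2]]))
      }
      for (part in as.list(expr)[-1]) walk(part)
    }
  }
  for (expr in exprs) walk(expr)
  unique(found)
}

find_readd_targets(report_lines)
```

Only small is reported: readd(large) sits in prose outside any code chunk, so it is not treated as a dependency.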