Time logging

William Michael Landau

2017-10-17

1 Build times

Thanks to Jasper Clarkberg, drake records how long it takes to build each target.

library(drake)
load_basic_example()
make(my_plan, jobs = 2, verbose = FALSE) # See also max_useful_jobs(my_plan).
build_times(digits = 8) # From the cache.
##                      item   type elapsed   user system
## 1            'report.Rmd' import  0.016s 0.007s 0.008s
## 2             'report.md' target  0.039s 0.036s 0.003s
## 3  coef_regression1_large target  0.003s 0.001s 0.002s
## 4  coef_regression1_small target  0.004s 0.001s 0.003s
## 5  coef_regression2_large target  0.004s 0.003s 0.001s
## 6  coef_regression2_small target  0.004s 0.002s 0.002s
## 7            coefficients import  0.005s 0.004s 0.001s
## 8              data.frame import  0.011s 0.009s 0.002s
## 9                    knit import  0.018s 0.013s 0.003s
## 10                  large target  0.007s 0.003s 0.002s
## 11                     lm import  0.006s 0.004s 0.002s
## 12                   reg1 import  0.006s 0.003s 0.001s
## 13                   reg2 import  0.005s 0.004s     0s
## 14      regression1_large target  0.013s 0.004s 0.007s
## 15      regression1_small target  0.013s 0.004s 0.008s
## 16      regression2_large target  0.005s 0.002s 0.003s
## 17      regression2_small target  0.005s 0.002s 0.003s
## 18                  rpois import  0.004s 0.004s     0s
## 19               simulate import  0.005s 0.003s 0.002s
## 20                  small target  0.007s 0.001s 0.004s
## 21           stats::rnorm import  0.004s 0.003s 0.001s
## 22 summ_regression1_large target  0.006s 0.004s     0s
## 23 summ_regression1_small target  0.007s 0.003s 0.002s
## 24 summ_regression2_large target  0.004s 0.002s 0.003s
## 25 summ_regression2_small target  0.005s 0.004s 0.001s
## 26                summary import  0.009s 0.007s 0.002s
## 27       suppressWarnings import  0.007s 0.004s 0.003s
build_times(digits = 8, targets_only = TRUE)
##                      item   type elapsed   user system
## 2             'report.md' target  0.039s 0.036s 0.003s
## 3  coef_regression1_large target  0.003s 0.001s 0.002s
## 4  coef_regression1_small target  0.004s 0.001s 0.003s
## 5  coef_regression2_large target  0.004s 0.003s 0.001s
## 6  coef_regression2_small target  0.004s 0.002s 0.002s
## 10                  large target  0.007s 0.003s 0.002s
## 14      regression1_large target  0.013s 0.004s 0.007s
## 15      regression1_small target  0.013s 0.004s 0.008s
## 16      regression2_large target  0.005s 0.002s 0.003s
## 17      regression2_small target  0.005s 0.002s 0.003s
## 20                  small target  0.007s 0.001s 0.004s
## 22 summ_regression1_large target  0.006s 0.004s     0s
## 23 summ_regression1_small target  0.007s 0.003s 0.002s
## 24 summ_regression2_large target  0.004s 0.002s 0.003s
## 25 summ_regression2_small target  0.005s 0.004s 0.001s

For drake version 4.1.0 and earlier, build_times() just measures the elapsed runtime of each command in my_plan$command. For later versions, the build times also account for all the internal operations in drake:::build(), such as storage and hashing.

2 Predicting runtime

Drake uses these times to predict the runtime of the next make(). At the moment, everything is up to date, so a hypothetical next make() should be fast. Here, we only factor in the times of the targets (excluding the imports using targets_only = TRUE).

predict_runtime(
  my_plan,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0s"

But you can also predict the elapsed time of a full runthrough scratch (after clean()).

predict_runtime(
  my_plan,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.126s"

Let’s change a dependency to make some targets out of date. Now, even though from_scatch is FALSE, the next make() should take some time.

reg2 <- function(d){
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}
predict_runtime(
  my_plan,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.066s"

We can also factor in parallelism using the future_jobs argument.

predict_runtime(
  my_plan,
  future_jobs = 1,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.126s"
predict_runtime(
  my_plan,
  future_jobs = 2,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.094s"
predict_runtime(
  my_plan,
  future_jobs = 4,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.072s"

3 Rate-limiting targets

To predict the next runtime with multiple parallel jobs, drake makes some assumptions.

  1. The outdated targets are spread out evenly over the available jobs. (All targets are used if from_scratch is TRUE)
  2. One job gets all the slowest targets (pessimistic scenario).

Then, drake simply takes the targets from the slowest job in each parallelizable stage and sums the corresponding elapsed build times. A parallelizable stage is a column in the workflow graph.

# Hover, click, drag, zoom, and pan.
plot_graph(my_plan, width = "100%", height = "500px")

You can explore the rate-limiting targets.

rate_limiting_times(
  my_plan,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
##                      item   type elapsed   user system stage
## 1                   small target  0.007s 0.001s 0.004s     1
## 2                   large target  0.007s 0.003s 0.002s     1
## 3       regression1_small target  0.013s 0.004s 0.008s     2
## 4       regression1_large target  0.013s 0.004s 0.007s     2
## 5       regression2_small target  0.005s 0.002s 0.003s     2
## 6       regression2_large target  0.005s 0.002s 0.003s     2
## 7  summ_regression1_small target  0.007s 0.003s 0.002s     3
## 8  summ_regression1_large target  0.006s 0.004s     0s     3
## 9  summ_regression2_small target  0.005s 0.004s 0.001s     3
## 10 summ_regression2_large target  0.004s 0.002s 0.003s     3
## 11 coef_regression1_small target  0.004s 0.001s 0.003s     3
## 12 coef_regression2_large target  0.004s 0.003s 0.001s     3
## 13 coef_regression2_small target  0.004s 0.002s 0.002s     3
## 14 coef_regression1_large target  0.003s 0.001s 0.002s     3
## 15            'report.md' target  0.039s 0.036s 0.003s     4
rate_limiting_times(
  my_plan,
  future_jobs = 2,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
##                     item   type elapsed   user system stage
## 1                  small target  0.007s 0.001s 0.004s     1
## 2      regression1_small target  0.013s 0.004s 0.008s     2
## 3      regression1_large target  0.013s 0.004s 0.007s     2
## 4 summ_regression1_small target  0.007s 0.003s 0.002s     3
## 5 summ_regression1_large target  0.006s 0.004s     0s     3
## 6 summ_regression2_small target  0.005s 0.004s 0.001s     3
## 7 summ_regression2_large target  0.004s 0.002s 0.003s     3
## 8            'report.md' target  0.039s 0.036s 0.003s     4
rate_limiting_times(
  my_plan,
  future_jobs = 4,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
##                     item   type elapsed   user system stage
## 1                  small target  0.007s 0.001s 0.004s     1
## 2      regression1_small target  0.013s 0.004s 0.008s     2
## 3 summ_regression1_small target  0.007s 0.003s 0.002s     3
## 4 summ_regression1_large target  0.006s 0.004s     0s     3
## 5            'report.md' target  0.039s 0.036s 0.003s     4

4 A word of caution

Drake only accounts for the targets with logged build times. If some targets have not been timed, drake throws a warning and lists the names of the untimed targets.