Time logging

William Michael Landau

2017-11-05

1 Build times

Thanks to Jasper Clarkberg, drake records how long it takes to build each target.

library(drake)
load_basic_example()
make(my_plan, jobs = 2, verbose = FALSE) # See also max_useful_jobs(my_plan).
build_times(digits = 8) # From the cache.
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
##                      item   type elapsed  user system
## 1            'report.Rmd' import   0.03s 0.03s     0s
## 2             'report.md' target   0.55s  0.4s  0.03s
## 3  coef_regression1_large target      0s    0s     0s
## 4  coef_regression1_small target   0.01s    0s  0.01s
## 5  coef_regression2_large target      0s    0s     0s
## 6  coef_regression2_small target   0.02s 0.02s     0s
## 7            coefficients import   0.03s 0.01s  0.02s
## 8              data.frame import   0.02s 0.01s     0s
## 9                    knit import   0.02s    0s  0.01s
## 10                  large target   0.18s 0.15s  0.01s
## 11                     lm import   0.02s 0.02s     0s
## 12                   reg1 import      0s    0s     0s
## 13                   reg2 import      0s    0s     0s
## 14      regression1_large target   0.04s 0.01s  0.02s
## 15      regression1_small target   0.05s 0.05s     0s
## 16      regression2_large target   0.04s    0s  0.01s
## 17      regression2_small target   0.08s 0.05s  0.04s
## 18                  rpois import      0s    0s     0s
## 19               simulate import   0.02s    0s  0.02s
## 20                  small target   0.16s 0.16s     0s
## 21           stats::rnorm import   0.02s    0s  0.01s
## 22 summ_regression1_large target   0.02s 0.02s     0s
## 23 summ_regression1_small target   0.02s    0s  0.02s
## 24 summ_regression2_large target      0s    0s     0s
## 25 summ_regression2_small target   0.02s    0s  0.01s
## 26                summary import   0.01s 0.01s     0s
## 27       suppressWarnings import      0s    0s     0s
build_times(digits = 8, targets_only = TRUE)
## cache C:/Users/c240390/AppData/Local/Temp/RtmpSQkVhU/Rbuild2db86da6234/drake/...
##                      item   type elapsed  user system
## 2             'report.md' target   0.55s  0.4s  0.03s
## 3  coef_regression1_large target      0s    0s     0s
## 4  coef_regression1_small target   0.01s    0s  0.01s
## 5  coef_regression2_large target      0s    0s     0s
## 6  coef_regression2_small target   0.02s 0.02s     0s
## 10                  large target   0.18s 0.15s  0.01s
## 14      regression1_large target   0.04s 0.01s  0.02s
## 15      regression1_small target   0.05s 0.05s     0s
## 16      regression2_large target   0.04s    0s  0.01s
## 17      regression2_small target   0.08s 0.05s  0.04s
## 20                  small target   0.16s 0.16s     0s
## 22 summ_regression1_large target   0.02s 0.02s     0s
## 23 summ_regression1_small target   0.02s    0s  0.02s
## 24 summ_regression2_large target      0s    0s     0s
## 25 summ_regression2_small target   0.02s    0s  0.01s

For drake version 4.1.0 and earlier, build_times() just measures the elapsed runtime of each command in my_plan$command. For later versions, the build times also account for all the internal operations in drake:::build(), such as storage and hashing.

2 Predicting runtime

Drake uses these times to predict the runtime of the next make(). At the moment, everything is up to date, so a hypothetical next make() should be fast. Here, we only factor in the times of the targets (excluding the imports using targets_only = TRUE).

predict_runtime(
  my_plan,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0s"

But you can also predict the elapsed time of a full runthrough scratch (after clean()).

predict_runtime(
  my_plan,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "1.19s"

Let’s change a dependency to make some targets out of date. Now, even though from_scatch is FALSE, the next make() should take some time.

reg2 <- function(d){
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}
predict_runtime(
  my_plan,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.71s"

We can also factor in parallelism using the future_jobs argument.

predict_runtime(
  my_plan,
  future_jobs = 1,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "1.19s"
predict_runtime(
  my_plan,
  future_jobs = 2,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.94s"
predict_runtime(
  my_plan,
  future_jobs = 4,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.85s"

3 Rate-limiting targets

To predict the next runtime with multiple parallel jobs, drake makes some assumptions.

  1. The outdated targets are spread out evenly over the available jobs. (All targets are used if from_scratch is TRUE)
  2. One job gets all the slowest targets (pessimistic scenario).

Then, drake simply takes the targets from the slowest job in each parallelizable stage and sums the corresponding elapsed build times. A parallelizable stage is a column in the workflow graph.

# Hover, click, drag, zoom, and pan.
plot_graph(my_plan, width = "100%", height = "500px")

You can explore the rate-limiting targets.

rate_limiting_times(
  my_plan,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
##                      item   type elapsed  user system stage
## 1                   large target   0.18s 0.15s  0.01s     1
## 2                   small target   0.16s 0.16s     0s     1
## 3       regression2_small target   0.08s 0.05s  0.04s     2
## 4       regression1_small target   0.05s 0.05s     0s     2
## 5       regression1_large target   0.04s 0.01s  0.02s     2
## 6       regression2_large target   0.04s    0s  0.01s     2
## 7  summ_regression2_small target   0.02s    0s  0.01s     3
## 8  summ_regression1_small target   0.02s    0s  0.02s     3
## 9  summ_regression1_large target   0.02s 0.02s     0s     3
## 10 coef_regression2_small target   0.02s 0.02s     0s     3
## 11 coef_regression1_small target   0.01s    0s  0.01s     3
## 12 summ_regression2_large target      0s    0s     0s     3
## 13 coef_regression1_large target      0s    0s     0s     3
## 14 coef_regression2_large target      0s    0s     0s     3
## 15            'report.md' target   0.55s  0.4s  0.03s     4
rate_limiting_times(
  my_plan,
  future_jobs = 2,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
##                     item   type elapsed  user system stage
## 1                  large target   0.18s 0.15s  0.01s     1
## 2      regression2_small target   0.08s 0.05s  0.04s     2
## 3      regression1_small target   0.05s 0.05s     0s     2
## 4 summ_regression2_small target   0.02s    0s  0.01s     3
## 5 summ_regression1_small target   0.02s    0s  0.02s     3
## 6 summ_regression1_large target   0.02s 0.02s     0s     3
## 7 coef_regression2_small target   0.02s 0.02s     0s     3
## 8            'report.md' target   0.55s  0.4s  0.03s     4
rate_limiting_times(
  my_plan,
  future_jobs = 4,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
##                     item   type elapsed  user system stage
## 1                  large target   0.18s 0.15s  0.01s     1
## 2      regression2_small target   0.08s 0.05s  0.04s     2
## 3 summ_regression2_small target   0.02s    0s  0.01s     3
## 4 summ_regression1_small target   0.02s    0s  0.02s     3
## 5            'report.md' target   0.55s  0.4s  0.03s     4

4 A word of caution

Drake only accounts for the targets with logged build times. If some targets have not been timed, drake throws a warning and lists the names of the untimed targets.