crew is a distributed computing framework with a centralized interface and auto-scaling. A controller is an object in R which accepts tasks, returns results, and launches workers. Workers can be local processes, jobs on traditional clusters such as SLURM, or jobs on cloud services such as AWS Batch, depending on the launcher plugin of the controller.
A task is a piece of R code, such as an expression or a function call. A worker is a non-interactive R process that runs one or more tasks. When tasks run on workers, the local R session is free and responsive, and work gets done faster. For example, a separate vignette shows how tasks and workers come together to speed up Shiny apps.
First, create a controller object to manage tasks and workers. Next, start the controller to create the client. Later, when you are done with the controller, call controller$terminate() to clean up the workers and the dispatcher. Use push() to submit a new task and pop() to return a completed task.
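For example, the following sketch creates a local controller, starts it, and pushes a task (the argument values are illustrative):

```r
library(crew)

# Create a controller with up to 2 local process workers that shut down
# after 10 seconds of idleness.
controller <- crew_controller_local(
  name = "example",
  workers = 2,
  seconds_idle = 10
)

# Start the controller to create the client.
controller$start()

# Push a task that returns the process ID of the worker that runs it.
controller$push(name = "get pid", command = ps::ps_pid())
```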
As a side effect, methods push(), pop(), and scale() also launch workers to run the tasks. If your controller uses transient workers and has a backlog of tasks, you may need to loop over these methods multiple times to make sure enough workers are always available.
```r
controller$pop() # No workers started yet and the task is not done.
#> NULL

task <- controller$pop() # Worker started, task complete.

task
#> # A tibble: 1 × 12
#>   name    command result seconds  seed algorithm error trace warnings
#>   <chr>   <chr>   <list>   <dbl> <int> <chr>     <chr> <chr> <chr>
#> 1 get pid NA      <int>        0    NA NA        NA    NA    NA
#> # ℹ 3 more variables: launcher <chr>, worker <int>, instance <chr>
```
The wait() method is a loop that repeatedly checks tasks and launches workers until all tasks complete.
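As a minimal sketch, the pattern below pushes several tasks, blocks until they all finish, and then collects each result tibble with pop():

```r
# Push a handful of tasks (Sys.sleep() stands in for real work).
for (i in seq_len(4)) {
  controller$push(name = paste0("task_", i), command = Sys.sleep(1))
}

# Block until every pushed task is complete.
controller$wait()

# Pop each completed task until the queue is empty.
while (!is.null(task <- controller$pop())) {
  print(task$name)
}
```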
The return value of the task is in the result column of the output. Here is the full list of columns in the output tibble:

- name: the task name if given.
- command: a character string with the R command if save_command was set to TRUE in push().
- result: a list containing the return value of the R command.
- seconds: number of seconds that the task ran.
- seed: the single integer originally supplied to push(), or NA if seed was supplied as NULL.
- algorithm: name of the pseudo-random number generator algorithm originally supplied to push(), or NA if algorithm was supplied as NULL.
- error: the first 2048 characters of the error message if the task threw an error, or NA otherwise.
- trace: the first 2048 characters of the text of the traceback if the task threw an error, or NA otherwise.
- warnings: the first 2048 characters of the text of the warning messages that the task may have generated, or NA otherwise.
- launcher: name of the crew launcher where the task ran.
If seed and algorithm are both non-missing in the output, then you can recover the pseudo-random number generator state of the task using set.seed(seed = seed, kind = algorithm). However, it is recommended to supply NULL to these arguments in push(), in which case you will observe NA in the outputs. With seed and algorithm equal to NULL, the random number generator defaults to the recommended widely spaced worker-specific L'Ecuyer streams. See vignette("parallel", package = "parallel") for details.
The map() method of the controller supports functional programming similar to purrr::map(). The arguments of map() are mostly the same as those of push(), but there is a new iterate argument to define the inputs of the tasks. map() submits a whole collection of tasks, auto-scales the workers, waits for all the tasks to finish, and returns the results in a tibble. Below, map() submits one task to compute 1 + 2 + 5 + 6 and another task to compute 3 + 4 + 5 + 6. The lists and vectors inside iterate vary from task to task, while the elements of data and globals stay constant across tasks.
```r
results <- controller$map(
  command = a + b + c + d,
  iterate = list(
    a = c(1, 3),
    b = c(2, 4)
  ),
  data = list(c = 5),
  globals = list(d = 6)
)

results
#> # A tibble: 2 × 12
#>   name  command result seconds  seed algorithm error trace warnings
#>   <chr> <chr>   <list>   <dbl> <int> <chr>     <chr> <chr> <chr>
#> 1 1     NA      <dbl>        0    NA NA        NA    NA    NA
#> 2 2     NA      <dbl>        0    NA NA        NA    NA    NA
#> # ℹ 3 more variables: launcher <chr>, worker <int>, instance <chr>

as.numeric(results$result)
#> [1] 14 18
```
If at least one task in map() throws an error, the default behavior is to error out in the main session and not return the results. If that happens, the results are available in controller$error. To return the results instead of setting controller$error, regardless of error status, set error = "warn" or error = "silent" in map(). To conserve memory, consider setting controller$error <- NULL when you are done troubleshooting.
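A minimal sketch of this error handling (the failing command and iterate values are just for illustration):

```r
# With error = "warn", map() warns instead of stopping and still returns
# the results, including the error column.
results <- controller$map(
  command = stop("this task fails"),
  iterate = list(index = c(1, 2)),
  error = "warn"
)

results$error # first 2048 characters of each error message
```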
The controller summary shows how many tasks each worker ran, how many total seconds it spent running tasks, and how many tasks threw warnings and errors.
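For example (a minimal sketch; the exact columns can vary by crew version):

```r
# One row per worker: tasks run, total seconds, warnings, and errors.
controller$summary()
```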
The launcher summary counts the number of times each worker was launched, and it shows the total number of assigned and completed tasks from all past terminated instances of each worker. In addition, it shows whether the current worker instance was actively connected (“online”) or had connected at some point during its life cycle (“discovered”) as of the last call to an auto-scaling method such as scale().
Finally, the client summary shows up-to-date worker status straight from the mirai client.
Call terminate() on the controller after you finish using it. terminate() tries to close the mirai dispatcher and any workers that may still be running. It is important to free up these resources.
The mirai dispatcher process should exit on its own, but
if not, you can manually terminate the process ID at
controller$client$dispatcher or call
crew_clean() to terminate any dispatchers from current or
previous R sessions.
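As a sketch:

```r
# Shut down the dispatcher and any workers that may still be running.
controller$terminate()

# If stray dispatcher processes linger from this or previous R sessions,
# crew_clean() attempts to terminate them.
crew_clean()
```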
As explained above, methods such as push(), pop(), and wait() launch new workers to run tasks. The number of new
workers depends on the number of tasks at the time. In addition, workers
can shut themselves down as work completes. In other words,
crew automatically raises and lowers the number of workers
in response to fluctuations in the task workload.
The most useful arguments for down-scaling, in order of importance, are:
- seconds_idle: shut down a worker if it spends too long waiting for a task.
- tasks_max: shut down a worker after it completes a certain number of tasks.
- seconds_wall: soft wall time of a worker.
Please tune these arguments to achieve the desired balance for auto-scaling. The two extremes of auto-scaling are persistent workers and transient workers, and each is problematic in its own way.
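For example, a local controller tuned for down-scaling might look like the sketch below (the values are illustrative):

```r
# Workers shut down after 10 seconds of idleness, after 100 completed tasks,
# or after a soft wall time of one hour, whichever comes first.
controller <- crew_controller_local(
  workers = 4,
  seconds_idle = 10,
  tasks_max = 100,
  seconds_wall = 3600
)
```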
Some launchers support local processes to launch and terminate
workers asynchronously. For example, a cloud-based launcher may need to
make HTTP requests to launch and terminate workers on a platform such as AWS Batch,
and these time-consuming requests should happen in the background.
Controllers that support this will have a processes argument to specify the number of local R processes that churn through worker launches and terminations. Set processes = NULL to disable async, which can be helpful for troubleshooting.
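As a purely hypothetical sketch (crew_controller_cloud() is a placeholder constructor, not a real function; substitute the controller from your launcher plugin):

```r
# Hypothetical plugin controller: 2 background R processes handle worker
# launches and terminations asynchronously. processes = NULL disables async.
controller <- crew_controller_cloud( # placeholder constructor
  workers = 16,
  processes = 2
)
```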