Data Stepping

Brandon Taylor


First, import datastepr.

library(datastepr)


This package was inspired by SAS data steps. In each step, the environment is populated with a slice of data from a data frame, operations are performed on it, and the entire environment is appended to a results data frame. The data step then repeats.


Let’s begin with a brief tour of the dataStepClass. First, create an instance.

step = dataStepClass$new()

Please read the extensive dataStepClass documentation before continuing.



Our example will be Euler’s method for solving differential equations. You do not need to understand the method itself to follow the example. The differential equation to be solved is:

\[ \dfrac{dy}{dx} = xy \]
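As a reminder that is not specific to this package, one step of Euler’s method moves from the current point to the next x value by following the current slope. Writing the x values as \(x_n\) and the estimates as \(y_n\) (notation introduced here only for illustration), the update for this equation is:

\[ y_{n+1} = y_n + (x_{n+1} - x_n) \left. \dfrac{dy}{dx} \right|_{(x_n,\, y_n)} = y_n + (x_{n+1} - x_n)\, x_n y_n \]

The slope is always evaluated at the old point, which is why the previous x and the previous derivative estimate both have to be carried from one step to the next.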

First, we will set initial values. The data frame below holds the sequence of x values over which the method will be applied.

xFrame = data.frame(x = 0:9)

The initial y value is used only in the first iteration of the data step.

y_initial = data.frame(y = 1)

Now here is our stairs function. First, begin is called, which sets up an evaluation environment in the function’s environment(). Next, in the first step only, y is initialized. Note, importantly, that without another set call later (or a manual override of continue), the data step would run only once. In all but the first step, a lag of x is stored. This matters because the following set call overwrites x with the next slice of the data frame above. Then, in all but the first step, a new y is estimated from the new x, the lag of x, and the previous derivative estimate. Next, a new derivative is estimated (see the equation above). Finally, we output the results.

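stairs = function(...) {
  step$begin(environment())                # set up the evaluation environment
  if (step$i == 1) step$set(y_initial)     # first step only: initial y
  if (step$i > 1) lagx = x                 # keep the old x before it is overwritten
  step$set(xFrame)                         # load the next slice of x values
  if (step$i > 1) y = y + dydx*(x - lagx)  # Euler update with the previous slope
  dydx = x*y                               # derivative estimate at the new point
  step$output()                            # append the environment to the results
  step$end(stairs)                         # call stairs again if the step continues
}

stairs()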


Let’s take a look at our results!

   dydx x      y lagx
      0 0      1   NA
      1 1      1    0
      4 2      2    1
     18 3      6    2
     96 4     24    3
    600 5    120    4
   4320 6    720    5
  35280 7   5040    6
 322560 8  40320    7
3265920 9 362880    8
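
As a cross-check, the same numbers can be reproduced with a plain base-R loop that does not use datastepr at all. This is only a sketch for comparison, not how the package works internally; the names `check` and `i` are introduced here for illustration.

```r
# Plain Euler loop for dy/dx = x*y over x = 0, 1, ..., 9 (no datastepr involved)
x <- 0:9
y <- numeric(length(x))
dydx <- numeric(length(x))

y[1] <- 1                  # initial value, as in y_initial
dydx[1] <- x[1] * y[1]     # slope at the starting point

for (i in 2:length(x)) {
  # step from the previous point using the previous slope
  y[i] <- y[i - 1] + dydx[i - 1] * (x[i] - x[i - 1])
  # slope at the new point
  dydx[i] <- x[i] * y[i]
}

# same column layout as the data step results; the first row has no lag
check <- data.frame(dydx = dydx, x = x, y = y, lagx = c(NA, head(x, -1)))
print(check)
```

Because the x values are one unit apart, each step multiplies y by 1 + x, so the y column is just the factorials 0!, 1!, ..., 9!.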