Custom References

Custom References

The goal of this vignette is to show how to use custom asymptotic references. As an example, we explore the differences in asymptotic time complexity between different implementations of binary segmentation.

Asymptotic time/memory measurement of different R expressions

The code below uses the following arguments:

library(data.table)
seg.result <- atime::atime(
  N=2^seq(2, 20),
  setup={
    max.segs <- as.integer(N/2)
    max.changes <- max.segs-1L
    set.seed(1)
    data.vec <- 1:N
  },
  "changepoint\n::cpt.mean"={
    cpt.fit <- changepoint::cpt.mean(data.vec, method="BinSeg", Q=max.changes)
    sort(c(N,cpt.fit@cpts.full[max.changes,]))
  },
  "binsegRcpp\nmultiset"={
    binseg.fit <- binsegRcpp::binseg(
      "mean_norm", data.vec, max.segs, container.str="multiset")
    sort(binseg.fit$splits$end)
  },
  "fpop::\nmultiBinSeg"={
    mbs.fit <- fpop::multiBinSeg(data.vec, max.changes)
    sort(c(mbs.fit$t.est, N))
  },
  "wbs::sbs"={
    wbs.fit <- wbs::sbs(data.vec)
    split.dt <- data.table(wbs.fit$res)[order(-min.th, scale)]
    sort(split.dt[, c(N, cpt)][1:max.segs])
  },
  "binsegRcpp\nlist"={
    binseg.fit <- binsegRcpp::binseg(
      "mean_norm", data.vec, max.segs, container.str="list")
    sort(binseg.fit$splits$end)
  })
plot(seg.result)
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Transformation introduced infinite values in continuous y-axis
#> Transformation introduced infinite values in continuous y-axis

plot of chunk unnamed-chunk-1

The plot method creates a log-log plot of median time and memory vs data size, for each of the specified R expressions. The plot above shows some speed differences between binary segmentation algorithms, but they could be even easier to see for larger data sizes (exercise for the reader: try modifying the N and seconds.limit arguments). You can also see that memory usage is much larger for changepoint than for the other packages.

You can use references_best to get a tall/long data table that can be plotted to show both empirical time and memory complexity:

seg.best <- atime::references_best(seg.result)
plot(seg.best)
#> Warning: Transformation introduced infinite values in continuous y-axis

plot of chunk unnamed-chunk-2

The figure above shows asymptotic references which are best fit for each expression. We do the best fit by adjusting each reference to the largest N, and then ranking each reference by distance to the measurement of the second to largest N.

predict method

(seg.pred <- predict(seg.best))
#> atime_prediction object
#>       unit               expr.name unit.value         N
#>     <char>                  <char>      <num>     <num>
#> 1: seconds changepoint\n::cpt.mean       0.01  222.0695
#> 2: seconds    binsegRcpp\nmultiset       0.01 1026.3114
#> 3: seconds     fpop::\nmultiBinSeg       0.01 5334.0288
#> 4: seconds                wbs::sbs       0.01 1481.9823
#> 5: seconds        binsegRcpp\nlist       0.01  561.7443
plot(seg.pred)
#> Warning: Transformation introduced infinite values in continuous x-axis

plot of chunk unnamed-chunk-3

Custom asymptotic references

If you have one or more expected time complexity classes that you want to compare with your empirical measurements, you can use the fun.list argument. Note that each function in that list should take as input the data size N and output log base 10 of the reference function, as below:

my.refs <- list(
  "N \\log N"=function(N)log10(N) + log10(log(N)),
  "N^2"=function(N)2*log10(N),
  "N^3"=function(N)3*log10(N))
my.best <- atime::references_best(seg.result, fun.list=my.refs)
plot(my.best)
#> Warning: Transformation introduced infinite values in continuous y-axis

plot of chunk unnamed-chunk-4

From the plot above you should be able to see the asymptotic time complexity class of each algorithm. Keep in mind that the plot method displays only the next largest and next smallest reference lines, from those specified in fun.list.

Exercises for the reader