Hadley Wickham’s Rcpp tutorial

If you are conversant in R, you probably know about Hadley Wickham.  His ggplot2 package for visualization is fantastic.  His plyr package covering the split-apply-combine design pattern is even better (and plays so nicely with parallelized backends for the foreach package like snow or multicore).

Now, via the great blog SimplyStatistics, I found that Hadley has written a lauded tutorial about how to use the Rcpp package.  From the tutorial:

Sometimes R code just isn’t fast enough – you’ve used profiling to find the bottleneck, but there’s simply no way to make the code any faster. This [tutorial] is the answer to that problem.

Obviously, Rcpp is no speed panacea.  If you’ve already done the hard work of vectorizing your code, then just moving part of it to C++ might not yield a great improvement in speed, and might be more work than it is worth.  But there are some good use cases covered:

  1. Say you have a loop in which each iteration depends on the results of previous iterations.  It cannot be trivially vectorized or parallelized, but it should not be too much work to translate into C++.

  2. Say you have an algorithm that is based on more complicated data structures (e.g., ball trees for k-nearest neighbours) that are not convenient to use in R.  Rcpp would make using a data structure like this a breeze.
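
To make the first use case concrete, here is a minimal sketch of a loop with a dependency on the previous iteration: simple exponential smoothing, where each output value is built from the one before it. This example is my own illustration, not something taken from the tutorial. The function name exp_smooth and the parameter alpha are hypothetical, and the code is plain C++ using std::vector so that it stands alone; with Rcpp you would typically take and return an Rcpp::NumericVector instead and mark the function with the `// [[Rcpp::export]]` attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: exponential smoothing of a series x.
// out[i] depends on out[i - 1], so the iterations cannot run
// independently of one another.
std::vector<double> exp_smooth(const std::vector<double>& x, double alpha) {
    // alpha is the smoothing weight; assumed to lie in [0, 1].
    assert(alpha >= 0.0 && alpha <= 1.0);

    std::vector<double> out(x.size());
    if (x.empty()) {
        return out;  // nothing to smooth
    }

    out[0] = x[0];  // seed the recursion with the first observation
    for (std::size_t i = 1; i < x.size(); ++i) {
        // Blend the new observation with the previous smoothed value.
        out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1];
    }
    return out;
}
```

In R, the same recurrence would be an explicit for loop over the vector, which is exactly the kind of code that tends to be slow and hard to vectorize.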

It’s all in there, go have a look.