Tuesday, February 8, 2011

Utilizing multiple cores in R

There are a couple of options in R if you want to utilize multiple cores on your machine. These days my favorite is the doMC package, which depends on the foreach and multicore packages.

In the section below, the square root of each number is calculated in parallel. Check the package vignette for a more complicated example. In practice, if you need to iterate through a large data structure and there is no escape from that, this package can make things considerably faster, depending on how many cores your machine has.

> library(doMC)                        # also loads foreach
> registerDoMC()                       # register the multicore backend for %dopar%
> foreach(i = 1:3) %dopar% sqrt(i)
[[1]]
[1] 1

[[2]]
[1] 1.414214

[[3]]
[1] 1.732051
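Because sqrt() is far too cheap to gain anything from extra cores, the speed-up is easier to see with a slower task. The sketch below assumes 2 workers (registerDoMC() with no argument picks a default instead) and uses a hypothetical slow_sqrt() helper that sleeps for half a second per call. It runs the same loop sequentially with %do% and in parallel with %dopar%, then compares the results and elapsed times; with 2 workers the parallel loop should take roughly half as long.

```r
library(doMC)   # also attaches foreach

# Assumption: use exactly 2 workers; registerDoMC() alone picks a default.
registerDoMC(cores = 2)
getDoParWorkers()   # number of workers %dopar% will use

# Hypothetical helper: sleep for 0.5 s to mimic a slow computation,
# since sqrt() by itself is too cheap to benefit from parallelism.
slow_sqrt <- function(x) {
  Sys.sleep(0.5)
  sqrt(x)
}

# %do% runs the loop sequentially; %dopar% uses the registered backend.
t_seq <- system.time(r_seq <- foreach(i = 1:4) %do% slow_sqrt(i))[["elapsed"]]
t_par <- system.time(r_par <- foreach(i = 1:4) %dopar% slow_sqrt(i))[["elapsed"]]

all.equal(r_seq, r_par)   # both loops return the same list
c(sequential = t_seq, parallel = t_par)
```

The actual timings depend on the machine and on the overhead of forking the workers.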


You can also choose how the results are combined into a single data structure with the .combine argument:

> library(doMC)
> registerDoMC()
> foreach(i = 1:3, .combine = "rbind") %dopar% sqrt(i)   # stack each result as a row
             [,1]
result.1 1.000000
result.2 1.414214
result.3 1.732051
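.combine accepts a function or the name of one, so other combiners work the same way. A minimal sketch, assuming 2 workers: c concatenates the results into a vector, "+" sums them, and rbind with a vector returned per iteration builds a matrix with one row per i (the column names i and sqrt are just illustrative).

```r
library(doMC)   # also attaches foreach
registerDoMC(cores = 2)

# .combine = c: concatenate the per-iteration results into one vector
v <- foreach(i = 1:3, .combine = c) %dopar% sqrt(i)

# .combine = "+": add the per-iteration results together
s <- foreach(i = 1:3, .combine = "+") %dopar% sqrt(i)

# .combine = rbind with a two-element vector per iteration: one row per i
m <- foreach(i = 1:3, .combine = rbind) %dopar% c(i = i, sqrt = sqrt(i))

v
s
m
```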

2 comments:

  1. This is very useful when you need to speed things up, but not in all cases. One issue is that the entire workspace gets copied to each worker process, so if you start with a large data structure of n GB, the memory requirement becomes n x ncores GB. Does anyone see a way to avoid this problem?

  2. I don't know of any way around it. Please keep us posted if you find anything on this! :)
