There are a couple of options in R if you want to use multiple cores on your machine. These days my favorite is the doMC package, which depends on the foreach and multicore packages.
In the example below, the square root of each number is calculated in parallel. Check the vignette for a more complicated example. In practice, if you need to iterate through a large data structure and there is no way around it, this package can make things considerably faster, depending on how many cores your machine has.
> library(doMC)
> registerDoMC()
> foreach(i = 1:3) %dopar% sqrt(i)
[[1]]
[1] 1
[[2]]
[1] 1.414214
[[3]]
[1] 1.732051
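The call to registerDoMC() above relies on the default number of workers. As a small sketch (the choice of two cores is an assumption made only for illustration), the worker count can be set explicitly and then checked with getDoParWorkers() from foreach:

```r
library(doMC)

# Register two worker processes explicitly (2 is an arbitrary choice here)
registerDoMC(cores = 2)

# Ask foreach how many workers the registered backend will use
n_workers <- getDoParWorkers()

# Same computation as above; the result is a list with one element per iteration
roots <- foreach(i = 1:3) %dopar% sqrt(i)
```

Checking the worker count before a long run is a cheap way to confirm that the loop will actually run in parallel.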
You can also choose how the results are combined into a single data structure, using the .combine argument:
> library(doMC)
> registerDoMC()
> foreach(i = 1:3,.combine="rbind") %dopar% sqrt(i)
         [,1]
result.1 1.000000
result.2 1.414214
result.3 1.732051
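Other combine functions work the same way. The sketch below is not from the vignette; it assumes the same doMC setup as above and shows two common choices, c for a plain vector and "+" for a running sum:

```r
library(doMC)
registerDoMC()

# .combine = c concatenates the per-iteration results into a plain numeric vector
roots <- foreach(i = 1:3, .combine = c) %dopar% sqrt(i)

# .combine = "+" adds the per-iteration results together into a single number
total <- foreach(i = 1:3, .combine = "+") %dopar% sqrt(i)
```

A vector is usually more convenient than a list or a one-column matrix when each iteration returns a single number.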
This is very useful when you need to speed things up, but not in all cases. One issue is that the entire workspace gets copied to each thread, so if you start with a large data structure of n GB, the memory requirement becomes n x ncores GB. Does anyone see a way to avoid this problem?
I don't know of any way around it. Please keep us posted if you find anything on this! :)