The following code reads a BED-like file in which chromosome (1 to 24) and strand (1 or 0) are encoded numerically, processes it in parallel by chromosome, and returns the results as a list. Note the use of the descriptor to identify the shared object: each worker attaches to the same matrix instead of receiving its own copy. Any change to the shared object is immediately visible to all processes.
library(bigmemory)
library(doMC)
registerDoMC(cores=24)
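# filename: path to the tab-separated BED-like file
# shared=TRUE is needed so that the workers can attach via the descriptor
bigtab <- read.big.matrix(filename, sep="\t", col.names=c('chr','start','end','strand'),
                          type='integer', shared=TRUE)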
descriptor <- describe(bigtab)
result <- foreach(chr = seq(1,24)) %dopar% {
  tab <- attach.big.matrix(descriptor)
  tab.chr <- tab[tab[,'chr'] == chr,]
  # Do some stuff with these values
  # and return result
}
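To see the "changes are visible to all processes" part in action, here is a minimal sketch that is separate from the code above. It assumes a Unix-like system (doMC relies on forking), and the toy matrix `m`, its size, and the two cores are made up purely for illustration. Each worker attaches to the matrix through the descriptor and writes to its own row; the parent then reads the values back without anything being returned from the loop.

```r
library(bigmemory)
library(doMC)   # also loads foreach
registerDoMC(cores = 2)

# Hypothetical toy matrix: 4 rows, 1 column, zero-filled, in shared memory
m <- big.matrix(nrow = 4, ncol = 1, type = "integer", init = 0L, shared = TRUE)
desc <- describe(m)

# Each worker attaches to the same shared matrix and writes one row
invisible(foreach(i = 1:4) %dopar% {
  w <- attach.big.matrix(desc)
  w[i, 1] <- i * 10L
  NULL  # nothing is sent back; the result lives in shared memory
})

# The parent process sees the values written by the workers
print(m[, 1])
```

Because every worker writes to a different row, no locking is needed here. If several workers could write to the same cells, the writes would have to be coordinated.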
Cool, huh?