
Wednesday, February 1, 2012

samtools in parallel

GNU parallel is a great tool for running your samtools jobs in parallel, which can make things considerably faster when you have many files and several cores. It is also available through MacPorts (useful if you are on OS X).

Here is an example of how to use it. Consider a case where you have multiple SAM files in a folder and each one needs to be converted to BAM, sorted and indexed.

Here is a slow way to achieve it with a bash script. The script loops over the SAM files in the directory and processes them one at a time.
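 for sample in *.sam
 do
   echo "$sample"
   # Strip the trailing .sam extension to get the output prefix
   describer=$(echo "${sample}" | sed 's/\.sam$//')
   echo "$describer"

   # Convert file from SAM to BAM format (-S: input is SAM, needed by samtools 0.1.x)
   samtools view -b -S "$sample" > "${describer}.uns.bam"

   # Sort BAM file; writes ${describer}.bam
   # (samtools 1.x syntax: samtools sort -o out.bam in.bam)
   samtools sort "${describer}.uns.bam" "${describer}"

   # Index the BAM file
   samtools index "${describer}.bam"

   # Remove intermediate files
   rm "${describer}.uns.bam"
 done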
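
Deriving the output prefix from the file name is the step most likely to go wrong with unusual sample names. The following is a minimal sketch, not part of the original script, that does the same job with bash parameter expansion instead of sed. The function name `sam_prefix` is hypothetical and only there for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip a trailing ".sam" from a file name.
sam_prefix() {
  local sample=$1
  # ${var%pattern} removes the shortest match of pattern from the END
  # of the value, so only a trailing ".sam" is dropped.
  printf '%s\n' "${sample%.sam}"
}

sam_prefix "liver.sam"       # prints: liver
sam_prefix "my.sample.sam"   # prints: my.sample
sam_prefix "xsample.sam"     # prints: xsample

# For comparison, an unanchored sed pattern with an unescaped dot
# removes the first "any character + sam" it finds:
echo "xsample.sam" | sed 's/.sam//'   # prints: ple.sam
```

The expansion also avoids starting an extra sed process for every file, although that cost is negligible next to the samtools calls.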


Here is how you can do it with GNU parallel on 4 cores (`-j4` runs up to four jobs at once, `-k` keeps the output in the same order as the input):

 
ls *.sam | parallel -j4 -k bash convert2bam.sh {}


The contents of convert2bam.sh:

  
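sample=$1
# Strip the trailing .sam extension to get the output prefix
describer=$(echo "${sample}" | sed 's/\.sam$//')

# Convert file from SAM to BAM format (-S: input is SAM, needed by samtools 0.1.x)
samtools view -b -S "$sample" > "${describer}.uns.bam"

# Sort BAM file; writes ${describer}.bam
# (samtools 1.x syntax: samtools sort -o out.bam in.bam)
samtools sort "${describer}.uns.bam" "${describer}"

# Index the BAM file
samtools index "${describer}.bam"

# Remove intermediate files
rm "${describer}.uns.bam"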


EDIT: And here is how you can achieve the same result with a pipe, without creating the intermediate file (thanks to the commenters):

  

 ls *.sam | parallel "samtools view -b -S {} | samtools sort - {.}; samtools index {.}.bam"
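
In the one-liner, GNU parallel replaces `{}` with the input line (the SAM file name) and `{.}` with the same name minus its extension. The sketch below emulates that substitution in plain bash and only prints the command that would run for each file, so you can check the names before launching anything. The helper name `preview_cmd` is hypothetical; GNU parallel's own `--dry-run` option gives you a real preview.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command the one-liner would run for a
# single input file, emulating GNU parallel's {} and {.} placeholders.
preview_cmd() {
  local input=$1          # what {} expands to: the input line itself
  local stem=${input%.*}  # what {.} expands to: input minus its last extension
  printf 'samtools view -b -S %s | samtools sort - %s; samtools index %s.bam\n' \
    "$input" "$stem" "$stem"
}

# Preview the commands for every SAM file in the current directory.
for f in *.sam; do
  [ -e "$f" ] || continue   # skip the literal "*.sam" when nothing matches
  preview_cmd "$f"
done
```

Nothing is executed here; the loop just prints one line per SAM file.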