Tuesday, February 18, 2014

summary, annotation and visualization of genomic intervals with genomation package

We have been working on a new R package, genomation, that will help with the analysis of genomic intervals. Briefly, the package contains a collection of tools for visualizing and analyzing genome-wide data sets. The package works with a variety of genomic interval file types (BAM, bigWig, BED, GFF and any tabular format containing genomic locations) and enables easy summarization and annotation of high throughput data sets with given genomic annotations. Below is a presentation I'm preparing for an upcoming internal meeting. You can find more information and the vignette on the webpage.

genomation is published in Bioinformatics
Akalin A, Franke V, Vlahoviček K, Mason CE, Schübeler D. genomation: a toolkit to summarize, annotate and visualize genomic intervals. Bioinformatics. 2014 Nov 21. pii: btu775

It is also currently in the Bioconductor development version

Sunday, August 25, 2013

Using JavaScript visualization libraries with R

This is a short tutorial on knitr/markdown and JS visualization packages googleVis and rCharts. With these packages you can create web pages with interactive visualizations just using R. This will require minimal or no knowledge of HTML or JavaScript. 
You need to have the following R packages and their dependencies installed:
  • knitr
  • googleVis
  • rCharts (not on CRAN)
The tutorial is organized in four parts. First two parts introduce basics about knitr and markdown. The last parts are about using googleVis and rCharts packages in markdown documents. However, the tutorial does not have in-depth examples that show all the capabilities of rCharts and googleVis.

You can download the .Rmd files (or clone the repository from github) and run knit2html() on them in your R console, or if you are using RStudio you can click "knit HTML" button on the upper left corner.

The best way to go through the tutorial is to examine the code chunks and explanations in .Rmd files, and then check the HTML output from knit2html().

1. R and markdown

markdown_knitr.Rmd shows basics of markdown and knitr integration. These tools will help you create an HTML document using R. knit2html()output is here. In addition, R markdown basics are described here.

2. Customizing code output in R markdown documents

controlling_knitr.Rmd shows how code chunk output can be controlled by knitr options. knit2html() output is here

3. Using Google visualization API in R markdown documents

googleVis.Rmd shows how to use googleVis package in a markdown document.You can incorporate plots from Google Visualization API in your R markdown document, which will be converted to an HTML document by knit2html(), the HTML output is here.

4. Using multiple JS visualization libraries in R markdown documents

rCharts.Rmd shows how to use rCharts package in a markdown document. Using rCharts, you can incorporate various JS visualizations (such as Polychart, NVD3 etc.) on your HTML document. knit2html() output is here.

The project page is: http://al2na.github.io/Rmarkdown_JSviz/

Wednesday, July 24, 2013

Q&A forum for methylKit

methylKit is an R package for DNA methylation analysis and annotation using high-throughput bisulfite sequencing data. We recently activated a Q&A forum for methylKit related questions. If you are using methylKit and if you have questions about it,  it is a great place to ask questions and browse previously answered questions.

You can reach it via https://groups.google.com/forum/#!forum/methylkit_discussion

Tuesday, March 19, 2013

EpiWorkshop 2013: DNA methylation analysis in R

Elemento Lab at Weill Cornell Medical College organized a workshop on Epigenomics. I had the opportunity to give a tutorial on DNA methylation analysis in R. The tutorial demonstrates how to analyze high-throughput bisulfite sequencing data ( RRBS is used as an example in the tutorial) with R package methylKit.

My slides are below and all of the workshop material is here at the workshop website.

Friday, March 1, 2013

Using ENCODE methylation data (RRBS) in R

ENCODE project has generated reduced-representation bilsulfite sequencing data for multiple cell lines. The data is organized in an extended bed format with additional columns denoting % methylation and coverage per base. Luckily, this sort of generic % methylation information can be read in by R package methylKit, which is a package for analyzing basepair resolution 5mC and 5hmC data.

The code snippets below show how to read RRBS bed file produced by ENCODE. But, let's first download the files.
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeHaibMethylRrbs/wgEncodeHaibMethylRrbsA549Dm002p7dHaibSitesRep1.bed.gz

wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeHaibMethylRrbs/wgEncodeHaibMethylRrbsAg04449UwstamgrowprotSitesRep1.bed9.gz

Unfortunately, methylKit currently can not read them directly because the track definition line causes a problem. It should be deleted from each bed file. Ideally, methylKit should be able to skip over lines (this issue should be fixed in later versions)
For now, we have to use some unix tools to remove the first lines from the bed files. You run the code below in your terminal. This set of commands will delete the first line from every *.gz file in the directory so be careful.
for files in *.gz
  gzip -dc "$files" | tail +2 | gzip -c > "$files".tmp
  if [ "$?" -eq 0 ]; then
    mv "$files".tmp "$files"

Now we can read the files using methylKit. The pipeline argument defines which columns in the file are corresponding to chr,start,end,strand, percent methylation and coverage:
You can also read multiple files at a time:

Since we have read the files and now they are methylKit objects, we can use all of the methylKit functionality on these objects. For example, the code below plots the distribution of methylation % for covered bases.

getMethylationStats(obj[[1]], plot = TRUE)

You can check the methylKit vignette and the website for the rest of the functionality and details.

Wednesday, February 20, 2013

Progress bar in R

A decent percentage of working time in R, I spend looping over chromosomes, transcription factors or tissues, usually, using parallelization.

To get the stuff to run simultaneously I use the foreach function from the doMC package, and for monitoring of the progress of the execution, I made use of the cat/print functions, which mostly just clutter the terminal.

Today a friend of mine set me a link to this blog, and I was dumbfounded when I read that the R base package has an inbuilt function for a nice looking progress bar. To my luck, it works perfectly when put inside a foreach loop.

total <- 200
# create progress bar
pb <- txtProgressBar(min = 0, max = total, style = 3)
foreach(i = 1:total)%dopar%{
   setTxtProgressBar(pb, i)


Friday, December 14, 2012

Sending commands from Notepad++ to a remote R session

If you have your working environment set up in a Windows operating system, it can be a bit of a hassle to work with R sessions on remote Linux servers.

I use WinSCP + Notepad++ to handle my projects and Putty + screen to handle the R sessions. It becomes tiring to use the mouse to move the code from Notepad to the Putty all the time. Thankfully the amazing AutoHotkey comes to the rescue.

Using AutoHotkey it is possible to set up a hotkey macro which will copy the code that you want from the Notepad and paste it into Putty.

The following code does exactly that:

Send {End 2}+{Home}^c{End 2}{Down}{End 2}
WinActivate, root
Send +{Ins}{Enter}
SetTitleMatchMode, 2
WinActivate, Notepad

Let's break it apart.

!q:: - defines the hotkey as alt-q

Send {End 2}+{Home}^c{End 2}{Down}{End}
 - selects the current line of code and copies the stuff into clipboard. It does that by brute force manipulation of the pointer. Firstly it moves the pointer to the end of the current line, does a shift + home, which selects the text. ^c copies the text to the clipboard, and the last two commands put the pointer to the end of the next line of code. The reason for doing the End command twice is that the macro becomes confused if the word wrap is on.

WinActivate, root - the winactivate command selects any window by it's name. When I log onto my remote server the Putty window is named root, so basically this line just switches to the terminal.

Send +{Ins}{Enter} - pastes the line of code from the clipboard and executes the line

SetTitleMatchMode, 2 - this command regulates the pattern matching for the WinActivate command. 2 means that the pattern can be found anywhere in the name of the window. The default is 1 which means that pattern must be found at the beginning of the name of the window.

WinActivate, Notepad - switches us back to the Notepad 

return - ends the macro

For the code to work, you need to install the AutoHotkey, create a file that has the .ahk extension, copy the code into the file, and run it.

I hope this will save you some time!