Monday, February 21, 2011

Access all UCSC wiggle tracks from R and your terminal

EDIT: Current version (13 November 2013) of the rtracklayer lets you import bigWig files over predefined regions, which is very convenient way to access most UCSC tracks that come in Wig format. All you need to do is to download the bigWig version of the tracks (if only in wiggle Format you can convert them to bigWig using UCSC command line tools) and use the import() function from rtracklayer.

rtracklayer package allows you to access most of the UCSC wiggle tracks from R. However, there is another way which might more practical in situations where you need to summarize the wig track scores over a given set of genomic coordinates. Although you can get a similar information from rtracklayer, you will need to do the summary statistics for each genomic coordinate and you will need to loop over a RangedList object, which may not be practical if you have a large set of genomic coordinates.

The other way to do this is to use hgWiggle program from UCSC source code. You can download the program from here, and you will need to prepare a three line ".hg.conf" file in your home directory.
.hg.conf should contain the exact following lines:

db.host=genome-mysql.cse.ucsc.edu
db.user=genomep
db.password=password


You should download the related .wib file from UCSC.  For human hg18 assembly they are located at http://hgdownload.cse.ucsc.edu/gbdb/hg18/ . Then do "chmod 600 .hg.conf " on the file. Now we are ready to use hgWiggle. See this wiki page for all the other details.

As an example, you can get the statistics for chr16 for phastCons scores as shown below.

hgWiggle -db=hg18 -chr=chr16 -doStats phastCons44wayPlacental

you can also give a bed file with genomic coordinates using -bedFile  option, then you will get phastCons summary for the regions in the bed file.

It is also pretty straight forward to call the hgWiggle from R using system() function, so you can include this functionality in your R code. Although I intended to do that here when I started the post, I thought that it is trivial to use the system() command and call any command line tool, so I won't be doing that for now. I might add it later on when I have more time.

No comments:

Post a Comment