Showing posts with label recipe. Show all posts
Showing posts with label recipe. Show all posts

Monday, November 22, 2010

Retrieving transcriptome sequences for RNASeq analysis

One approach for analyzing RNASeq data from an organism with a well-annotated genome, is to align the reads to mRNA (cDNA) sequences instead of the genome. To do that you need to extract the transcript sequences from a database. This is how to extract ensembl transcript sequences from UCSC from within R:
_________________________________________________

library(GenomicFeatures)
library(BSgenome.Hsapiens.UCSC.hg18)

tr <- makeTranscriptDbFromUCSC(genome="hg18", tablename="ensGene")
tr_seq <- extractTranscriptsFromGenome(Hsapiens, tr)
write.XStringSet(tr_seq, file="hg18.ensgene.transcripts.fasta", 'fasta', width=80, append=F)

_________________________________________________

Next steps can be to build a reference index for bowtie, perform the alignment, and count the number of reads aligned in R using table(). Differential expression analysis may be performed by DESeq.




Saturday, November 20, 2010

Allowing multi-hop ssh

If you are trying to reach a server only accessible through another server, you will need to use ssh twice. This might cause mild irritation. Luckily, there is a recipe that can make things easier.

Assuming we are trying to reach hostnameB through hostnameA, add the following lines (after you put appropriate values for hostnames) to your SSH configuration in ~/.ssh/config

Host hostnameA
ProxyCommand ssh hostnameB nc hostnameA 22


For this to work, netcat needs to be installed on hostnameB, but many new systems have it, so you may have that too. Now, if you type, "ssh hostnameB" automatically you will first ssh to hostnameA and then hostnameB.

Stay logged in the server with ssh

Add the following lines to the ~/.ssh/config file and you will stay logged in the server. If you are trying to connect to your work machines from home, this might be a useful trick.

Host *
ServerAliveInterval 120
ServerAliveCountMax 3

Friday, November 19, 2010

Command-line preprocessing of short-read data

FASTX toolkit provides command line tools for processing FASTA and FASTQ files. For example, you can filter sequences based on their quality, you can trim low-quality sequence at the ends of the sequence and you can trim adapters, in addition to many other functions.

check out: http://hannonlab.cshl.edu/fastx_toolkit/

Running R on remote computer via local emacs

Aquamacs in Mac OS X and Emacs in Linux/unix can be used to edit remote (and local) R code and submit pieces of code to a remote R session. For this to work you need to install ess for emacs (Aquamacs comes with ess by default now, I don't know about emacs). If you don't have ess, you can install following this recipe.

open an R script using emacs, you can open remote files using TRAMP. You need to do C-x C-f and enter the remote location as ". After you opened your remote R script, now you have to start a remote R session. First, we need to ssh to remote host from Emacs/Aquamacs using the following commands.

M-x ssh #after this you need to enter your username, remote host and you will be prompted your password

R #start R
M-x ess-remote #after starting R, you need to start and ess-remote session,
so your R script and R session is connected.

Now you can edit your remote script and also send code to your remote R session from
your script. If you can't ssh from Emacs/Aquamacs you will need to install ssh.el. 
  
First prepare a .emacs file, put it in your home folder. Also, make .emacs.d folder 
your home directory. You will put the  .el files in this directory.
 
------contents of .emacs  file -----------

(setq load-path

      (cons "/home/username/.emacs.d" load-path))


(require 'tramp)

(require 'ssh)

(require 'password-cache)

(setq password-cache-expiry nil)

------- end of .emacs file -----------------------

in .emacs.d directory, put the following files

http://www.splode.com/~friedman/software/emacs-lisp/src/ssh.el

or

ftp://ftp.splode.com/pub/users/friedman/emacs-lisp/ssh.el



and this one you need so TRAMP doesn't ask password every time you save a
file

http://braeburn.aquamacs.org/code/master/lisp/password-cache.el

EDIT:
With the recent versions of Aquamacs (Emacs on OS X) and possibly Emacs, I can do M-x shell, to get on to the shell and ssh to the remote server. Then, I use TRAMP via C-x C-f /username@remote.host:/home/username/example.R to open the remote file in another tab. Then, finally use,M-x ess-remote. and everything works as expected. I can pass the code in the R file to R session. 

Getting nth line in a file

Ever wondered what is the nth line of a file without using a text editor ?
Here is something you may use in a unix environment.

sed -n '5 p' file1.txt

this sed one-liner outputs the 5th line of the file1.txt