Monday, November 22, 2010

Retrieving transcriptome sequences for RNASeq analysis

One approach to analyzing RNASeq data from an organism with a well-annotated genome is to align the reads to mRNA (cDNA) sequences instead of the genome. To do that, you need to extract the transcript sequences from a database. Here is how to extract Ensembl transcript sequences from UCSC from within R, using the Bioconductor packages GenomicFeatures and BSgenome:
_________________________________________________

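library(GenomicFeatures)
library(BSgenome.Hsapiens.UCSC.hg18)

# Build a transcript database from the UCSC "ensGene" (Ensembl genes) table for hg18
tr <- makeTranscriptDbFromUCSC(genome="hg18", tablename="ensGene")
# Extract the spliced transcript (cDNA) sequences from the hg18 genome sequence
tr_seq <- extractTranscriptsFromGenome(Hsapiens, tr)
# Write the transcripts to a FASTA file with 80 characters per line
write.XStringSet(tr_seq, file="hg18.ensgene.transcripts.fasta", format="fasta", width=80, append=FALSE)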

_________________________________________________

Possible next steps are to build a bowtie index from the transcript FASTA file, align the reads against it, and count the number of reads aligned to each transcript in R using table(). Differential expression analysis can then be performed with the Bioconductor package DESeq.
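The counting step can be sketched in base R. This is a minimal sketch, not the author's code: it assumes you already have the reference name of each aligned read as a character vector (for example the RNAME column of a SAM file, where unmapped reads are `*`), and that bowtie reported at most one alignment per read (its default), so no read is counted twice. The function name `count_reads_per_transcript` and the transcript IDs in the toy example are hypothetical.

```r
# Count the aligned reads per transcript with table().
# rnames: reference (transcript) name of each aligned read, e.g. the RNAME
#         column of a SAM file; unmapped reads are "*" (or NA from Rsamtools).
# all_tx: every transcript in the reference, so that transcripts without
#         any reads still get a count of zero.
count_reads_per_transcript <- function(rnames, all_tx) {
  rnames <- as.character(rnames)
  mapped <- rnames[!is.na(rnames) & rnames != "*"]   # drop unmapped reads
  counts <- table(factor(mapped, levels = all_tx))   # zero-filled counts
  setNames(as.integer(counts), names(counts))        # plain named integer vector
}

# Toy example with hypothetical transcript IDs
all_tx <- c("ENST_A", "ENST_B", "ENST_C")
rnames <- c("ENST_A", "*", "ENST_A", "ENST_C")
count_reads_per_transcript(rnames, all_tx)
# returns c(ENST_A = 2, ENST_B = 0, ENST_C = 1)
```

Using the full transcript list as factor levels makes table() report zero for transcripts that received no reads, which keeps the count vectors of different samples in the same order when they are combined into a count matrix for DESeq. One possible way to get `rnames` from a BAM file (here a hypothetical `sample.bam`) is Rsamtools: `as.character(scanBam("sample.bam", param = ScanBamParam(what = "rname"))[[1]]$rname)`, where unmapped reads appear as `NA`. `all_tx` could be `names(tr_seq)`, assuming those names match the FASTA headers used to build the index.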



