Monday, November 22, 2010

Retrieving transcriptome sequences for RNASeq analysis

One approach to analyzing RNASeq data from an organism with a well-annotated genome is to align the reads to mRNA (cDNA) sequences instead of the genome. To do that, you need to extract the transcript sequences from a database. This is how to extract Ensembl transcript sequences from UCSC from within R:
_________________________________________________

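library(GenomicFeatures)
library(BSgenome.Hsapiens.UCSC.hg18)

# Build a transcript database from the UCSC "ensGene" (Ensembl genes) table for hg18.
# Newer Bioconductor releases name this function makeTxDbFromUCSC().
tr <- makeTranscriptDbFromUCSC(genome="hg18", tablename="ensGene")
# Extract the transcript sequences (extractTranscriptSeqs() in newer releases).
tr_seq <- extractTranscriptsFromGenome(Hsapiens, tr)
# Write the sequences as FASTA, 80 bases per line (writeXStringSet() in newer releases).
write.XStringSet(tr_seq, file="hg18.ensgene.transcripts.fasta", format="fasta", width=80, append=FALSE)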

_________________________________________________

Next steps can be to build a reference index for bowtie from the FASTA file, perform the alignment, and count the number of reads aligned to each transcript in R using table(). Differential expression analysis can then be performed with DESeq.
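
The counting step can be sketched as follows. This is only a minimal illustration: the vector `aligned_to` and the transcript IDs in it are made-up example data standing in for the reference-name column of the alignment output, and the names `read_counts` and `counts_df` are my own, not part of any package.

```r
# Hypothetical example data: the transcript each aligned read was mapped to.
# In practice this vector would be read from the aligner's output file.
aligned_to <- c("ENST00000000233", "ENST00000000412", "ENST00000000233",
                "ENST00000000442", "ENST00000000233", "ENST00000000412")

# table() tallies how many reads were aligned to each transcript
read_counts <- table(aligned_to)

# Convert to a data frame, which is easier to combine across samples
counts_df <- as.data.frame(read_counts, stringsAsFactors = FALSE)
colnames(counts_df) <- c("transcript", "count")
print(counts_df)
```

Doing this once per sample and merging the results on the transcript column would give a count matrix of the kind that differential expression tools expect as input.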



