Monday, June 6, 2011

SeqSaw: The Bloodhound of Splice Sites


            More than 90% of multi-exon human genes can be alternatively spliced. Genome-wide approaches to finding alternative splice sites, such as analysis of expressed sequence tags (ESTs), are labor-intensive and give only low-resolution information. Other approaches use high-throughput exon/junction arrays. These can detect novel splice events, but they cannot detect events involving unannotated exons. Next-generation RNA sequencing (RNA-Seq) of cDNA libraries has become the more favored way to obtain this data.
SeqSaw is a new tool for detecting splice junctions. It works in four steps:
- It splits each sequence read into shorter, overlapping segments.
- It maps the segments to a reference genome, allowing gapped alignments.
- It assembles the mapping results of the segments from the same read into valid alignments of the full-length read.
- Based on these alignments, it predicts junctions according to a series of optional filters.
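To make the split-and-assemble idea concrete, here is a toy sketch in Python. It is not SeqSaw's code, and it makes several simplifications:
- Exact matching with `str.find` stands in for a real gapped aligner.
- The function names (`split_read`, `map_segments`, `find_junction`, `passes_filters`) are hypothetical.
- The segment length and overlap are hypothetical.
- The GT...AG and minimum-intron-length filters are hypothetical examples, not SeqSaw's actual filter set.

Each segment that maps fixes a "diagonal" (genome position minus offset in the read). Two different diagonals within one read imply an intron between them.

```python
from typing import List, Optional, Tuple


def split_read(read: str, seg_len: int, overlap: int) -> List[Tuple[int, str]]:
    """Split a read into segments of length seg_len that overlap by `overlap` bases.

    Returns (offset_in_read, segment) pairs. The last segment is anchored to the
    end of the read so that every base is covered.
    """
    if not 0 <= overlap < seg_len:
        raise ValueError("overlap must satisfy 0 <= overlap < seg_len")
    if len(read) <= seg_len:
        return [(0, read)]
    step = seg_len - overlap
    offsets = list(range(0, len(read) - seg_len + 1, step))
    if offsets[-1] != len(read) - seg_len:
        offsets.append(len(read) - seg_len)
    return [(off, read[off:off + seg_len]) for off in offsets]


def map_segments(segments: List[Tuple[int, str]], genome: str) -> List[Tuple[int, int]]:
    """Toy mapper: report the first exact hit of each segment as (offset, genome_pos).

    Stands in for a real gapped aligner; segments that straddle a junction
    usually have no exact hit and are simply dropped.
    """
    hits = []
    for off, seg in segments:
        pos = genome.find(seg)
        if pos != -1:
            hits.append((off, pos))
    return hits


def find_junction(read: str, genome: str,
                  hits: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Assemble segment hits from one read into a spliced full-length alignment.

    Each hit implies a diagonal (genome_pos - read_offset). Exactly two distinct
    diagonals imply one intron; we then look for the split point k at which
    read[:k] matches the upstream exon and read[k:] matches the downstream exon.
    Returns the intron as 0-based half-open (start, end), or None.
    """
    diagonals = sorted({pos - off for off, pos in hits})
    if len(diagonals) != 2:
        return None  # unspliced, or too complex for this sketch
    d1, d2 = diagonals
    n = len(read)
    if d1 < 0 or d2 + n > len(genome):
        return None  # alignment would run off the reference
    for k in range(1, n):
        if read[:k] == genome[d1:d1 + k] and read[k:] == genome[d2 + k:d2 + n]:
            return d1 + k, d2 + k
    return None


def passes_filters(genome: str, start: int, end: int, min_intron: int = 20) -> bool:
    """Hypothetical filters (not SeqSaw's actual set): a minimum intron length
    plus the canonical GT...AG splice-site dinucleotides."""
    return (end - start >= min_intron
            and genome[start:start + 2] == "GT"
            and genome[end - 2:end] == "AG")


# Toy reference: exon1 + intron + exon2, and a read spanning the exon-exon junction.
exon1 = "ATGGCTAGCTTACCGA"
intron = "GT" + "A" * 30 + "AG"
exon2 = "CCTGAATTCGGATCCT"
genome = exon1 + intron + exon2
read = exon1[6:] + exon2[:10]

hits = map_segments(split_read(read, seg_len=8, overlap=4), genome)
print(hits)  # [(0, 6), (12, 52)]
start, end = find_junction(read, genome, hits)
print(start, end, passes_filters(genome, start, end))  # 16 50 True
```

In this toy run, the two middle segments straddle the junction and find no exact hit. The flanking segments fix the two exon diagonals (6 and 40), and the 34-base gap between them is the intron. A real tool would also have to handle several cases this sketch ignores:
- reverse-strand hits
- repeated or multi-mapping segments
- sequencing errors
- reads that span more than one junction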
            SeqSaw was compared with two other methods for finding splice sites. The first, TopHat, is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the read aligner Bowtie, then analyzes the mapping results to identify splice junctions between exons. The second, MapSplice, is an algorithm that maps RNA-Seq data to a reference genome and searches for splice junctions. SeqSaw proved to be the best of the three tools at finding novel splice sites, outperforming the other two in the results table: SeqSaw found 74.21%, TopHat found 68.32%, and MapSplice found 61.04%. SeqSaw also achieved the highest validation rate and found more splice sites than had originally been labeled as “known sites”. Panel D of one figure showed a very impressive result: although there were no known junctions, SeqSaw was able to find about 6,500 sites.
            The most exciting thing about SeqSaw is what it means for the future. Although the human genome has been fully sequenced, it remains to be seen where all of our genes are located, which splice variants they use, and how their mRNAs and proteins are regulated.

Observations on novel splice junctions from RNA sequencing data.