HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data.

High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the r...

Bibliographic Details
Main Authors: Michelle T Dimon, Katherine Sorber, Joseph L DeRisi
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2010-11-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC2975632?pdf=render
id doaj-b2d50651417146699cf253b6699514cb
record_format Article
spelling doaj-b2d50651417146699cf253b6699514cb; 2020-11-24T21:35:12Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2010-11-01; 5; 11; e13875; 10.1371/journal.pone.0013875; HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data.; Michelle T Dimon; Katherine Sorber; Joseph L DeRisi; High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity. HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3' splice sites and 1.4% of 5' splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction, allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer.; http://europepmc.org/articles/PMC2975632?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Michelle T Dimon
Katherine Sorber
Joseph L DeRisi
spellingShingle Michelle T Dimon
Katherine Sorber
Joseph L DeRisi
HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data.
PLoS ONE
author_facet Michelle T Dimon
Katherine Sorber
Joseph L DeRisi
author_sort Michelle T Dimon
title HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data.
title_short HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data.
title_full HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data.
title_fullStr HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data.
title_full_unstemmed HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data.
title_sort hmmsplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in rna-seq data.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2010-11-01
description High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity. HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3' splice sites and 1.4% of 5' splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction, allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer.
url http://europepmc.org/articles/PMC2975632?pdf=render
work_keys_str_mv AT michelletdimon hmmspliceratoolforefficientandsensitivediscoveryofknownandnovelsplicejunctionsinrnaseqdata
AT katherinesorber hmmspliceratoolforefficientandsensitivediscoveryofknownandnovelsplicejunctionsinrnaseqdata
AT josephlderisi hmmspliceratoolforefficientandsensitivediscoveryofknownandnovelsplicejunctionsinrnaseqdata
_version_ 1725946053399674880