HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data.
Main Authors: | Michelle T Dimon, Katherine Sorber, Joseph L DeRisi |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2010-11-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC2975632?pdf=render |
id |
doaj-b2d50651417146699cf253b6699514cb |
---|---|
citation |
PLoS ONE 5(11): e13875. doi:10.1371/journal.pone.0013875 |
collection |
DOAJ |
author |
Michelle T Dimon, Katherine Sorber, Joseph L DeRisi |
issn |
1932-6203 |
description |
High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially when a splice junction was previously unknown. Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short-read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets, without a reduction in specificity. HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, it also detects the products of inexact splicing. For H. sapiens, we find that 3.6% of 3' splice sites and 1.4% of 5' splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction, allowing the user to set a threshold that tunes the false-positive rate to the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer. |
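The abstract notes that HMMSplicer reports a score for every predicted junction and detects both canonical and non-canonical junctions. The Python sketch below illustrates how a user might post-filter predictions along those two lines. It is not HMMSplicer's code. The `Junction` record, the 0-based half-open intron coordinates, the helper names `is_canonical` and `filter_junctions`, and the score values are all hypothetical assumptions; the tool's own documentation describes its real output format and score scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical junction record; NOT HMMSplicer's actual output format."""
    chrom: str
    intron_start: int  # 0-based index of the first intronic base
    intron_end: int    # 0-based, exclusive: one past the last intronic base
    score: float       # hypothetical score; higher means more confident


# Canonical introns start with GT and end with AG. An intron on the minus
# strand reads CT...AC on the forward reference sequence.
CANONICAL_PAIRS = {("GT", "AG"), ("CT", "AC")}


def is_canonical(genome, junction):
    """Return True if the intron is bounded by GT...AG or its reverse complement."""
    seq = genome[junction.chrom]
    donor = seq[junction.intron_start:junction.intron_start + 2].upper()
    acceptor = seq[junction.intron_end - 2:junction.intron_end].upper()
    return (donor, acceptor) in CANONICAL_PAIRS


def filter_junctions(junctions, threshold, genome=None, canonical_only=False):
    """Keep junctions scoring at or above `threshold`.

    Raising the threshold trades sensitivity for fewer false positives.
    Non-canonical junctions are kept unless `canonical_only` is set.
    """
    if canonical_only and genome is None:
        raise ValueError("a genome is required to check splice-site dinucleotides")
    kept = [j for j in junctions if j.score >= threshold]
    if canonical_only:
        kept = [j for j in kept if is_canonical(genome, j)]
    return kept


# Usage with a toy reference: chr1 has a GT...AG intron at [4, 14).
genome = {"chr1": "AAAAGTCCCCCCAGAAAA"}
junctions = [
    Junction("chr1", 4, 14, 900.0),  # canonical GT...AG
    Junction("chr1", 2, 10, 950.0),  # non-canonical AA...CC
    Junction("chr1", 5, 13, 300.0),  # below the threshold used here
]
print(len(filter_junctions(junctions, 500.0)))                               # 2
print(len(filter_junctions(junctions, 500.0, genome, canonical_only=True)))  # 1
```

Leaving `canonical_only` off by default mirrors the abstract's emphasis that non-canonical junctions are of interest; the canonical check is shown only as an optional extra filter.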