Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes

Abstract. Background: Computational prediction methods are currently used to identify genes in prokaryote genomes. However, identification of the correct translation initiation sites remains a difficult task. Accurate translation initiation sites (TISs) a...

Bibliographic Details
Main Authors: de Hoon Michiel JL, Makita Yuko, Danchin Antoine
Format: Article
Language: English
Published: BMC 2007-02-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/8/47
id doaj-1b3b075b3d4e47ad825a673765b87574
record_format Article
spelling doaj-1b3b075b3d4e47ad825a673765b87574 | 2020-11-25T01:04:44Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2007-02-01 | 8 | 1 | 47 | 10.1186/1471-2105-8-47 | Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes | de Hoon Michiel JL | Makita Yuko | Danchin Antoine | http://www.biomedcentral.com/1471-2105/8/47
collection DOAJ
language English
format Article
sources DOAJ
author de Hoon Michiel JL
Makita Yuko
Danchin Antoine
title Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2007-02-01
description Abstract
Background: Computational prediction methods are currently used to identify genes in prokaryote genomes. However, identification of the correct translation initiation sites remains a difficult task. Accurate translation initiation sites (TISs) are important not only for the annotation of unknown proteins but also for the prediction of operons, promoters, and small non-coding RNA genes, as these predictions typically rely on the intergenic distance. A further problem is that most existing methods are optimized for Escherichia coli data sets; applying these methods to newly sequenced bacterial genomes may not result in an equivalent level of accuracy.
Results: Based on a biological representation of the translation process, we applied Bayesian statistics to create a score function for predicting translation initiation sites. In contrast to existing programs, our combination of methods applies supervised learning to make optimal use of the set of known translation initiation sites. We combined the Ribosome Binding Site (RBS) sequence, the distance between the translation initiation site and the RBS sequence, the base composition of the start codon, the nucleotide composition (A-rich sequences) following start codons, and the expected distribution of the protein length in a Bayesian scoring function. To further increase the prediction accuracy, we also took into account the operon orientation. The procedure achieved a prediction accuracy of 93.2% in 858 E. coli genes from the EcoGene data set and 92.7% accuracy in a data set of 1243 Bacillus subtilis 'non-y' genes. We confirmed the performance in the GC-rich Gamma-Proteobacteria Herminiimonas arsenicoxydans, Pseudomonas aeruginosa, and Burkholderia pseudomallei K96243.
Conclusion: Hon-yaku, being based on a careful choice of elements important in translation, improved the prediction accuracy in B. subtilis data sets and other bacteria except for E. coli. We believe that most remaining mispredictions are due to atypical ribosomal binding sequences used in specific translation control processes, or likely errors in the training data sets.
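The abstract describes combining several sequence features into one Bayesian score for each candidate start site. The sketch below illustrates that general idea only; it is not the Hon-yaku implementation. It assumes the features are conditionally independent (a naive-Bayes simplification that the record does not state), uses three of the listed features for brevity, and every table, probability, and function name in it is a hypothetical placeholder rather than a value from the paper.

```python
import math

# Hypothetical likelihood tables: (P(value | true start), P(value | false start)).
# Placeholder numbers only; a real model would estimate them from known TISs.
START_CODON = {"ATG": (0.80, 0.40), "GTG": (0.15, 0.30), "TTG": (0.05, 0.30)}
RBS_SPACING = {"short": (0.10, 0.40), "optimal": (0.80, 0.20), "long": (0.10, 0.40)}
A_RICH_DOWNSTREAM = {True: (0.60, 0.30), False: (0.40, 0.70)}


def log_ratio(table, value):
    """Log-likelihood ratio of one feature value: true start versus false start."""
    p_true, p_false = table[value]
    return math.log(p_true / p_false)


def score_candidate(codon, spacing, a_rich, prior_true=0.5):
    """Posterior log-odds that a candidate is the true start site.

    Assumes conditional independence of the features (a simplification).
    """
    prior_log_odds = math.log(prior_true / (1.0 - prior_true))
    return (prior_log_odds
            + log_ratio(START_CODON, codon)
            + log_ratio(RBS_SPACING, spacing)
            + log_ratio(A_RICH_DOWNSTREAM, a_rich))


def best_start(candidates):
    """Return the (position, codon, spacing, a_rich) tuple with the highest score."""
    return max(candidates, key=lambda c: score_candidate(*c[1:]))


# Three hypothetical in-frame candidate starts for a single gene.
candidates = [
    (120, "GTG", "long", False),
    (150, "ATG", "optimal", True),
    (183, "TTG", "short", True),
]
print(best_start(candidates)[0])
```

Working in log-odds keeps the combination additive, so further evidence such as protein length or operon orientation could be added as extra terms under the same independence assumption.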
url http://www.biomedcentral.com/1471-2105/8/47