Applying Support Vector Machines for Gene ontology based gene function prediction

Abstract. Background: The current progress in sequencing projects calls for rapid, reliable and accurate function assignments of gene products. A variety of methods has been designed to annotate sequences on a large scale. However, these methods can eithe...

Bibliographic Details
Main Authors: Eils Roland, Schubert Falk, Moormann Jutta, König Rainer, Vinayagam Arunachalam, Glatting Karl-Heinz, Suhai Sándor
Format: Article
Language: English
Published: BMC 2004-08-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/5/116
id doaj-722bf99803d14df38b7b120e719fc4f4
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Eils Roland
Schubert Falk
Moormann Jutta
König Rainer
Vinayagam Arunachalam
Glatting Karl-Heinz
Suhai Sándor
title Applying Support Vector Machines for Gene ontology based gene function prediction
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2004-08-01
description Abstract
Background: The current progress in sequencing projects calls for rapid, reliable and accurate function assignments of gene products. A variety of methods has been designed to annotate sequences on a large scale. However, these methods can either only be applied for specific subsets, or their results are not formalised, or they do not provide precise confidence estimates for their predictions.
Results: We have developed a large-scale annotation system that tackles all of these shortcomings. In our approach, annotation was provided through Gene Ontology terms by applying multiple Support Vector Machines (SVM) for the classification of correct and false predictions. The general performance of the system was benchmarked with a large dataset. An organism-wise cross-validation was performed to define confidence estimates, resulting in an average precision of 80% for 74% of all test sequences. The validation results show that the prediction performance was organism-independent and could reproduce the annotation of other automated systems as well as high-quality manual annotations. We applied our trained classification system to Xenopus laevis sequences, yielding functional annotation for more than half of the known expressed genome. Compared to the currently available annotation, we provided more than twice the number of contigs with good quality annotation, and additionally we assigned a confidence value to each predicted GO term.
Conclusions: We present a complete automated annotation system that overcomes many of the usual problems by applying a controlled vocabulary of Gene Ontology and an established classification method on large and well-described sequence data sets. In a case study, the function for Xenopus laevis contig sequences was predicted and the results are publicly available at ftp://genome.dkfz-heidelberg.de/pub/agd/gene_association.agd_Xenopus.
url http://www.biomedcentral.com/1471-2105/5/116
_version_ 1725350317353074688
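
Illustrative note on the abstract: the abstract reports an organism-wise cross-validation and "an average precision of 80% for 74% of all test sequences", but this record gives no implementation details. The sketch below is only one plausible reading of those two ideas, not the authors' code. It assumes that each fold holds out all sequences of one organism, and that the reported figures correspond to precision and sequence coverage at some confidence cut-off. All function names, record fields and toy values are hypothetical, and the SVM classifiers themselves are omitted.

```python
from collections import defaultdict


def organism_wise_folds(records):
    """Yield (held_out_organism, train, test), holding out one organism per fold.

    `records` is a list of dicts with at least an "organism" key
    (a hypothetical layout chosen for this sketch).
    """
    by_organism = defaultdict(list)
    for record in records:
        by_organism[record["organism"]].append(record)
    for organism in sorted(by_organism):
        test = by_organism[organism]
        train = [r for o, rs in by_organism.items() if o != organism for r in rs]
        yield organism, train, test


def precision_and_coverage(predictions, threshold):
    """Score GO-term predictions at a confidence cut-off.

    `predictions` is a list of (sequence_id, confidence, is_correct) tuples.
    Precision is the fraction of retained predictions that are correct;
    coverage is the fraction of sequences that keep at least one prediction.
    """
    all_sequences = {seq for seq, _, _ in predictions}
    kept = [(seq, ok) for seq, conf, ok in predictions if conf >= threshold]
    if not kept:
        return 0.0, 0.0
    precision = sum(1 for _, ok in kept if ok) / len(kept)
    coverage = len({seq for seq, _ in kept}) / len(all_sequences)
    return precision, coverage


# Toy data, invented purely for illustration.
toy_records = [
    {"id": "s1", "organism": "human"},
    {"id": "s2", "organism": "mouse"},
    {"id": "s3", "organism": "human"},
]
toy_predictions = [
    ("s1", 0.9, True),
    ("s1", 0.4, False),
    ("s2", 0.8, True),
    ("s3", 0.7, False),
    ("s4", 0.2, False),
]

folds = list(organism_wise_folds(toy_records))
precision, coverage = precision_and_coverage(toy_predictions, threshold=0.6)
```

Raising the threshold in such a scheme would typically trade coverage for precision, which is presumably why the abstract reports the two numbers together.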