EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences

Abstract. Background: Understanding gene regulatory networks has become one of the central research problems in bioinformatics. More than thirty algorithms have been proposed to identify DNA regulatory sites during the past thirty years. However, the pred...

Bibliographic Details
Main Authors: Kihara Daisuke, Yang Yifeng D, Hu Jianjun
Format: Article
Language: English
Published: BMC 2006-07-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/7/342
id doaj-e43149765b74482781e31f50b50b3d59
record_format Article
spelling doaj-e43149765b74482781e31f50b50b3d59
2020-11-25T00:15:21Z
eng
BMC
BMC Bioinformatics
1471-2105
2006-07-01
7
1
342
10.1186/1471-2105-7-342
EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences
Kihara Daisuke
Yang Yifeng D
Hu Jianjun
http://www.biomedcentral.com/1471-2105/7/342
collection DOAJ
language English
format Article
sources DOAJ
author Kihara Daisuke
Yang Yifeng D
Hu Jianjun
spellingShingle Kihara Daisuke
Yang Yifeng D
Hu Jianjun
EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences
BMC Bioinformatics
author_facet Kihara Daisuke
Yang Yifeng D
Hu Jianjun
author_sort Kihara Daisuke
title EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences
title_short EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences
title_full EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences
title_fullStr EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences
title_full_unstemmed EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences
title_sort emd: an ensemble algorithm for discovering regulatory motifs in dna sequences
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2006-07-01
description Abstract. Background: Understanding gene regulatory networks has become one of the central research problems in bioinformatics. More than thirty algorithms have been proposed to identify DNA regulatory sites during the past thirty years. However, the prediction accuracy of these algorithms is still quite low. Ensemble algorithms have emerged as an effective strategy in bioinformatics for improving prediction accuracy by exploiting the synergetic prediction capability of multiple algorithms. Results: We proposed a novel clustering-based ensemble algorithm named EMD for de novo motif discovery, which combines multiple predictions from multiple runs of one or more base component algorithms. The ensemble approach is applied to the motif discovery problem for the first time. The algorithm is tested on a benchmark dataset generated from E. coli RegulonDB. The EMD algorithm achieved a 22.4% improvement in nucleotide-level prediction accuracy over the best stand-alone component algorithm. The advantage of the EMD algorithm is more significant for shorter input sequences, but most importantly, it always outperforms or at least matches the stand-alone component algorithms, even for longer sequences. Conclusion: We proposed an ensemble approach for the motif discovery problem by taking advantage of the availability of a large number of motif discovery programs. We have shown that the ensemble approach is an effective strategy for improving both sensitivity and specificity, and thus the accuracy of the prediction. The advantage of the EMD algorithm is its flexibility, in the sense that a new powerful algorithm can easily be added to the system.
url http://www.biomedcentral.com/1471-2105/7/342
work_keys_str_mv AT kiharadaisuke emdanensemblealgorithmfordiscoveringregulatorymotifsindnasequences
AT yangyifengd emdanensemblealgorithmfordiscoveringregulatorymotifsindnasequences
AT hujianjun emdanensemblealgorithmfordiscoveringregulatorymotifsindnasequences
_version_ 1725387355753283584
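
The abstract describes combining binding-site predictions from multiple runs of one or more base algorithms and scoring the result at the nucleotide level. The record gives no details of the clustering procedure, so the sketch below is not the EMD algorithm itself. It substitutes a much simpler per-position majority vote to illustrate the general ensemble idea. The function name `consensus_sites`, the `(start, end)` site representation with an exclusive end, and the `min_votes` parameter are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple

# A predicted site is a half-open interval (start, end) on one input sequence.
# This representation is an assumption for the sketch, not taken from the paper.
Site = Tuple[int, int]


def consensus_sites(runs: List[List[Site]], min_votes: int) -> List[Site]:
    """Combine site predictions from several runs by per-position voting.

    Each run contributes at most one vote per nucleotide position.
    Positions with at least `min_votes` votes are kept, and adjacent
    kept positions are merged back into intervals.
    """
    votes: Counter = Counter()
    for sites in runs:
        covered = set()
        for start, end in sites:
            covered.update(range(start, end))
        votes.update(covered)  # one vote per position per run

    kept = sorted(pos for pos, count in votes.items() if count >= min_votes)

    merged: List[Site] = []
    for pos in kept:
        if merged and merged[-1][1] == pos:
            # Extend the current interval by one nucleotide.
            merged[-1] = (merged[-1][0], pos + 1)
        else:
            merged.append((pos, pos + 1))
    return merged
```

For example, calling `consensus_sites([[(10, 18)], [(12, 20)], [(11, 19), (40, 48)]], min_votes=2)` keeps only the region supported by at least two of the three hypothetical runs and discards the isolated prediction at positions 40 to 47. Raising `min_votes` trades sensitivity for specificity, the two quantities the abstract reports.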