Gene prediction using a configurable system for the integration of data by dynamic programming

Bibliographic Details
Main Author: Howe, K.
Published: University of Cambridge 2003
Subjects: 572.8
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.604280

Abstract

A new approach to the computational identification of protein-coding gene structures in genomic DNA sequence is described. It overcomes rigidities inherent in most existing gene prediction methods, for example those based on Hidden Markov Models (HMMs), by supporting a flexible computational model of how sequence signals fit together into complete gene structures. The primary result of the work is a gene prediction tool that assembles evidence for individual gene components (features) into predictions of complete gene structures. The system is completely configurable. Both the features themselves and the model of gene structure used to validate and score candidate assemblies are external to the system and supplied by the user. The gene prediction process is therefore tied neither to any specific technique for the recognition of gene prediction signals nor to any specific underlying model of gene structure. The methodology is implemented in a program called “GAZE”. GAZE uses a dynamic programming algorithm to find the highest-scoring gene structure consistent with the user-supplied features and gene-structure model. It also computes the posterior probability that each feature is part of a gene. The algorithm employs a novel pruning strategy that makes its runtime effectively linear in the length of the sequence without compromising accuracy. The effectiveness of the strategy is explored by applying it to the prediction of gene structures in sequences of the nematode worm C. elegans. GAZE allows gene prediction data from multiple, arbitrary sources to be integrated, and the accuracy of the system depends on weighting these pieces of evidence appropriately relative to one another. A novel strategy for automatically determining optimal values for these weights is described.
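
The core idea described above can be illustrated with a small dynamic-programming sketch: choosing the highest-scoring assembly of scored features that a user-supplied gene model permits. This is not GAZE's code, input format or gene model. The `Feature` tuple, the `ALLOWED` transition table, the feature type names and the per-type `weights` argument are all hypothetical simplifications. The sketch runs in time quadratic in the number of features. It does not reproduce GAZE's pruning strategy or compute posterior probabilities. It also does not score the regions between features or enforce length and reading-frame constraints, as a realistic gene model would.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple

NEG_INF = float("-inf")


class Feature(NamedTuple):
    kind: str     # hypothetical feature type, e.g. "ATG", "GT", "AG", "STOP"
    pos: int      # position of the feature in the sequence
    score: float  # evidence score supplied by some external predictor


# Hypothetical, greatly simplified gene model: which feature type may directly
# follow which within a predicted structure. BEGIN and END are sentinels for
# the two ends of the sequence; STOP -> ATG allows several genes per sequence.
ALLOWED: Dict[str, Set[str]] = {
    "BEGIN": {"ATG", "END"},
    "ATG": {"GT", "STOP"},
    "GT": {"AG"},
    "AG": {"GT", "STOP"},
    "STOP": {"ATG", "END"},
}


def best_gene_structure(
    features: List[Feature],
    length: int,
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[float, List[Feature]]:
    """Return (score, chain) for the highest-scoring chain of features from
    BEGIN to END that uses only ALLOWED transitions at strictly increasing
    positions. Each feature's score is multiplied by a per-type weight
    (default 1.0). Plain O(n^2) dynamic programming, with no pruning."""
    weights = weights or {}
    nodes = (
        [Feature("BEGIN", -1, 0.0)]
        + sorted(features, key=lambda f: f.pos)
        + [Feature("END", length, 0.0)]
    )
    best = [NEG_INF] * len(nodes)  # best[j]: top score of a chain ending at node j
    back: List[Optional[int]] = [None] * len(nodes)  # predecessor, for traceback
    best[0] = 0.0

    for j in range(1, len(nodes)):
        gain = weights.get(nodes[j].kind, 1.0) * nodes[j].score
        for i in range(j):
            # Predecessor must be reachable and lie strictly upstream.
            if best[i] == NEG_INF or nodes[i].pos >= nodes[j].pos:
                continue
            # The gene model decides which transitions are legal.
            if nodes[j].kind not in ALLOWED.get(nodes[i].kind, set()):
                continue
            if best[i] + gain > best[j]:
                best[j] = best[i] + gain
                back[j] = i

    end = len(nodes) - 1
    if best[end] == NEG_INF:
        return NEG_INF, []
    # Walk the back-pointers from END to BEGIN, dropping both sentinels.
    chain: List[Feature] = []
    k = back[end]
    while k is not None and k != 0:
        chain.append(nodes[k])
        k = back[k]
    return best[end], chain[::-1]


if __name__ == "__main__":
    demo = [
        Feature("ATG", 10, 2.0),
        Feature("GT", 50, 1.0),
        Feature("AG", 120, 1.5),
        Feature("GT", 130, -3.0),  # weak donor site, left out of the best chain
        Feature("STOP", 200, 2.5),
    ]
    score, chain = best_gene_structure(demo, length=300)
    print(score, [(f.kind, f.pos) for f in chain])
    # prints: 7.0 [('ATG', 10), ('GT', 50), ('AG', 120), ('STOP', 200)]
```

Editing `ALLOWED` changes which structures are admissible without touching the dynamic-programming loop. In miniature, this mirrors the point that the gene model is supplied externally. The `weights` argument stands in for the relative evidence weights. Here they are set by hand rather than optimised automatically.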