Signal processing for DNA sequencing



Bibliographic Details
Main Author: Boufounos, Petros T., 1977-
Other Authors: Alan V. Oppenheim.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2005
Online Access: http://hdl.handle.net/1721.1/17536
Description
Summary: Thesis (M.Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. === Includes bibliographical references (p. 83-86). === DNA sequencing is the process of determining the sequence of chemical bases in a particular DNA molecule, nature's blueprint of how life works. The advancement of biological science has created a vast demand for sequencing methods, which needs to be addressed by automated equipment. This thesis addresses one part of that process, known as base calling: the conversion of the electrical signal (the electropherogram) collected by the sequencing equipment to a sequence of letters drawn from {A, T, C, G} that corresponds to the sequence in the molecule sequenced. This work formulates base calling as a pattern recognition problem and observes its striking resemblance to the speech recognition problem. We therefore propose combining Hidden Markov Models and Artificial Neural Networks to solve it. In the formulation we derive an algorithm for training both models together. Furthermore, we devise a method to create very accurate training data, requiring minimal hand-labeling. We compare our method with the de facto standard, PHRED, and produce comparable results. Finally, we propose alternative HMM topologies that have the potential to significantly improve the performance of the method. === by Petros T. Boufounos. === M.Eng. and S.B.