Probabilistic Graphical Models and Algorithms for

In this thesis I present research in two fields: machine learning and computational biology. First, I develop new machine learning methods for graphical models that can be applied to protein problems. Then I apply graphical model algorithms to protein problems, obtaining improvements in protein st...


Bibliographic Details
Main Author: Jiao, Feng
Language: en
Published: 2008
Subjects: machine learning, computational biology, Computer Science
Online Access: http://hdl.handle.net/10012/3773
id ndltd-WATERLOO-oai-uwspace.uwaterloo.ca-10012-3773
record_format oai_dc
spelling ndltd-WATERLOO-oai-uwspace.uwaterloo.ca-10012-3773
2013-01-08T18:51:16Z
Jiao, Feng
2008-05-26T16:24:08Z
2008
http://hdl.handle.net/10012/3773
en
machine learning
computational biology
Probabilistic Graphical Models and Algorithms for
Thesis or Dissertation
School of Computer Science
Doctor of Philosophy
Computer Science
collection NDLTD
language en
sources NDLTD
topic machine learning
computational biology
Computer Science
description In this thesis I present research in two fields: machine learning and computational biology. First, I develop new machine learning methods for graphical models that can be applied to protein problems. Then I apply graphical model algorithms to protein problems, obtaining improvements in protein structure prediction and protein structure alignment. First, in the machine learning work, I focus on a special kind of graphical model, conditional random fields (CRFs). Here, I present a new semi-supervised training procedure for CRFs that can be used to train sequence segmentors and labellers from a combination of labeled and unlabeled training data. Such learning algorithms can be applied to protein and gene named entity recognition problems. This work provides one of the first semi-supervised discriminative training methods for structured classification. Second, in my computational biology work, I focus mainly on protein problems. In particular, I first propose a tree decomposition method for solving the protein structure prediction and protein structure alignment problems. In so doing, I reveal why tree decomposition is a good method for many protein problems. Then, I propose a computational framework for detecting structures similar to that of a target protein with sparse NMR data, which can help to predict protein structure using experimental data. Finally, I propose a new machine learning approach, LS_Boost, to solve the protein fold recognition problem, which is one of the key steps in protein structure prediction. After a thorough comparison, the algorithm is shown to be both more accurate and more efficient than the traditional z-score method and other machine learning methods.
author Jiao, Feng
title Probabilistic Graphical Models and Algorithms for
publishDate 2008
url http://hdl.handle.net/10012/3773