Using Peak Intensity and Fragmentation Patterns in Peptide SeQuence IDentification (SQID) - A Bayesian Learning Algorithm for Tandem Mass Spectra

As DNA sequence information becomes increasingly available, researchers are now tackling the great challenge of characterizing and identifying peptides and proteins from complex mixtures. Automatic database searching algorithms have been developed to meet this challenge. This dissertation is aimed a...

Bibliographic Details
Main Author:	Ji, Li
Other Authors:	Wysocki, Vicki H.
Language:	EN
Published:	The University of Arizona. 2006
Subjects:	tandem mass spectrometry; peptide and protein identification; Bayesian learning; sequencing algorithm; machine learning algorithm; probability-based algorithm
Online Access:	http://hdl.handle.net/10150/193559

id	ndltd-arizona.edu-oai-arizona.openrepository.com-10150-193559
record_format	oai_dc
spelling	ndltd-arizona.edu-oai-arizona.openrepository.com-10150-193559 2015-10-23T04:39:33Z Ji, Li Wysocki, Vicki H. Aspinwall, Craig A. Pemberton, Jeanne E. 2006 text Electronic Dissertation http://hdl.handle.net/10150/193559 659746549 1973 EN Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. The University of Arizona.
collection	NDLTD
language	EN
sources	NDLTD
topic	tandem mass spectrometry; peptide and protein identification; Bayesian learning; sequencing algorithm; machine learning algorithm; probability-based algorithm
description	As DNA sequence information becomes increasingly available, researchers are now tackling the great challenge of characterizing and identifying peptides and proteins from complex mixtures. Automatic database searching algorithms have been developed to meet this challenge. This dissertation is aimed at improving these algorithms to achieve more accurate and efficient peptide and protein identification with greater confidence by incorporating peak intensity information and peptide cleavage patterns obtained in gas-phase ion dissociation research. The underlying hypothesis is that these algorithms can benefit from knowledge about molecular-level fragmentation behavior of particular amino acid residues or residue combinations.
SeQuence IDentification (SQID), developed in this dissertation research, is a novel Bayesian learning-based method that attempts to incorporate intensity information from peptide cleavage patterns in a database searching algorithm. It directly makes use of the estimated peak intensity distributions for cleavage at amino acid pairs, derived from probability histograms generated from experimental MS/MS spectra. Rather than assuming amino acid cleavage patterns artificially or disregarding intensity information, SQID aims to take advantage of knowledge of observed fragmentation intensity behavior. In addition, SQID avoids the generation of a theoretical spectrum prediction for each candidate sequence, needed by other sequencing methods including SEQUEST. As a result, computational efficiency is significantly improved.
Extensive testing has been performed to evaluate SQID, using datasets from the Pacific Northwest National Laboratory, University of Colorado, and the Institute for Systems Biology. The computational results show that by incorporating peak intensity distribution information, the program's ability to distinguish the correct peptides from incorrect matches is greatly enhanced. This observation is consistent with experiments involving various peptides and searches against larger databases with distraction proteins, which indirectly verifies that peptide dissociation behaviors determine the peptide sequencing and protein identification in MS/MS. Furthermore, testing SQID by using previously identified clusters of spectra associated with unique chemical structure motifs leads to the following conclusions: (1) the improvement in identification confidence is observed with a range of peptides displaying different fragmentation behaviors; (2) the magnitude of improvement is in agreement with the peptide cleavage selectivity, that is, more significant improvements are observed with more selective peptide cleavages.
author2	Wysocki, Vicki H.
author_facet	Wysocki, Vicki H. Ji, Li
author	Ji, Li
author_sort	Ji, Li
title	Using Peak Intensity and Fragmentation Patterns in Peptide SeQuence IDentification (SQID) - A Bayesian Learning Algorithm for Tandem Mass Spectra
publisher	The University of Arizona.
publishDate	2006
url	http://hdl.handle.net/10150/193559
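
The abstract describes scoring candidate peptides with peak intensity distributions for cleavage at amino acid pairs, estimated from histograms of experimental MS/MS spectra. The following is a minimal illustrative sketch of that general idea, not SQID's actual scoring function. Everything in it is an assumption made for illustration: the number of intensity bins, the pseudocount smoothing, the uniform background model, and the function names `intensity_bin`, `learn_histograms`, and `score`.

```python
import math
from collections import defaultdict

N_BINS = 4  # hypothetical number of relative-intensity bins


def intensity_bin(rel_intensity, n_bins=N_BINS):
    """Map a relative intensity in [0, 1] to a histogram bin index."""
    return min(int(rel_intensity * n_bins), n_bins - 1)


def learn_histograms(training, n_bins=N_BINS, pseudocount=1.0):
    """Estimate P(intensity bin | residue pair) from training observations.

    `training` is an iterable of (pair, rel_intensity) tuples, where `pair`
    is the two residues flanking a cleavage site, e.g. "DP".
    """
    counts = defaultdict(lambda: [pseudocount] * n_bins)
    for pair, rel_intensity in training:
        counts[pair][intensity_bin(rel_intensity, n_bins)] += 1
    return {pair: [c / sum(row) for c in row] for pair, row in counts.items()}


def score(peptide, observed, hists, n_bins=N_BINS):
    """Sum log-likelihood ratios of observed intensities vs. a uniform background.

    observed[i] is the relative intensity of the peak matched to cleavage
    between peptide[i] and peptide[i + 1] (0.0 if no peak was matched).
    """
    background = 1.0 / n_bins  # assumed background: all bins equally likely
    total = 0.0
    for i, rel_intensity in enumerate(observed):
        probs = hists.get(peptide[i:i + 2])
        if probs is None:
            continue  # residue pair never seen in training: no evidence
        total += math.log(probs[intensity_bin(rel_intensity, n_bins)] / background)
    return total


# Toy training data (invented): strong peaks at D-P, weak peaks at A-G.
hists = learn_histograms([("DP", 0.9)] * 8 + [("AG", 0.1)] * 8)
# A candidate whose D-P cleavage matches an intense peak outscores one
# that has no learned cleavage pattern supporting it.
better = score("ADPG", [0.0, 0.9, 0.0], hists)
worse = score("APDG", [0.0, 0.9, 0.0], hists)
```

In this sketch a candidate is scored by table lookups on its residue pairs alone, so no full theoretical spectrum is built for each candidate, which is in the spirit of the efficiency point made in the abstract.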

Similar Items