Improving the quality of multiple sequence alignment

Multiple sequence alignment is an important bioinformatics problem, with applications in diverse types of biological analysis, such as structure prediction, phylogenetic analysis, and critical site identification. In recent years, the quality of multiple sequence alignment has improved considerably thanks to newl...

Bibliographic Details
Main Author:	Lu, Yue
Other Authors:	Sze, Sing-Hoi
Format:	Others
Language:	en_US
Published:	2010
Subjects:	Multiple Sequence Alignment; Algorithms; Bioinformatics
Online Access:	http://hdl.handle.net/1969.1/ETD-TAMU-3111

id	ndltd-tamu.edu-oai-repository.tamu.edu-1969.1-ETD-TAMU-3111
record_format	oai_dc
spelling	ndltd-tamu.edu-oai-repository.tamu.edu-1969.1-ETD-TAMU-3111 2013-01-08T10:40:06Z Improving the quality of multiple sequence alignment Lu, Yue Multiple Sequence Alignment Algorithms Bioinformatics Multiple sequence alignment is an important bioinformatics problem, with applications in diverse types of biological analysis, such as structure prediction, phylogenetic analysis, and critical site identification. In recent years, the quality of multiple sequence alignment has improved considerably thanks to newly developed methods, although constructing accurate alignments remains difficult, especially for divergent sequences. In this dissertation, we propose three new methods (PSAlign, ISPAlign, and NRAlign) for further improving the quality of multiple sequence alignment. In PSAlign, we propose an alternative formulation of multiple sequence alignment based on the idea of finding a multiple alignment that preserves all the pairwise alignments specified by the edges of a given tree. In contrast with traditional NP-hard formulations, our preserving alignment formulation can be solved in polynomial time without using a heuristic, while still performing very well compared to traditional heuristics. In ISPAlign, we use additional hits from a database search of the input sequences and propose several strategies that significantly improve alignment accuracy, including the construction of profiles from the hits while performing profile alignment, the inclusion of high-scoring hits in the input sequences, the use of intermediate sequence search to link distant homologs, and the use of secondary structure information. In NRAlign, we observe that it is possible to further improve alignment accuracy by taking into account the alignment of neighboring residues when aligning two residues, thus making better use of horizontal information. By modifying existing multiple alignment algorithms to make use of horizontal information, we show that this strategy consistently improves over existing algorithms on all the benchmarks that are commonly used to measure alignment accuracy. Sze, Sing-Hoi 2010-01-15T00:11:56Z 2010-01-16T01:19:03Z 2010-01-15T00:11:56Z 2010-01-16T01:19:03Z 2008-12 2009-05-15 Book Thesis Electronic Dissertation text electronic application/pdf born digital http://hdl.handle.net/1969.1/ETD-TAMU-3111 http://hdl.handle.net/1969.1/ETD-TAMU-3111 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
topic	Multiple Sequence Alignment; Algorithms; Bioinformatics
spellingShingle	Multiple Sequence Alignment Algorithms Bioinformatics Lu, Yue Improving the quality of multiple sequence alignment
description	Multiple sequence alignment is an important bioinformatics problem, with applications in diverse types of biological analysis, such as structure prediction, phylogenetic analysis, and critical site identification. In recent years, the quality of multiple sequence alignment has improved considerably thanks to newly developed methods, although constructing accurate alignments remains difficult, especially for divergent sequences. In this dissertation, we propose three new methods (PSAlign, ISPAlign, and NRAlign) for further improving the quality of multiple sequence alignment. In PSAlign, we propose an alternative formulation of multiple sequence alignment based on the idea of finding a multiple alignment that preserves all the pairwise alignments specified by the edges of a given tree. In contrast with traditional NP-hard formulations, our preserving alignment formulation can be solved in polynomial time without using a heuristic, while still performing very well compared to traditional heuristics. In ISPAlign, we use additional hits from a database search of the input sequences and propose several strategies that significantly improve alignment accuracy, including the construction of profiles from the hits while performing profile alignment, the inclusion of high-scoring hits in the input sequences, the use of intermediate sequence search to link distant homologs, and the use of secondary structure information. In NRAlign, we observe that it is possible to further improve alignment accuracy by taking into account the alignment of neighboring residues when aligning two residues, thus making better use of horizontal information. By modifying existing multiple alignment algorithms to make use of horizontal information, we show that this strategy consistently improves over existing algorithms on all the benchmarks that are commonly used to measure alignment accuracy.
author2	Sze, Sing-Hoi
author_facet	Sze, Sing-Hoi; Lu, Yue
author	Lu, Yue
author_sort	Lu, Yue
title	Improving the quality of multiple sequence alignment
title_short	Improving the quality of multiple sequence alignment
title_full	Improving the quality of multiple sequence alignment
title_fullStr	Improving the quality of multiple sequence alignment
title_full_unstemmed	Improving the quality of multiple sequence alignment
title_sort	improving the quality of multiple sequence alignment
publishDate	2010
url	http://hdl.handle.net/1969.1/ETD-TAMU-3111
work_keys_str_mv	AT luyue improvingthequalityofmultiplesequencealignment
_version_	1716504215839309824

Similar Items