Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix

Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. It has been hypothesised that these patterns can be used as gene i...

Bibliographic Details
Main Author: Ndhlovu, Andrew
Format: Others
Language: en
Published: 2015
Subjects: Amino Acid Substitution
Online Access: http://hdl.handle.net/10539/18660
id ndltd-netd.ac.za-oai-union.ndltd.org-wits-oai-wiredspace.wits.ac.za-10539-18660
record_format oai_dc
spelling ndltd-netd.ac.za-oai-union.ndltd.org-wits-oai-wiredspace.wits.ac.za-10539-18660 2019-05-11T03:40:12Z Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix Ndhlovu, Andrew Amino Acid Substitution Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. It has been hypothesised that these patterns can be used as gene identifiers. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a database using an algorithm. However, problems arise when sequence similarity is low. The problem is that the algorithm produces numerous false positives when highly conserved datasets are aligned. To increase the sensitivity of the algorithm, the evolutionary rate based approach was reimplemented and coupled with a conventional BLOSUM substitution matrix to produce a new implementation called BLOSUM-FIRE. The two approaches are combined in a dynamic scoring function, which uses the selective pressure to score aligned residues. Analysis of quality of alignments produced, revealed that the new implementation of the FIRE algorithm performs as well as conventional algorithms. In addition, the Evolutionary rate Database (EvoDB), which is a compilation of evolutionary rate profiles of all the members of the PFAM-A protein domain database has been developed. The EvoDB database can be queried using FIRE to infer protein domain functions. The utility of this algorithm and database was tested by inferring the domain functions of the Hepatitis B X protein. Results show that the BLOSUM-FIRE algorithm was able to accurately identify the domain function of HBx as a trans-activation protein using EvoDB. The biological relevance of these results was not validated and requires further interrogation; however, these proteins share vital roles in viral replication. This study demonstrates the utility of an evolutionary rate based approach and demonstrates that such an approach is robust when coupled with an amino acid substitution matrix yielding results comparable to conventional algorithms. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. The BLOSUM-FIRE software and user manual including the EvoDB flat file database and release notes have been made freely available at www.bioinf.wits.ac.za/software/fire. The BLOSUM-FIRE algorithm and EvoDB database present a tier of information untapped by current databases and tools. A dissertation submitted to the Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements of the degree of Master of Science (Medicine). 2015-09-15T13:18:27Z 2015-09-15T13:18:27Z 2014 Thesis http://hdl.handle.net/10539/18660 en application/pdf
collection NDLTD
language en
format Others
sources NDLTD
topic Amino Acid Substitution
spellingShingle Amino Acid Substitution
Ndhlovu, Andrew
Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix
description Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. It has been hypothesised that these patterns can be used as gene identifiers. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a database using an algorithm. However, problems arise when sequence similarity is low. The problem is that the algorithm produces numerous false positives when highly conserved datasets are aligned. To increase the sensitivity of the algorithm, the evolutionary rate based approach was reimplemented and coupled with a conventional BLOSUM substitution matrix to produce a new implementation called BLOSUM-FIRE. The two approaches are combined in a dynamic scoring function, which uses the selective pressure to score aligned residues. Analysis of quality of alignments produced, revealed that the new implementation of the FIRE algorithm performs as well as conventional algorithms. In addition, the Evolutionary rate Database (EvoDB), which is a compilation of evolutionary rate profiles of all the members of the PFAM-A protein domain database has been developed. The EvoDB database can be queried using FIRE to infer protein domain functions. The utility of this algorithm and database was tested by inferring the domain functions of the Hepatitis B X protein. Results show that the BLOSUM-FIRE algorithm was able to accurately identify the domain function of HBx as a trans-activation protein using EvoDB. The biological relevance of these results was not validated and requires further interrogation; however, these proteins share vital roles in viral replication. This study demonstrates the utility of an evolutionary rate based approach and demonstrates that such an approach is robust when coupled with an amino acid substitution matrix yielding results comparable to conventional algorithms. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. The BLOSUM-FIRE software and user manual including the EvoDB flat file database and release notes have been made freely available at www.bioinf.wits.ac.za/software/fire. The BLOSUM-FIRE algorithm and EvoDB database present a tier of information untapped by current databases and tools. A dissertation submitted to the Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements of the degree of Master of Science (Medicine).
author Ndhlovu, Andrew
author_facet Ndhlovu, Andrew
author_sort Ndhlovu, Andrew
title Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix
title_sort robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix
publishDate 2015
url http://hdl.handle.net/10539/18660
work_keys_str_mv AT ndhlovuandrew robustsequencealignmentusingevolutionaryratescoupledwithanaminoacidsubstitutionmatrix
_version_ 1719081469840719872