Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix
Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. It has been hypothesised that these patterns can be used as gene i...
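The abstract describes genes as profiles of rapidly evolving sites and sites that withstand change. This record does not say how EvoDB's per-site rates are estimated. It only notes that phylogenetic trees accompany each profile. The Python sketch below therefore uses a deliberately simpler, swapped-in measure: the normalised Shannon entropy of each column of a multiple sequence alignment. A value of 0 marks a fully conserved site, and 1 marks a column in which all 20 amino acids are equally frequent. Entropy measures conservation, not evolutionary rate. The function name `column_variability` and the gap handling are illustrative assumptions.

```python
"""Hypothetical per-site variability profile (not the thesis's rate method).

Normalised Shannon entropy of each alignment column is used as a simple
stand-in: 0 = fully conserved site, 1 = all 20 amino acids equally frequent.
"""
import math
from collections import Counter
from typing import List, Sequence

MAX_ENTROPY = math.log2(20)  # entropy of a uniform 20-letter column


def column_variability(alignment: Sequence[str]) -> List[float]:
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]  # skip gaps
        total = len(residues)
        probs = [count / total for count in Counter(residues).values()]
        entropy = sum(-p * math.log2(p) for p in probs)  # 0.0 for an all-gap column
        profile.append(entropy / MAX_ENTROPY)
    return profile


if __name__ == "__main__":
    toy_alignment = ["MKVL", "MRVI", "MEVL"]  # made-up sequences
    print([round(v, 3) for v in column_variability(toy_alignment)])
```

A phylogeny-aware rate estimate would also account for the tree relating the sequences, which column entropy ignores. The sketch only conveys the shape of a site profile.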
Main Author: | Ndhlovu, Andrew |
---|---|
Format: | Others |
Language: | en |
Published: | 2015 |
Subjects: | Amino Acid Substitution |
Online Access: | http://hdl.handle.net/10539/18660 |
id | ndltd-netd.ac.za-oai-union.ndltd.org-wits-oai-wiredspace.wits.ac.za-10539-18660 |
---|---|
record_format | oai_dc |
spelling | ndltd-netd.ac.za-oai-union.ndltd.org-wits-oai-wiredspace.wits.ac.za-10539-18660 2019-05-11T03:40:12Z Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix Ndhlovu, Andrew Amino Acid Substitution Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. It has been hypothesised that these patterns can be used as gene identifiers. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a database using an algorithm. However, problems arise when sequence similarity is low, and the algorithm produces numerous false positives when highly conserved datasets are aligned. To increase the sensitivity of the algorithm, the evolutionary rate-based approach was reimplemented and coupled with a conventional BLOSUM substitution matrix to produce a new implementation called BLOSUM-FIRE. The two approaches are combined in a dynamic scoring function, which uses the selective pressure to score aligned residues. Analysis of the quality of the alignments produced revealed that the new implementation of the FIRE algorithm performs as well as conventional algorithms. In addition, the Evolutionary rate Database (EvoDB), a compilation of the evolutionary rate profiles of all members of the PFAM-A protein domain database, has been developed. EvoDB can be queried using FIRE to infer protein domain functions. The utility of this algorithm and database was tested by inferring the domain functions of the Hepatitis B X protein (HBx). Results show that the BLOSUM-FIRE algorithm accurately identified the domain function of HBx as a trans-activation protein using EvoDB. The biological relevance of these results was not validated and requires further interrogation; however, these proteins share vital roles in viral replication. This study demonstrates the utility of an evolutionary rate-based approach and shows that such an approach is robust when coupled with an amino acid substitution matrix, yielding results comparable to those of conventional algorithms. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. The BLOSUM-FIRE software, user manual, EvoDB flat file database and release notes are freely available at www.bioinf.wits.ac.za/software/fire. The BLOSUM-FIRE algorithm and EvoDB database present a tier of information untapped by current databases and tools. A dissertation submitted to the Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements of the degree of Master of Science (Medicine). 2015-09-15T13:18:27Z 2015-09-15T13:18:27Z 2014 Thesis http://hdl.handle.net/10539/18660 en application/pdf |
collection | NDLTD |
language | en |
format | Others |
sources | NDLTD |
topic | Amino Acid Substitution |
description | Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. It has been hypothesised that these patterns can be used as gene identifiers. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a database using an algorithm. However, problems arise when sequence similarity is low, and the algorithm produces numerous false positives when highly conserved datasets are aligned. To increase the sensitivity of the algorithm, the evolutionary rate-based approach was reimplemented and coupled with a conventional BLOSUM substitution matrix to produce a new implementation called BLOSUM-FIRE. The two approaches are combined in a dynamic scoring function, which uses the selective pressure to score aligned residues. Analysis of the quality of the alignments produced revealed that the new implementation of the FIRE algorithm performs as well as conventional algorithms. In addition, the Evolutionary rate Database (EvoDB), a compilation of the evolutionary rate profiles of all members of the PFAM-A protein domain database, has been developed. EvoDB can be queried using FIRE to infer protein domain functions. The utility of this algorithm and database was tested by inferring the domain functions of the Hepatitis B X protein (HBx). Results show that the BLOSUM-FIRE algorithm accurately identified the domain function of HBx as a trans-activation protein using EvoDB. The biological relevance of these results was not validated and requires further interrogation; however, these proteins share vital roles in viral replication. This study demonstrates the utility of an evolutionary rate-based approach and shows that such an approach is robust when coupled with an amino acid substitution matrix, yielding results comparable to those of conventional algorithms. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. The BLOSUM-FIRE software, user manual, EvoDB flat file database and release notes are freely available at www.bioinf.wits.ac.za/software/fire. The BLOSUM-FIRE algorithm and EvoDB database present a tier of information untapped by current databases and tools. A dissertation submitted to the Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements of the degree of Master of Science (Medicine). |
author | Ndhlovu, Andrew |
title | Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix |
publishDate | 2015 |
url | http://hdl.handle.net/10539/18660 |
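The abstract says only that BLOSUM-FIRE combines evolutionary rates and a BLOSUM matrix in a dynamic scoring function that uses selective pressure to score aligned residues. The exact formula is not given in this record. The Python sketch below is a hypothetical reading of that idea, not the thesis's method. Each residue carries a rate normalised to [0, 1]. Conserved positions lean on the substitution score, while fast-evolving positions lean on rate similarity. The combined score drives a Needleman-Wunsch global alignment, which is then used to rank a toy profile database. `placeholder_sub` stands in for a real BLOSUM62 lookup. Every other name, weight and value is an illustrative assumption, including the linear mixing rule, the gap penalty and the toy database entries.

```python
"""Hypothetical BLOSUM-FIRE-style scoring sketch (not the thesis's formula).

Assumptions: per-site evolutionary rates are normalised to [0, 1]
(0 = conserved, 1 = rapidly evolving); placeholder_sub stands in for a
BLOSUM62 lookup; the mixing rule, gap penalty and toy data are illustrative.
"""
from typing import Callable, Dict, List, Sequence, Tuple

SubFn = Callable[[str, str], float]
Profile = Tuple[str, List[float]]  # (residues, per-site rates)


def placeholder_sub(a: str, b: str) -> float:
    # Placeholder for BLOSUM62: +4 for identical residues, -1 otherwise.
    return 4.0 if a == b else -1.0


def dynamic_score(a: str, b: str, rate_a: float, rate_b: float,
                  sub: SubFn = placeholder_sub) -> float:
    # Conservation is 1 minus the mean of the two site rates.
    conservation = 1.0 - (rate_a + rate_b) / 2.0
    # Rate similarity: +2 for identical rates, down to -2 for opposite extremes.
    rate_similarity = 4.0 * (1.0 - abs(rate_a - rate_b)) - 2.0
    # Conserved sites lean on the substitution score, fast sites on rates.
    return conservation * sub(a, b) + (1.0 - conservation) * rate_similarity


def global_align_score(seq_a: str, rates_a: Sequence[float],
                       seq_b: str, rates_b: Sequence[float],
                       gap: float = -4.0,
                       sub: SubFn = placeholder_sub) -> float:
    # Needleman-Wunsch with a linear gap penalty, kept to two DP rows.
    n, m = len(seq_a), len(seq_b)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + dynamic_score(seq_a[i - 1], seq_b[j - 1],
                                               rates_a[i - 1], rates_b[j - 1], sub)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]


def query(db: Dict[str, Profile], seq: str,
          rates: Sequence[float]) -> List[Tuple[str, float]]:
    # Score the query against every entry and rank hits best-first.
    hits = [(name, global_align_score(seq, rates, s, r))
            for name, (s, r) in db.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up entries; these are not EvoDB records.
    toy_db: Dict[str, Profile] = {
        "toy_domain_1": ("MKVLA", [0.1, 0.8, 0.2, 0.9, 0.1]),
        "toy_domain_2": ("GGSWP", [0.9, 0.9, 0.5, 0.1, 0.6]),
    }
    for name, score in query(toy_db, "MKILA", [0.1, 0.7, 0.2, 0.9, 0.1]):
        print(f"{name}\t{score:.2f}")
```

With these toy inputs, `toy_domain_1` ranks first. An obvious next step would be to replace `placeholder_sub` with a genuine BLOSUM62 table, for example one loaded with Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`. The linear mixing rule is only one of many ways to let selective pressure modulate the score.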