VESPA: Very large-scale Evolutionary and Selective Pressure Analyses

Background: Large-scale molecular evolutionary analyses of protein coding sequences require a number of inter-related preparatory steps, from finding gene families to generating alignments and phylogenetic trees and assessing selective pressure variation. Each phase of these analyses can represent s...


Bibliographic Details
Main Authors: Andrew E. Webb, Thomas A. Walsh, Mary J. O’Connell
Format: Article
Language: English
Published: PeerJ Inc. 2017-06-01
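Series: PeerJ Computer Science
Subjects: Selective pressure analysis; Protein molecular evolution; Large-scale comparative genomics; Gene family evolution; Positive selection
Online Access: https://peerj.com/articles/cs-118.pdf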
id doaj-1987e4ef116b49c1ac4ffa0c1419b929
record_format Article
spelling doaj-1987e4ef116b49c1ac4ffa0c1419b929; 2020-11-24T22:41:38Z; eng; PeerJ Inc.; PeerJ Computer Science; 2376-5992; 2017-06-01; 3; e118; 10.7717/peerj-cs.118; VESPA: Very large-scale Evolutionary and Selective Pressure Analyses
Andrew E. Webb (0); Thomas A. Walsh (1); Mary J. O’Connell (2)
Bioinformatics and Molecular Evolution Group, School of Biotechnology, Faculty of Science and Health, Dublin City University, Dublin, Ireland (affiliation listed for each of the three authors)
https://peerj.com/articles/cs-118.pdf
Selective pressure analysis; Protein molecular evolution; Large-scale comparative genomics; Gene family evolution; Positive selection
collection DOAJ
language English
format Article
sources DOAJ
author Andrew E. Webb
Thomas A. Walsh
Mary J. O’Connell
title VESPA: Very large-scale Evolutionary and Selective Pressure Analyses
publisher PeerJ Inc.
series PeerJ Computer Science
issn 2376-5992
publishDate 2017-06-01
description Background: Large-scale molecular evolutionary analyses of protein coding sequences require a number of inter-related preparatory steps, from finding gene families to generating alignments and phylogenetic trees and assessing selective pressure variation. Each phase of these analyses can represent significant challenges, particularly when working with entire proteomes (all protein coding sequences in a genome) from a large number of species. Methods: We present VESPA, software capable of automating a selective pressure analysis using codeML in addition to the preparatory analyses and summary statistics. VESPA is written in Python and Perl and is designed to run within a UNIX environment. Results: We have benchmarked VESPA, and our results show that the method is consistent, performs well on both large-scale and smaller-scale datasets, and produces results in line with previously published datasets. Discussion: Large-scale gene family identification, sequence alignment, and phylogeny reconstruction are all important aspects of large-scale molecular evolutionary analyses. VESPA provides flexible software for simplifying these processes along with downstream selective pressure variation analyses. The software automatically interprets results from codeML and produces simplified summary files to assist the user in better understanding the results. VESPA may be found at the following website: http://www.mol-evol.org/VESPA.
topic Selective pressure analysis
Protein molecular evolution
Large-scale comparative genomics
Gene family evolution
Positive selection
url https://peerj.com/articles/cs-118.pdf