PhyKIT: A broadly applicable UNIX shell toolkit for processing and analyzing phylogenomic data

Bibliographic Details
Main Authors: Buida, T.J. (Author), Labella, A.L. (Author), Li, Y. (Author), Rokas, A. (Author), Shen, X.-X. (Author), Steenwyk, J.L. (Author)
Format: Article
Language: English
Published: Oxford University Press 2021
Subjects: article; evolutionary rate; phylogenetic tree; sequence alignment
Online Access: View Fulltext in Publisher (https://doi.org/10.1093/bioinformatics/btab096)
LEADER 02180nam a2200241Ia 4500
001 10.1093-bioinformatics-btab096
008 220427s2021 CNT 000 0 und d
020 |a 13674803 (ISSN) 
245 1 0 |a PhyKIT: A broadly applicable UNIX shell toolkit for processing and analyzing phylogenomic data 
260 0 |b Oxford University Press  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1093/bioinformatics/btab096 
520 3 |a Motivation: Diverse disciplines in biology process and analyze multiple sequence alignments (MSAs) and phylogenetic trees to evaluate their information content, infer evolutionary events and processes and predict gene function. However, automated processing of MSAs and trees remains a challenge due to the lack of a unified toolkit. To fill this gap, we introduce PhyKIT, a toolkit for the UNIX shell environment with 30 functions that process MSAs and trees, including but not limited to estimation of mutation rate, evaluation of sequence composition biases, calculation of the degree of violation of a molecular clock and collapsing bipartitions (internal branches) with low support. Results: To demonstrate the utility of PhyKIT, we detail three use cases: (1) summarizing information content in MSAs and phylogenetic trees for diagnosing potential biases in sequence or tree data; (2) evaluating gene-gene covariation of evolutionary rates to identify functional relationships, including novel ones, among genes and (3) identifying lack of resolution events or polytomies in phylogenetic trees, which are suggestive of rapid radiation events or lack of data. We anticipate PhyKIT will be useful for processing, examining and deriving biological meaning from increasingly large phylogenomic datasets. © The Author(s) 2021. Published by Oxford University Press. All rights reserved. 
650 0 4 |a article 
650 0 4 |a evolutionary rate 
650 0 4 |a phylogenetic tree 
650 0 4 |a sequence alignment 
700 1 |a Buida, T.J.  |e author 
700 1 |a Labella, A.L.  |e author 
700 1 |a Li, Y.  |e author 
700 1 |a Rokas, A.  |e author 
700 1 |a Shen, X.-X.  |e author 
700 1 |a Steenwyk, J.L.  |e author 
773 |t Bioinformatics