PPalign: optimal alignment of Potts models representing proteins with direct coupling information

Background: To assign structural and functional annotations to the ever increasing amount of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or the current state-of-the-art methods based on profile Hidden Markov Models, which rely on significant ali...

Bibliographic Details
Main Authors:	Coste, F. (Author), Talibart, H. (Author)
Format:	Article
Language:	English
Published:	BioMed Central Ltd 2021
Subjects:	algorithm; Algorithms; Alignment; amino acid sequence; Amino Acid Sequence; article; coevolution; Coevolution; Computational bottlenecks; Couplings; Direct coupling analysis; Functional annotation; genetics; Hidden Markov models; Homology; human; Humans; Integer linear programming; Integer linear programming formulation; Integer programming; Optimal alignments; Pairwise alignment; Pairwise sequence alignment; Potts model; prediction; Profile hidden Markov model; protein; Proteins; sequence alignment; Sequence alignment; Sequence Alignment; sequence homology; Sequence Homology; State-of-the-art methods; system analysis
Online Access:	View Fulltext in Publisher


LEADER	04092nam a2200613Ia 4500
001	10.1186-s12859-021-04222-4
008	220427s2021 CNT 000 0 und d
020			\|a 14712105 (ISSN)
245	1	0	\|a PPalign: optimal alignment of Potts models representing proteins with direct coupling information
260		0	\|b BioMed Central Ltd \|c 2021
856			\|z View Fulltext in Publisher \|u https://doi.org/10.1186/s12859-021-04222-4
520	3		\|a Background: To assign structural and functional annotations to the ever increasing amount of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or the current state-of-the-art methods based on profile Hidden Markov Models, which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, we propose here to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition, and to compare proteins by aligning these models. Due to non-local dependencies, the problem of aligning Potts models is hard and remains the main computational bottleneck for their use. Methods: We introduce here an Integer Linear Programming formulation of the problem and PPalign, a program based on this formulation, to compute the optimal pairwise alignment of Potts models representing proteins in tractable time. The approach is assessed with respect to a non-redundant set of reference pairwise sequence alignments from the SISYPHUS benchmark which have the lowest sequence identity (between 3 % and 20 %) and allow reliable Potts models to be built for each sequence to be aligned. This experimentation confirms that Potts models can be aligned in reasonable time (1′37″ on average on these alignments). The contribution of couplings is evaluated in comparison with HHalign and independent-site PPalign. Although Potts models were not fully optimized for alignment purposes and simple gap scores were used, PPalign yields a better mean F1 score and finds significantly better alignments than HHalign and PPalign without couplings in some cases. 
Conclusions: These results show that pairwise couplings from protein Potts models can be used to improve the alignment of remotely related protein sequences in tractable time. Our experimentation nevertheless suggests that new research on the inference of Potts models is now needed to make them more comparable and suitable for homology search. We think that PPalign’s guaranteed optimality will be a powerful asset to perform unbiased investigations in this direction. © 2021, The Author(s).
650	0	4	\|a algorithm
650	0	4	\|a Algorithms
650	0	4	\|a Alignment
650	0	4	\|a amino acid sequence
650	0	4	\|a Amino Acid Sequence
650	0	4	\|a article
650	0	4	\|a coevolution
650	0	4	\|a Coevolution
650	0	4	\|a Computational bottlenecks
650	0	4	\|a Couplings
650	0	4	\|a Direct coupling analysis
650	0	4	\|a Functional annotation
650	0	4	\|a genetics
650	0	4	\|a Hidden Markov models
650	0	4	\|a Homology
650	0	4	\|a human
650	0	4	\|a Humans
650	0	4	\|a Integer linear programming
650	0	4	\|a Integer linear programming formulation
650	0	4	\|a Integer programming
650	0	4	\|a Optimal alignments
650	0	4	\|a Pairwise alignment
650	0	4	\|a Pairwise sequence alignment
650	0	4	\|a Potts model
650	0	4	\|a prediction
650	0	4	\|a Profile hidden Markov model
650	0	4	\|a protein
650	0	4	\|a Proteins
650	0	4	\|a sequence alignment
650	0	4	\|a Sequence alignment
650	0	4	\|a Sequence Alignment
650	0	4	\|a sequence homology
650	0	4	\|a Sequence Homology
650	0	4	\|a State-of-the-art methods
650	0	4	\|a system analysis
700	1		\|a Coste, F. \|e author
700	1		\|a Talibart, H. \|e author
773			\|t BMC Bioinformatics
