Protein complex prediction via dense subgraphs and false positive analysis.

Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predicting protein functions in living organisms. Large-scale, high-throughput techniques have made it possible to...


Bibliographic Details
Main Authors: Cecilia Hernandez, Carlos Mella, Gonzalo Navarro, Alvaro Olivera-Nappa, Jaime Araya
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC5609739?pdf=render
id doaj-0251d8eefdec4b3295cf33473933b1be
record_format Article
spelling doaj-0251d8eefdec4b3295cf33473933b1be 2020-11-24T21:30:01Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2017-01-01 12 9 e0183460 10.1371/journal.pone.0183460 Protein complex prediction via dense subgraphs and false positive analysis. Cecilia Hernandez, Carlos Mella, Gonzalo Navarro, Alvaro Olivera-Nappa, Jaime Araya http://europepmc.org/articles/PMC5609739?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Cecilia Hernandez
Carlos Mella
Gonzalo Navarro
Alvaro Olivera-Nappa
Jaime Araya
author_sort Cecilia Hernandez
title Protein complex prediction via dense subgraphs and false positive analysis.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2017-01-01
description Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predicting protein functions in living organisms. Large-scale, high-throughput techniques have made it possible to compile protein-protein interaction networks (PPI networks), which have been used in several computational approaches for detecting protein complexes. Those predictions might guide future biological experimental research. Some approaches are topology-based, where highly connected proteins are predicted to be complexes; some propose different clustering algorithms using partitioning, overlaps among clusters for networks modeled with unweighted or weighted graphs; and others use density of clusters and information based on protein functionality. However, some schemes still require much processing time, or the quality of their results can be improved. Furthermore, most of the results obtained with computational tools are not accompanied by an analysis of false positives. We propose an effective and efficient mining algorithm for discovering highly connected subgraphs, which is our basis for defining protein complexes. Our representation is based on transforming the PPI network into a directed acyclic graph that reduces the number of represented edges and the search space for discovering subgraphs. Our approach considers weighted and unweighted PPI networks. We compare our best alternative using PPI networks from Saccharomyces cerevisiae (yeast) and Homo sapiens (human) with state-of-the-art approaches in terms of clustering, biological metrics and execution times, as well as three gold standards for yeast and two for human. Furthermore, we analyze false positive predicted complexes by searching the PDBe (Protein Data Bank in Europe) database in order to identify matching protein complexes that have been purified and structurally characterized.
Our analysis shows that more than 50 yeast protein complexes and more than 300 human protein complexes found to be false positives according to our prediction method, i.e., not described in the gold standard complex databases, in fact contain protein complexes that have been characterized structurally and documented in PDBe. We also found that some of these protein complexes have recently been classified as part of a Periodic Table of Protein Complexes. The latest version of our software is publicly available at http://doi.org/10.6084/m9.figshare.5297314.v1.
url http://europepmc.org/articles/PMC5609739?pdf=render