Integrating data and knowledge to identify functional modules of genes: a multilayer approach

Abstract. Background: Characterizing the modular structure of cellular networks is an important way to identify novel genes for targeted therapeutics. This has been made possible by the rise of high-throughput technology. Unfortunately, computational methods to identify functional modules were limited by t...

Bibliographic Details
Main Authors: Lifan Liang, Vicky Chen, Kunju Zhu, Xiaonan Fan, Xinghua Lu, Songjian Lu
Format: Article
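Language: English
Published: BMC 2019-05-01
Series: BMC Bioinformatics
Subjects: Protein-protein interaction, Graph clustering, Random walk, Multiplex, Topic modeling, Gene expression
Online Access: http://link.springer.com/article/10.1186/s12859-019-2800-y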
id doaj-d45cda6f9cf24e3a985e3f3f0819864f
record_format Article
spelling doaj-d45cda6f9cf24e3a985e3f3f0819864f; 2020-11-25T02:38:16Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2019-05-01; 20; 1; 1; 15; 10.1186/s12859-019-2800-y; Integrating data and knowledge to identify functional modules of genes: a multilayer approach; Lifan Liang, Vicky Chen, Kunju Zhu, Xiaonan Fan, Xinghua Lu, Songjian Lu (all: Department of Biomedical Informatics, University of Pittsburgh); http://link.springer.com/article/10.1186/s12859-019-2800-y; Protein-protein interaction; Graph clustering; Random walk; Multiplex; Topic modeling; Gene expression
collection DOAJ
language English
format Article
sources DOAJ
author Lifan Liang
Vicky Chen
Kunju Zhu
Xiaonan Fan
Xinghua Lu
Songjian Lu
title Integrating data and knowledge to identify functional modules of genes: a multilayer approach
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-05-01
description Abstract. Background: Characterizing the modular structure of cellular networks is an important way to identify novel genes for targeted therapeutics. This has been made possible by the rise of high-throughput technology. Unfortunately, computational methods for identifying functional modules have been limited by the data quality issues of high-throughput techniques. This study aims to integrate knowledge extracted from the literature to further improve the accuracy of functional module identification. Results: Our new model and algorithm were applied to both yeast and human interactomes. The predicted functional modules covered over 90% of the proteins in both organisms while maintaining a comparable overall accuracy. We found that combining mRNA expression information with biomedical knowledge greatly improved the performance of functional module identification, outperforming approaches that use only a protein interaction network weighted with transcriptomic data, one weighted with literature knowledge, or a simple unweighted protein interaction network. Our new algorithm also performed better than some other well-known methods, especially in terms of positive predictive value (PPV), which indicates the confidence of novel discoveries. Conclusion: The higher PPV of the multiplex approach suggests that information from both sources has been effectively integrated to reduce false positives. With protein coverage higher than 90%, our algorithm is able to generate more novel biological hypotheses with higher confidence.
topic Protein-protein interaction
Graph clustering
Random walk
Multiplex
Topic modeling
Gene expression
url http://link.springer.com/article/10.1186/s12859-019-2800-y