LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA–protein interaction identification

Abstract Background: Long noncoding RNAs (lncRNAs) play important roles in various biological and pathological processes. Discovery of lncRNA–protein interactions (LPIs) contributes to understanding the biological functions and mechanisms of lncRNAs. Although wet experiments find a few interactions betw...


Bibliographic Details
Main Authors: Liqian Zhou, Zhao Wang, Xiongfei Tian, Lihong Peng
Format: Article
Language: English
Published: BMC 2021-10-01
Series: BMC Bioinformatics
Subjects: lncRNA–protein interaction, Multiple-layer deep architecture, Gradient boosting decision tree
Online Access: https://doi.org/10.1186/s12859-021-04399-8
id doaj-3d8aa4916bf94aa9859d4a7503c867b0
record_format Article
spelling doaj-3d8aa4916bf94aa9859d4a7503c867b0 2021-10-10T11:14:34Z eng BMC BMC Bioinformatics 1471-2105 2021-10-01 22 1 1 24 10.1186/s12859-021-04399-8 LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA–protein interaction identification
Liqian Zhou 0; Zhao Wang 1; Xiongfei Tian 2; Lihong Peng 3
School of Computer Science, Hunan University of Technology; School of Computer Science, Hunan University of Technology; School of Computer Science, Hunan University of Technology; School of Computer Science, Hunan University of Technology
Abstract Background: Long noncoding RNAs (lncRNAs) play important roles in various biological and pathological processes. Discovery of lncRNA–protein interactions (LPIs) contributes to understanding the biological functions and mechanisms of lncRNAs. Although wet-lab experiments have found a few interactions between lncRNAs and proteins, experimental techniques are costly and time-consuming. Therefore, computational methods are increasingly exploited to uncover possible associations. However, existing computational methods have several limitations. First, the majority of them were evaluated on one simple dataset, which may result in prediction bias. Second, few of them are applied to identify relevant data for new lncRNAs (or proteins). Finally, they fail to utilize diverse biological information of lncRNAs and proteins.
Results: Using a feed-forward deep architecture based on gradient boosting decision trees (LPI-deepGBDT), this work focuses on classifying unobserved LPIs. First, three human LPI datasets and two plant LPI datasets are compiled. Second, the biological features of lncRNAs and proteins are extracted by Pyfeat and BioProt, respectively. Third, the features are reduced in dimension and concatenated into a vector to represent an lncRNA–protein pair. Finally, a deep architecture composed of forward mappings and inverse mappings is developed to predict underlying linkages between lncRNAs and proteins. LPI-deepGBDT is compared with five classical LPI prediction models (LPI-BLS, LPI-CatBoost, PLIPCOM, LPI-SKF, and LPI-HNM) under three cross validations, on lncRNAs, proteins, and lncRNA–protein pairs, respectively. It obtains the best average AUC and AUPR values in the majority of situations, significantly outperforming the other five LPI identification methods. Specifically, the AUCs computed by LPI-deepGBDT are 0.8321, 0.6815, and 0.9073, respectively, and the AUPRs are 0.8095, 0.6771, and 0.8849, respectively. The results demonstrate the powerful classification ability of LPI-deepGBDT. Case study analyses suggest that there may be interactions between GAS5 and Q15717, RAB30-AS1 and O00425, and LINC-01572 and P35637.
Conclusions: By integrating ensemble learning and hierarchical distributed representations and building a multiple-layered deep architecture, this work improves LPI prediction performance and effectively probes interaction data for new lncRNAs/proteins.
https://doi.org/10.1186/s12859-021-04399-8
lncRNA–protein interaction; Multiple-layer deep architecture; Gradient boosting decision tree
collection DOAJ
language English
format Article
sources DOAJ
author Liqian Zhou
Zhao Wang
Xiongfei Tian
Lihong Peng
spellingShingle Liqian Zhou
Zhao Wang
Xiongfei Tian
Lihong Peng
LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA–protein interaction identification
BMC Bioinformatics
lncRNA–protein interaction
Multiple-layer deep architecture
Gradient boosting decision tree
author_facet Liqian Zhou
Zhao Wang
Xiongfei Tian
Lihong Peng
author_sort Liqian Zhou
title LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA–protein interaction identification
title_short LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA–protein interaction identification
title_full LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA–protein interaction identification
title_fullStr LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA–protein interaction identification
title_full_unstemmed LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA–protein interaction identification
title_sort lpi-deepgbdt: a multiple-layer deep framework based on gradient boosting decision trees for lncrna–protein interaction identification
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2021-10-01
description Abstract Background: Long noncoding RNAs (lncRNAs) play important roles in various biological and pathological processes. Discovery of lncRNA–protein interactions (LPIs) contributes to understanding the biological functions and mechanisms of lncRNAs. Although wet-lab experiments have found a few interactions between lncRNAs and proteins, experimental techniques are costly and time-consuming. Therefore, computational methods are increasingly exploited to uncover possible associations. However, existing computational methods have several limitations. First, the majority of them were evaluated on one simple dataset, which may result in prediction bias. Second, few of them are applied to identify relevant data for new lncRNAs (or proteins). Finally, they fail to utilize diverse biological information of lncRNAs and proteins.
Results: Using a feed-forward deep architecture based on gradient boosting decision trees (LPI-deepGBDT), this work focuses on classifying unobserved LPIs. First, three human LPI datasets and two plant LPI datasets are compiled. Second, the biological features of lncRNAs and proteins are extracted by Pyfeat and BioProt, respectively. Third, the features are reduced in dimension and concatenated into a vector to represent an lncRNA–protein pair. Finally, a deep architecture composed of forward mappings and inverse mappings is developed to predict underlying linkages between lncRNAs and proteins. LPI-deepGBDT is compared with five classical LPI prediction models (LPI-BLS, LPI-CatBoost, PLIPCOM, LPI-SKF, and LPI-HNM) under three cross validations, on lncRNAs, proteins, and lncRNA–protein pairs, respectively. It obtains the best average AUC and AUPR values in the majority of situations, significantly outperforming the other five LPI identification methods. Specifically, the AUCs computed by LPI-deepGBDT are 0.8321, 0.6815, and 0.9073, respectively, and the AUPRs are 0.8095, 0.6771, and 0.8849, respectively. The results demonstrate the powerful classification ability of LPI-deepGBDT. Case study analyses suggest that there may be interactions between GAS5 and Q15717, RAB30-AS1 and O00425, and LINC-01572 and P35637.
Conclusions: By integrating ensemble learning and hierarchical distributed representations and building a multiple-layered deep architecture, this work improves LPI prediction performance and effectively probes interaction data for new lncRNAs/proteins.
topic lncRNA–protein interaction
Multiple-layer deep architecture
Gradient boosting decision tree
url https://doi.org/10.1186/s12859-021-04399-8
work_keys_str_mv AT liqianzhou lpideepgbdtamultiplelayerdeepframeworkbasedongradientboostingdecisiontreesforlncrnaproteininteractionidentification
AT zhaowang lpideepgbdtamultiplelayerdeepframeworkbasedongradientboostingdecisiontreesforlncrnaproteininteractionidentification
AT xiongfeitian lpideepgbdtamultiplelayerdeepframeworkbasedongradientboostingdecisiontreesforlncrnaproteininteractionidentification
AT lihongpeng lpideepgbdtamultiplelayerdeepframeworkbasedongradientboostingdecisiontreesforlncrnaproteininteractionidentification
_version_ 1716829894486261760