PlncRNA-HDeep: plant long noncoding RNA prediction using hybrid deep learning based on two encoding styles

Abstract. Background: Long noncoding RNAs (lncRNAs) play an important role in regulating biological activities and their prediction is significant for exploring biological processes. Long short-term memory (LSTM) and convolutional neural network (CNN) can automatically extract and learn the abstract i...


Bibliographic Details
Main Authors: Jun Meng, Qiang Kang, Zheng Chang, Yushi Luan
Format: Article
Language: English
Published: BMC 2021-05-01
Series: BMC Bioinformatics
Subjects: Deep learning; Long short-term memory; Convolutional neural network; Plant; lncRNA; Prediction
Online Access: https://doi.org/10.1186/s12859-020-03870-2
id doaj-57c97dee0ba54b1d8730744ada989432
record_format Article
spelling doaj-57c97dee0ba54b1d8730744ada989432 2021-05-16T11:36:16Z eng BMC BMC Bioinformatics 1471-2105 2021-05-01 22 S3 1 16 10.1186/s12859-020-03870-2 PlncRNA-HDeep: plant long noncoding RNA prediction using hybrid deep learning based on two encoding styles
Jun Meng (School of Computer Science and Technology, Dalian University of Technology)
Qiang Kang (School of Computer Science and Technology, Dalian University of Technology)
Zheng Chang (School of Computer Science and Technology, Dalian University of Technology)
Yushi Luan (School of Bioengineering, Dalian University of Technology)
collection DOAJ
language English
format Article
sources DOAJ
author Jun Meng
Qiang Kang
Zheng Chang
Yushi Luan
title PlncRNA-HDeep: plant long noncoding RNA prediction using hybrid deep learning based on two encoding styles
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2021-05-01
description Abstract. Background: Long noncoding RNAs (lncRNAs) play an important role in regulating biological activities, and predicting them is important for exploring biological processes. Long short-term memory (LSTM) and convolutional neural network (CNN) models can automatically extract and learn abstract information from encoded RNA sequences, which avoids complex feature engineering. An ensemble model learns information from multiple perspectives and can perform better than a single model. It is therefore feasible to treat an RNA sequence as a sentence to train an LSTM and as an image to train a CNN, and then to hybridize the trained models to predict lncRNAs. To date, various lncRNA predictors exist, but few are designed for plants, so a reliable and powerful predictor for plant lncRNAs is needed. Results: To improve lncRNA prediction, this paper proposes a hybrid deep learning model based on two encoding styles (PlncRNA-HDeep), which requires no prior knowledge and uses only RNA sequences to train the models for predicting plant lncRNAs. It learns diversified information from RNA sequences encoded by p-nucleotide and one-hot encodings, and it takes advantage of both CNN and lncRNA-LSTM, which was proposed in our previous study. The parameters are tuned and three hybrid strategies are tested to maximize performance. Experimental results show that PlncRNA-HDeep is more effective than lncRNA-LSTM and CNN alone, obtaining 97.9% sensitivity, 95.1% precision, 96.5% accuracy and a 96.5% F1 score on the Zea mays dataset. These results are better than those of several shallow machine learning methods (support vector machine, random forest, k-nearest neighbor, decision tree, naive Bayes and logistic regression) and some existing tools (CNCI, PLEK, CPC2, LncADeep and lncRNAnet). Conclusions: PlncRNA-HDeep is feasible and obtains credible predictive results. It may also provide a valuable reference for other related research.
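The abstract names two encoding styles, one-hot and p-nucleotide, without defining them in this record. The following is a minimal Python sketch of what such encodings might look like, not the authors' implementation. It assumes that p-nucleotide encoding splits a sequence into consecutive non-overlapping words of p nucleotides and maps each word to an integer index; the function names, the choice of p = 3, the index 0 for unknown words, and the handling of ambiguous bases are all hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA bases; T is converted to U below (an assumption)


def one_hot_encode(seq):
    """Return a len(seq) x 4 matrix (the 'image' view of the sequence).

    Bases outside ACGU (for example N) become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in seq]


def p_nucleotide_encode(seq, p=3):
    """Return integer word indices (the 'sentence' view of the sequence).

    Assumption: words are non-overlapping p-mers; a trailing fragment
    shorter than p is dropped. Indices start at 1; 0 marks a word that
    contains a base outside ACGU.
    """
    seq = seq.upper().replace("T", "U")
    vocab = {"".join(k): i + 1 for i, k in enumerate(product(ALPHABET, repeat=p))}
    words = [seq[i:i + p] for i in range(0, len(seq) - p + 1, p)]
    return [vocab.get(word, 0) for word in words]
```

Under these assumptions, the index list would feed an embedding layer in front of an LSTM, while the one-hot matrix would feed a CNN; how the two trained models are hybridized is not specified in this record and is not sketched here.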
topic Deep learning
Long short-term memory
Convolutional neural network
Plant
lncRNA
Prediction
url https://doi.org/10.1186/s12859-020-03870-2