A machine learning framework for accurately recognizing circular RNAs for clinical decision-supporting

Bibliographic Details
Main Authors: Yidan Wang, Xuanping Zhang, Tao Wang, Jinchun Xing, Zhun Wu, Wei Li, Jiayin Wang
Format: Article
Language: English
Published: BMC 2020-07-01
Series: BMC Medical Informatics and Decision Making
Subjects: RNA-seq data analysis; Circular RNA; Detection method; Machine learning; High precision
Online Access: http://link.springer.com/article/10.1186/s12911-020-1117-0
Affiliations: School of Computer Science and Technology, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University (Yidan Wang, Xuanping Zhang, Jiayin Wang); The Key Laboratory of Urinary Tract Tumors and Calculi, Department of Urology Surgery, The First Affiliated Hospital, School of Medicine, Xiamen University (Tao Wang, Jinchun Xing, Zhun Wu, Wei Li)
ISSN: 1472-6947
DOI: 10.1186/s12911-020-1117-0

Abstract

Background: Circular RNAs (circRNAs) are RNA molecules with a closed-loop structure that lack poly(A) tails. Recent studies have reported that some circRNAs have functions different from those of canonical transcripts and are associated with complex diseases. Several computational methods have been developed for detecting circRNAs from RNA-seq data. However, existing methods favor high-sensitivity strategies, which introduce many false positives. A clinical decision-support system therefore needs a comprehensive filtering approach that accurately recognizes real circRNAs before they are used in decision models.

Methods: In this paper, we first reviewed the detection strategies of existing methods. Using features derived from RNA-seq data, we showed that no single feature (data signal) selected by existing strategies can accurately distinguish a circRNA. However, we found that certain combinations of these features (data signals) can serve as signatures for recognizing circRNAs. To avoid the high computational complexity of solving this as a combinatorial optimization problem, we present CIRCPlus2, which uses a machine learning framework to recognize real circRNAs from multiple data signals captured from RNA-seq data. After comparing multiple machine learning frameworks, we adopted a Gradient Boosting Decision Tree (GBDT) framework for CIRCPlus2.

Results: Given a set of candidate circRNAs reported by any existing detection tool(s), the features of each candidate are extracted from the aligned reads. The GBDT model is trained on a training dataset, and applying it to the selected features of each candidate yields a prediction of whether that candidate is a true or a false positive. To verify the performance of the proposed approach, we conducted several groups of experiments on both real RNA-seq datasets and a series of simulated datasets with different preset configurations. The results demonstrated that CIRCPlus2 clearly improved specificity while maintaining high sensitivity.

Conclusions: Filtering false positives is an important step in RNA-seq data analysis pipelines, and machine learning frameworks are well suited to this filtering problem. CIRCPlus2 is an efficient approach for distinguishing false-positive circRNAs from real ones.
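
The abstract describes the filtering step only at a high level. The Python sketch below is a rough illustration under stated assumptions and is not the CIRCPlus2 implementation. It trains a tiny gradient-boosted model of depth-1 trees (stumps) with log loss on two hypothetical per-candidate features, a back-splice junction read count and a mean mapping quality. It then scores the predictions with sensitivity and specificity. The feature names, the toy data, the `GBDTFilter` class, the number of rounds, the learning rate and the 0.5 cutoff are all illustrative assumptions. The toy labels are constructed so that neither feature alone separates real candidates from false ones, while the two features together do. This mirrors, in miniature, the abstract's point about combining data signals.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class Stump:
    """Depth-1 regression tree: one feature, one threshold, two leaf values."""
    feature: int
    threshold: float
    left_value: float   # used when x[feature] <= threshold
    right_value: float  # used when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def sse(values: List[float]) -> float:
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, residuals) -> Optional[Tuple[int, float]]:
    """Least-squares choice of (feature, threshold) for a stump fit to the residuals."""
    best, best_err = None, math.inf
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if left and right and sse(left) + sse(right) < best_err:
                best, best_err = (j, t), sse(left) + sse(right)
    return best


def newton_leaf(idx, residuals, hessians) -> float:
    """Leaf value for log loss: sum of residuals (negative gradients) / sum of hessians."""
    return sum(residuals[i] for i in idx) / max(sum(hessians[i] for i in idx), 1e-12)


class GBDTFilter:
    """Gradient-boosted stumps with log loss; a hypothetical toy stand-in for a GBDT library."""

    def __init__(self, n_rounds: int = 20, learning_rate: float = 0.5):
        self.n_rounds, self.learning_rate = n_rounds, learning_rate
        self.base, self.stumps = 0.0, []

    def fit(self, X, y):
        rate = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.base = math.log(rate / (1 - rate))  # initial log-odds of "real circRNA"
        self.stumps = []
        scores = [self.base] * len(X)
        for _ in range(self.n_rounds):
            probs = [sigmoid(s) for s in scores]
            residuals = [yi - p for yi, p in zip(y, probs)]  # negative gradient of log loss
            hessians = [p * (1 - p) for p in probs]
            split = best_split(X, residuals)
            if split is None:  # no feature varies, so nothing left to split on
                break
            j, t = split
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            stump = Stump(j, t, newton_leaf(left, residuals, hessians),
                          newton_leaf(right, residuals, hessians))
            self.stumps.append(stump)
            scores = [s + self.learning_rate * stump.predict(x) for s, x in zip(scores, X)]
        return self

    def predict_proba(self, x) -> float:
        z = self.base + sum(self.learning_rate * st.predict(x) for st in self.stumps)
        return sigmoid(z)

    def is_real(self, x, cutoff: float = 0.5) -> bool:
        return self.predict_proba(x) >= cutoff


def sensitivity_specificity(y_true, y_pred) -> Tuple[float, float]:
    """TP/(TP+FN) and TN/(TN+FP); assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp, fn = pairs.count((1, 1)), pairs.count((1, 0))
    tn, fp = pairs.count((0, 0)), pairs.count((0, 1))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy candidates: [back-splice junction reads, mean mapping quality].
X = [[1, 60], [2, 55], [8, 10], [9, 15], [7, 50], [10, 58], [6, 45], [1, 12]]
y = [0, 0, 0, 0, 1, 1, 1, 0]  # 1 = real circRNA, 0 = false positive (toy labels)
model = GBDTFilter().fit(X, y)
preds = [int(model.is_real(x)) for x in X]
print(preds, sensitivity_specificity(y, preds))
```

Depth-1 trees produce an additive model, so this sketch can only combine features additively. Capturing interactions between data signals would require deeper trees. That is one reason a real pipeline would more likely rely on an established GBDT implementation than on hand-written stumps.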