Type2 diabetes mellitus prediction using data mining algorithms based on the long-noncoding RNAs expression: a comparison of four data mining approaches

Abstract Background: About 90% of patients with diabetes have type 2 diabetes mellitus (T2DM). Many studies suggest that lncRNAs play a significant role that can be used to improve the diagnosis of T2DM. Machine learning and data mining techniques are tools that can improve the analysis and interpretation or extraction...

Bibliographic Details
Main Authors: Faranak Kazerouni, Azadeh Bayani, Farkhondeh Asadi, Leyla Saeidi, Nasrin Parvizi, Zahra Mansoori
Format: Article
Language: English
Published: BMC 2020-08-01
Series: BMC Bioinformatics
Subjects: Data mining; Gene expression; Machine learning algorithms; Type 2 diabetes mellitus
Online Access: http://link.springer.com/article/10.1186/s12859-020-03719-8
id doaj-f08f36519324430dadbf3f558887e2df
record_format Article
spelling doaj-f08f36519324430dadbf3f558887e2df
2020-11-25T03:01:11Z
eng
BMC
BMC Bioinformatics
1471-2105
2020-08-01
Volume 21, issue 1, pages 1-13
10.1186/s12859-020-03719-8
Type2 diabetes mellitus prediction using data mining algorithms based on the long-noncoding RNAs expression: a comparison of four data mining approaches
Faranak Kazerouni (0): Department of Laboratory Medicine, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences
Azadeh Bayani (1): Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences
Farkhondeh Asadi (2): Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences
Leyla Saeidi (3): Department of Clinical Biochemistry, School of Medicine, Tehran University of Medical Sciences
Nasrin Parvizi (4): Department of Genetics, Faculty of Medicine, Babol University of Medical Sciences
Zahra Mansoori (5): Department of Laboratory Medicine, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences
http://link.springer.com/article/10.1186/s12859-020-03719-8
Data mining
Gene expression
Machine learning algorithms
Type 2 diabetes mellitus
collection DOAJ
language English
format Article
sources DOAJ
author Faranak Kazerouni
Azadeh Bayani
Farkhondeh Asadi
Leyla Saeidi
Nasrin Parvizi
Zahra Mansoori
spellingShingle Faranak Kazerouni
Azadeh Bayani
Farkhondeh Asadi
Leyla Saeidi
Nasrin Parvizi
Zahra Mansoori
Type2 diabetes mellitus prediction using data mining algorithms based on the long-noncoding RNAs expression: a comparison of four data mining approaches
BMC Bioinformatics
Data mining
Gene expression
Machine learning algorithms
Type 2 diabetes mellitus
author_facet Faranak Kazerouni
Azadeh Bayani
Farkhondeh Asadi
Leyla Saeidi
Nasrin Parvizi
Zahra Mansoori
author_sort Faranak Kazerouni
title Type2 diabetes mellitus prediction using data mining algorithms based on the long-noncoding RNAs expression: a comparison of four data mining approaches
title_sort type2 diabetes mellitus prediction using data mining algorithms based on the long-noncoding rnas expression: a comparison of four data mining approaches
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2020-08-01
description Abstract
Background: About 90% of patients with diabetes have type 2 diabetes mellitus (T2DM). Many studies suggest that long non-coding RNAs (lncRNAs) play a significant role that can be used to improve the diagnosis of T2DM. Machine learning and data mining techniques are tools that can improve the analysis and interpretation of data and the extraction of knowledge from it. These techniques may enhance prognosis and diagnosis and so help reduce diseases such as T2DM. We applied four classification models, K-nearest neighbor (KNN), support vector machine (SVM), logistic regression, and artificial neural network (ANN), to the diagnosis of T2DM and compared their diagnostic power. The algorithms were run on six lncRNA variables (LINC00523, LINC00995, HCG27_201, TPT1-AS1, LY86-AS1, DKFZP) and demographic data.
Results: To identify the best-performing model, we considered the AUC, sensitivity, and specificity, plotted the ROC curves, and showed the average curve and its range. For KNN, the mean AUC was 91% with a standard deviation (SD) of 0.09; the mean sensitivity and specificity were 96% and 85%, respectively. For SVM, the mean AUC after stratified 10-fold cross-validation was 95% with an SD of 0.05; the mean sensitivity and specificity were 95% and 86%, respectively. For ANN, the mean AUC was 93% with an SD of 0.03; the mean sensitivity and specificity were 78% and 85%, respectively. For logistic regression, the mean AUC was 95% with an SD of 0.05; the mean sensitivity and specificity were 92% and 85%, respectively. According to the ROC curves, logistic regression and SVM had a larger area under the curve than the other models.
Conclusion: We aimed to find the best data mining approach for predicting T2DM from the expression of six lncRNAs. According to the findings, SVM and logistic regression achieved the highest mean AUC; KNN and ANN also had high mean AUC values, and the standard deviations of the AUC scores were small across the approaches. KNN had the highest mean sensitivity, and SVM had the highest specificity. These results could improve our knowledge of the early detection and diagnosis of T2DM using lncRNAs as biomarkers.
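The figures reported in the abstract (mean AUC with its SD across folds, plus sensitivity and specificity) can be illustrated with a minimal Python sketch. This is not the authors' code: the labels and scores below are hypothetical toy values, the helper names (`auc`, `sensitivity_specificity`, `stratified_folds`) are introduced only for illustration, and no classifier is trained. The sketch assumes a 0.5 decision threshold, uses the sample SD, and uses 2 folds instead of the 10 named in the abstract so the toy data stay small.

```python
from statistics import mean, stdev


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def stratified_folds(y, k):
    """Deal sample indices into k folds class by class, so each fold keeps roughly the overall class ratio."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, t in enumerate(y) if t == label]
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


# Hypothetical toy data: 1 = T2DM, 0 = control; scores stand in for a model's predicted probabilities.
y = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]

fold_aucs, fold_sens, fold_spec = [], [], []
for fold in stratified_folds(y, k=2):  # assumes every fold contains both classes
    y_fold = [y[i] for i in fold]
    s_fold = [scores[i] for i in fold]
    preds = [1 if s >= 0.5 else 0 for s in s_fold]  # assumed 0.5 threshold
    sens, spec = sensitivity_specificity(y_fold, preds)
    fold_aucs.append(auc(y_fold, s_fold))
    fold_sens.append(sens)
    fold_spec.append(spec)

print("mean AUC:", mean(fold_aucs), "SD:", stdev(fold_aucs))
print("mean sensitivity:", mean(fold_sens), "mean specificity:", mean(fold_spec))
```

In a full cross-validation run, each fold's scores would come from a model fitted on the remaining folds; the sketch only shows how the per-fold metrics are computed and then averaged.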
topic Data mining
Gene expression
Machine learning algorithms
Type 2 diabetes mellitus
url http://link.springer.com/article/10.1186/s12859-020-03719-8