Prediction of Type 2 Diabetes Based on Machine Learning Algorithm

Prediction of type 2 diabetes (T2D) occurrence allows a person at risk to take actions that can prevent onset or delay the progression of the disease. In this study, we developed a machine learning (ML) model to predict T2D occurrence in the following year (Y + 1) using variables in the current year...

Bibliographic Details
Main Authors:	Henock M. Deberneh, Intaek Kim
Format:	Article
Language:	English
Published:	MDPI AG 2021-03-01
Series:	International Journal of Environmental Research and Public Health
Subjects:	type 2 diabetes; machine learning; prediction
Online Access:	https://www.mdpi.com/1660-4601/18/6/3317

id	doaj-5d32856e915a4bd39cb4da8af74279ea
record_format	Article
spelling	doaj-5d32856e915a4bd39cb4da8af74279ea | 2021-03-24T00:04:58Z | eng | MDPI AG | International Journal of Environmental Research and Public Health | 1661-7827 | 1660-4601 | 2021-03-01 | 18 | 3317 | 3317 | 10.3390/ijerph18063317 | Prediction of Type 2 Diabetes Based on Machine Learning Algorithm | Henock M. Deberneh; Intaek Kim | Department of Information and Communications Engineering, Myongji University, 116 Myongji-ro, Yongin, Gyeonggi 17058, Korea (both authors) | https://www.mdpi.com/1660-4601/18/6/3317 | type 2 diabetes; machine learning; prediction
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Henock M. Deberneh, Intaek Kim
title	Prediction of Type 2 Diabetes Based on Machine Learning Algorithm
publisher	MDPI AG
series	International Journal of Environmental Research and Public Health
issn	1661-7827 1660-4601
publishDate	2021-03-01
description	Prediction of type 2 diabetes (T2D) occurrence allows a person at risk to take actions that can prevent onset or delay the progression of the disease. In this study, we developed a machine learning (ML) model to predict T2D occurrence in the following year (Y + 1) using variables in the current year (Y). The dataset for this study was collected at a private medical institute as electronic health records from 2013 to 2018. To construct the prediction model, key features were first selected using ANOVA tests, chi-squared tests, and recursive feature elimination methods. The resultant features were fasting plasma glucose (FPG), HbA1c, triglycerides, BMI, gamma-GTP, age, uric acid, sex, smoking, drinking, physical activity, and family history. We then employed logistic regression, random forest, support vector machine, XGBoost, and ensemble machine learning algorithms based on these variables to predict the outcome as normal (non-diabetic), prediabetes, or diabetes. Based on the experimental results, the performance of the prediction model proved to be reasonably good at forecasting the occurrence of T2D in the Korean population. The model can provide clinicians and patients with valuable predictive information on the likelihood of developing T2D. The cross-validation (CV) results showed that the ensemble models had a superior performance to that of the single models. The CV performance of the prediction models was improved by incorporating more medical history from the dataset.
topic	type 2 diabetes; machine learning; prediction
url	https://www.mdpi.com/1660-4601/18/6/3317
work_keys_str_mv	AT henockmdeberneh predictionoftype2diabetesbasedonmachinelearningalgorithm AT intaekkim predictionoftype2diabetesbasedonmachinelearningalgorithm
_version_	1724205372249473024
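
The description above says the model predicts the outcome in the following year (Y + 1) from variables in the current year (Y). A minimal sketch of how such training pairs could be assembled from yearly records is given below. It is an illustration only, not the authors' code: the field names (`patient_id`, `year`, `fpg`, `hba1c`, `status`), the helper `build_next_year_pairs`, and the toy records are all hypothetical, and the class labels are taken as given rather than derived from any diagnostic threshold.

```python
from collections import defaultdict


def build_next_year_pairs(records, feature_names, label_name="status"):
    """Pair each patient's features in year Y with their label in year Y + 1.

    `records` is a list of dicts, one per patient per year (hypothetical layout).
    Visits without a follow-up record in the next calendar year are skipped.
    """
    # Index the records as patient -> year -> record.
    by_patient = defaultdict(dict)
    for rec in records:
        by_patient[rec["patient_id"]][rec["year"]] = rec

    X, y = [], []
    for patient_id in sorted(by_patient):
        visits = by_patient[patient_id]
        for year in sorted(visits):
            next_visit = visits.get(year + 1)
            if next_visit is None:
                continue  # no record in Y + 1, so there is no label to learn from
            X.append([visits[year][name] for name in feature_names])
            y.append(next_visit[label_name])
    return X, y


# Toy, made-up records for illustration; not taken from the study's dataset.
records = [
    {"patient_id": "p1", "year": 2013, "fpg": 92, "hba1c": 5.4, "status": "normal"},
    {"patient_id": "p1", "year": 2014, "fpg": 104, "hba1c": 5.8, "status": "prediabetes"},
    {"patient_id": "p1", "year": 2015, "fpg": 128, "hba1c": 6.6, "status": "diabetes"},
    {"patient_id": "p2", "year": 2013, "fpg": 88, "hba1c": 5.2, "status": "normal"},
    {"patient_id": "p2", "year": 2015, "fpg": 90, "hba1c": 5.3, "status": "normal"},
]

X, y = build_next_year_pairs(records, ["fpg", "hba1c"])
```

In this toy input, patient `p1` contributes two pairs (2013 to 2014 and 2014 to 2015), while `p2` contributes none because no two of its records fall in consecutive years. The resulting `X` and `y` could then be passed to any three-class classifier of the kinds named in the description.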

Similar Items