Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance

Heart disease, one of the main reasons behind the high mortality rate around the world, requires a sophisticated and expensive diagnosis process. In the recent past, much literature has demonstrated machine learning approaches as an opportunity to efficiently diagnose heart disease patients. However...

Bibliographic Details
Main Authors: Md Manjurul Ahsan, M. A. Parvez Mahmud, Pritom Kumar Saha, Kishor Datta Gupta, Zahed Siddique
Format: Article
Language: English
Published: MDPI AG 2021-07-01
Series: Technologies
Subjects: heart disease, machine learning algorithm, data scaling, prediction, automated model
Online Access: https://www.mdpi.com/2227-7080/9/3/52
id doaj-56781786dd5e4c53aa34aded6b16b12e
record_format Article
spelling doaj-56781786dd5e4c53aa34aded6b16b12e
2021-09-26T01:32:21Z
DOI: 10.3390/technologies9030052
Technologies, volume 9, article 52
Author affiliations:
Md Manjurul Ahsan: School of Industrial and Systems Engineering, University of Oklahoma, Norman, OK 73019, USA
M. A. Parvez Mahmud: School of Engineering, Deakin University, Waurn Ponds, VIC 3216, Australia
Pritom Kumar Saha: Mewbourne College of Earth and Energy, University of Oklahoma, Norman, OK 73019, USA
Kishor Datta Gupta: Department of Computer Science, University of Memphis, Memphis, TN 38111, USA
Zahed Siddique: School of Aerospace and Mechanical Engineering, University of Oklahoma, Norman, OK 73019, USA
collection DOAJ
language English
format Article
sources DOAJ
author Md Manjurul Ahsan
M. A. Parvez Mahmud
Pritom Kumar Saha
Kishor Datta Gupta
Zahed Siddique
publisher MDPI AG
series Technologies
issn 2227-7080
publishDate 2021-07-01
description Heart disease, one of the main reasons behind the high mortality rate around the world, requires a sophisticated and expensive diagnosis process. In the recent past, much literature has demonstrated machine learning approaches as an opportunity to efficiently diagnose heart disease patients. However, challenges associated with datasets, such as missing data, inconsistent data, and mixed data (containing inconsistent missing data both as numerical and categorical), are often obstacles in medical diagnosis. This inconsistency leads to a higher probability of misprediction and to misleading results. Data preprocessing steps like feature reduction, data conversion, and data scaling are employed to form a standard dataset; such measures play a crucial role in reducing inaccuracy in the final prediction. This paper aims to evaluate eleven machine learning (ML) algorithms: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Naive Bayes (NB), Support Vector Machine (SVM), XGBoost (XGB), Random Forest Classifier (RF), Gradient Boost (GB), AdaBoost (AB), and Extra Tree Classifier (ET), together with six different data scaling methods: Normalization (NR), Standscale (SS), MinMax (MM), MaxAbs (MA), Robust Scaler (RS), and Quantile Transformer (QT), on a dataset comprising information on patients with heart disease. The results show that CART, along with RS or QT, outperforms all other ML algorithms with 100% accuracy, 100% precision, 99% recall, and 100% F1 score. The study outcomes demonstrate that the model’s performance varies depending on the data scaling method.
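The scaling methods named in the description can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the sample column, and the reading of NR, SS, MM, MA, and RS as L2 row normalization, zero-mean/unit-variance scaling, min-max scaling, max-abs scaling, and median/interquartile-range scaling are assumptions based on the conventional definitions of those terms. The Quantile Transformer is omitted for brevity.

```python
import math
import statistics


def l2_normalize(row):
    """NR (assumed): scale one sample (row) to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm if norm else 0.0 for x in row]


def standardize(col):
    """SS (assumed): subtract the mean, divide by the population std. dev."""
    mean = statistics.fmean(col)
    sd = statistics.pstdev(col)
    return [(x - mean) / sd if sd else 0.0 for x in col]


def min_max(col):
    """MM (assumed): map the column linearly onto [0, 1]."""
    lo, hi = min(col), max(col)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in col]


def max_abs(col):
    """MA (assumed): divide by the largest absolute value in the column."""
    m = max(abs(x) for x in col)
    return [x / m if m else 0.0 for x in col]


def robust(col):
    """RS (assumed): subtract the median, divide by the interquartile range."""
    q1, med, q3 = statistics.quantiles(col, n=4, method="inclusive")
    iqr = q3 - q1
    return [(x - med) / iqr if iqr else 0.0 for x in col]


if __name__ == "__main__":
    # Hypothetical feature column with one outlier (not data from the paper).
    column = [1.0, 2.0, 3.0, 4.0, 100.0]
    for scaler in (standardize, min_max, max_abs, robust):
        print(scaler.__name__, [round(v, 3) for v in scaler(column)])
    print("l2_normalize", l2_normalize([3.0, 4.0]))
```

With an outlier in the column, min-max and max-abs scaling squeeze the remaining values toward zero, while median/IQR scaling keeps them spread out. That difference in behavior is one plausible reason a model's performance can vary with the scaling method, as the description reports.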
topic heart disease
machine learning algorithm
data scaling
prediction
automated model
url https://www.mdpi.com/2227-7080/9/3/52