The Effects of Missing Data Characteristics on the Choice of Imputation Techniques


Bibliographic Details
Published in: Vietnam Journal of Computer Science
Main Authors: Oyekale Abel Alade, Ali Selamat, Roselina Sallehuddin
Format: Article
Language: English
Published: World Scientific Publishing, 2020-05-01
Subjects: imputation techniques; mechanism of missingness; missing data; missing pattern; multiple imputations
Online Access: http://www.worldscientific.com/doi/pdf/10.1142/S2196888820500098
collection DOAJ
description One major characteristic of data is completeness. Missing data is a significant problem in medical datasets: it leads to incorrect classification of patients and endangers their health management. Many factors cause values to be missing from medical databases. In this paper, we argue that the causes of missing data in a medical dataset should be examined so that the right imputation method is used to solve the problem. The missingness mechanism was studied to identify the missing-data pattern of the dataset and to determine a suitable imputation technique for generating complete datasets. The analysis shows that the missingness in the dataset used in this study does not follow a monotone pattern. Because single imputation techniques underestimate variance and ignore relationships among the variables, we used a multiple imputation technique that runs five iterations for the imputation of each missing value. All (100%) of the missing values in the dataset were regenerated. The imputed datasets were validated using an extreme learning machine (ELM) classifier, and the results show improved accuracy on the imputed datasets. The work can be extended to compare the accuracy of the imputed datasets with that of the original dataset using different classifiers such as support vector machine (SVM), radial basis function (RBF), and ELMs.
id doaj-art-eaa2dba0acad41d199de1460880d038c
institution Directory of Open Access Journals
issn 2196-8888
2196-8896
volume 7
issue 2
pages 161-177
doi 10.1142/S2196888820500098
affiliation School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, 81310 Johor Bahru, Johor, Malaysia (all three authors)
topic imputation techniques
mechanism of missingness
missing data
missing pattern
multiple imputations
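The abstract reports that the dataset's missingness is not a monotone pattern. That property can be checked directly. What follows is a minimal sketch, assuming rows are lists in which `None` marks a missing value; the function name `is_monotone` is hypothetical. In a monotone pattern the sets of rows missing each column are nested. So it is enough to order the columns by their missing count and confirm that, in every row, no observed value follows a missing one.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]  # None marks a missing value (assumed encoding)


def is_monotone(rows: List[Row]) -> bool:
    """Return True if the missing-data pattern of `rows` is monotone.

    A pattern is monotone if the columns can be ordered so that whenever a
    column is missing for a row, every later column is missing for it too.
    """
    if not rows:
        return True
    n_cols = len(rows[0])
    # Missing sets of a monotone pattern are nested, so sorting columns by
    # how often they are missing yields a valid ordering if any ordering exists.
    order = sorted(range(n_cols), key=lambda j: sum(r[j] is None for r in rows))
    for r in rows:
        seen_missing = False
        for j in order:
            if r[j] is None:
                seen_missing = True
            elif seen_missing:
                # An observed value after a missing one breaks monotonicity.
                return False
    return True
```

For example, `[[1.0, 2.0, 3.0], [1.5, 2.5, None], [0.9, None, None]]` is monotone, while `[[1.0, None, 3.0], [1.5, 2.5, None]]` is not. A monotone pattern allows variables to be imputed one at a time in sequence. A non-monotone pattern like the one the abstract reports is commonly handled with iterative approaches such as chained equations.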
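The abstract argues that single imputation underestimates variance. A small example makes this concrete. This is a simplified sketch under stated assumptions:

- one numeric variable;
- values missing completely at random;
- a normal-draw imputation model that plugs in the observed mean and standard deviation instead of drawing them from a posterior.

The function names and m = 5 are illustrative. The record does not say whether the authors' "five iterations" means five imputed datasets. Mean imputation leaves the sum of squared deviations unchanged while increasing the count, so the completed data's variance shrinks. Pooling m stochastic imputations with Rubin's rules instead adds a between-imputation term to the variance of the estimate.

```python
import random
import statistics
from typing import List, Optional, Tuple


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Single imputation: replace every missing value (None) with the observed mean."""
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    return [mu if v is None else v for v in values]


def multiple_impute_mean(
    values: List[Optional[float]], m: int = 5, seed: int = 0
) -> Tuple[float, float]:
    """Estimate the mean from m stochastic imputations pooled with Rubin's rules.

    Simplifying assumption: missing values are drawn from N(mu, sd) using the
    observed mean and standard deviation (an "improper" imputation model).
    Returns (pooled estimate, total variance of that estimate).
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu, sd = statistics.fmean(observed), statistics.stdev(observed)
    estimates, within = [], []
    for _ in range(m):
        # Each completed dataset gets its own random draws for the missing cells.
        completed = [rng.gauss(mu, sd) if v is None else v for v in values]
        estimates.append(statistics.fmean(completed))
        within.append(statistics.variance(completed) / len(completed))
    q_bar = statistics.fmean(estimates)  # pooled point estimate
    w = statistics.fmean(within)  # average within-imputation variance
    b = statistics.variance(estimates)  # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b  # Rubin's total variance T
```

Take 200 normal draws and delete about 30% of the values at random. The variance of the mean-imputed data is then smaller than that of the observed values. The pooled total variance from `multiple_impute_mean` should exceed the naive variance of the mean computed from the single mean-imputed dataset.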
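The abstract validates the imputed datasets with an extreme learning machine (ELM). An ELM is a single-hidden-layer network whose hidden weights are random and fixed; only the output weights are fitted, by least squares. The following is a minimal, dependency-free sketch for binary labels in {0, 1}. These choices are illustrative assumptions, not the authors' configuration:

- the class name `ELM`;
- the sigmoid activation;
- the hidden-layer size;
- the uniform weight range;
- the small ridge term.

```python
import math
import random
from typing import List


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class ELM:
    """Extreme learning machine: random fixed hidden layer, least-squares output."""

    def __init__(self, n_hidden: int = 20, ridge: float = 1e-3, seed: int = 0):
        self.n_hidden, self.ridge = n_hidden, ridge  # ridge term is an assumption
        self.rng = random.Random(seed)

    def _hidden(self, x: List[float]) -> List[float]:
        # Sigmoid activation of each random hidden neuron.
        return [1 / (1 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
                for ws, b in zip(self.w, self.b)]

    def fit(self, X: List[List[float]], y: List[int]) -> "ELM":
        d, L = len(X[0]), self.n_hidden
        # Hidden weights and biases are drawn once and never trained.
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(L)]
        H = [self._hidden(x) for x in X]
        # Regularised normal equations: (H^T H + ridge * I) beta = H^T y.
        hth = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
                for j in range(L)] for i in range(L)]
        hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(L)]
        self.beta = _solve(hth, hty)
        return self

    def predict(self, X: List[List[float]]) -> List[int]:
        # Threshold the linear output at 0.5 to get a 0/1 class label.
        return [int(sum(bi * hi for bi, hi in zip(self.beta, self._hidden(x))) >= 0.5)
                for x in X]
```

In a validation step like the one the abstract describes, a classifier of this kind would be trained and scored on each imputed dataset. This record does not state how the authors split or scored the data.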