Dealing with Missing Values in Data

Bibliographic Details
Main Author: Jiri Kaiser
Format: Article
Language: English
Published: Czech Society of Systems Integration 2014-01-01
Series: Journal of Systems Integration
Subjects:
Online Access: http://si-journal.org/index.php/JSI/article/viewFile/178/134
Description
Summary: Many existing industrial and research data sets contain missing values for various reasons, such as manual data entry procedures, equipment errors and incorrect measurements. Problems associated with missing values include loss of efficiency, complications in handling and analyzing the data, and bias resulting from differences between missing and complete data. An important factor in selecting an approach to missing values is the missing data mechanism. There are various strategies for dealing with missing values. Some analytical methods have their own way of handling missing values. Data set reduction is another option. Finally, the missing values problem can be handled by missing value imputation. This paper presents simple methods for missing value imputation, such as using the most common value, the mean or median, and the closest fit approach, as well as methods based on data mining algorithms such as k-nearest neighbor, neural networks and association rules; it discusses their usability and illustrates issues with their applicability through examples.
ISSN: 1804-2724
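
The simple imputation methods named in the summary (most common value, mean, median, and k-nearest neighbor) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names `impute_simple`, `distance` and `impute_knn`, the use of `None` as the missing-value marker, and the choice of Euclidean distance over shared columns are all assumptions made for illustration.

```python
import math
from statistics import mean, median, mode


def impute_simple(values, strategy="mean"):
    """Replace None entries in one column with a single fill value.

    Hypothetical helper: `strategy` is "mean", "median" or "most_common".
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    pick = {"mean": mean, "median": median, "most_common": mode}[strategy]
    fill = pick(observed)
    return [fill if v is None else v for v in values]


def distance(a, b):
    """Euclidean distance over the columns observed in both rows (assumed metric)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # rows with nothing in common are treated as infinitely far
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def impute_knn(rows, k=2):
    """Fill each missing numeric cell with the mean of that column
    taken from the k nearest rows that have the column observed."""
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row with column j observed.
            donors = [r for n, r in enumerate(rows) if n != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                result[i][j] = mean(r[j] for r in nearest)
    return result


if __name__ == "__main__":
    print(impute_simple([20, None, 30, 40], "mean"))
    print(impute_simple(["a", "b", None, "a"], "most_common"))
    print(impute_knn([[1.0, 10.0], [2.0, 20.0], [9.0, 90.0], [1.5, None]], k=2))
```

The single-value strategies are cheap but ignore relationships between attributes, whereas the k-nearest-neighbor variant borrows values from similar records. The distance here is not normalized for the number of shared columns or for differing scales, which a practical version would likely need to address.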