Missing Data Estimation using Principal Component Analysis and Autoassociative Neural Networks

Bibliographic Details
Main Authors: Jaisheel Mistry, Fulufhelo V. Nelwamondo, Tshilidzi Marwala
Format: Article
Language: English
Published: International Institute of Informatics and Cybernetics 2009-06-01
Series: Journal of Systemics, Cybernetics and Informatics
Online Access: http://www.iiisci.org/Journal/CV$/sci/pdfs/KS628XI.pdf
Description
Summary: Three new methods for estimating missing data in a database using Neural Networks, Principal Component Analysis and Genetic Algorithms are presented. The proposed methods are tested on a set of data obtained from the South African Antenatal Survey; the data is a collection of demographic properties of patients. The proposed methods use Principal Component Analysis to remove redundancies and reduce the dimensionality of the data. Variations of autoassociative Neural Networks are then used to reduce the dimensionality further. A Genetic Algorithm finds the missing values by optimizing the error function of each of the three variants of the Autoencoder Neural Network. The proposed system was tested on data with 1 to 6 missing fields in a single record, and the accuracy of the estimated values was calculated and recorded. All methods are as accurate as a conventional feedforward neural network structure; however, the newly proposed methods use neural network architectures with fewer hidden nodes.
ISSN: 1690-4524
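The summary's core idea is that a model trained to reproduce complete records gives a reconstruction error, and a Genetic Algorithm searches for the missing fields that make that error smallest. A minimal sketch of that loop follows, assuming numpy. It is not the authors' implementation: a linear PCA projection (the hypothetical helpers `fit_pca` and `reconstruct`) stands in for the trained autoassociative network, and the GA operators (tournament selection, arithmetic crossover, decaying Gaussian mutation, two-member elitism), the function name `ga_impute`, and every parameter value are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def fit_pca(X, k):
    """Return (mean, W); W holds the top-k principal directions as columns."""
    mean = X.mean(axis=0)
    # Rows of Vt from the SVD of the centred data are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T


def reconstruct(x, mean, W):
    """Linear stand-in for an autoencoder: project onto k components and back."""
    return mean + W @ (W.T @ (x - mean))


def reconstruction_error(x, mean, W):
    """Squared error between a record and its reconstruction."""
    r = x - reconstruct(x, mean, W)
    return float(r @ r)


def ga_impute(record, missing, mean, W, lo, hi, pop_size=60,
              generations=150, mutation_scale=0.1, seed=0):
    """Fill record[missing] by minimising the reconstruction error with a GA.

    Known entries are never modified; candidates are kept inside [lo, hi].
    """
    rng = np.random.default_rng(seed)
    m = len(missing)
    low, high = lo[missing], hi[missing]
    span = high - low
    pop = rng.uniform(low, high, size=(pop_size, m))

    def fitness(genes):
        x = record.copy()
        x[missing] = genes
        return reconstruction_error(x, mean, W)

    for gen in range(generations):
        scores = np.array([fitness(g) for g in pop])
        elite = pop[np.argsort(scores)[:2]]  # keep the two best unchanged
        # Mutation step shrinks linearly so late generations refine locally.
        scale = mutation_scale * span * (1.0 - gen / generations)
        children = []
        while len(children) < pop_size - 2:
            parents = []
            for _ in range(2):  # binary tournament selection
                a, b = rng.integers(pop_size, size=2)
                parents.append(pop[a] if scores[a] < scores[b] else pop[b])
            w = rng.uniform(size=m)  # arithmetic crossover weights
            child = w * parents[0] + (1 - w) * parents[1]
            child = child + rng.normal(0.0, scale, size=m)
            children.append(np.clip(child, low, high))
        pop = np.vstack([elite, np.array(children)])

    scores = np.array([fitness(g) for g in pop])
    best = record.copy()
    best[missing] = pop[np.argmin(scores)]
    return best


if __name__ == "__main__":
    # Synthetic low-rank data (not the antenatal survey data).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 6))
    X += 0.01 * rng.normal(size=X.shape)
    mean, W = fit_pca(X, k=2)
    lo, hi = X.min(axis=0), X.max(axis=0)
    missing = np.array([1, 4])
    record = X[0].copy()
    record[missing] = 0.0  # placeholder for the unknown fields
    est = ga_impute(record, missing, mean, W, lo, hi)
    print("true:", X[0][missing], "estimated:", est[missing])
```

Using a linear projection keeps the sketch dependency-free and makes the error surface a simple quadratic; swapping in a trained nonlinear autoencoder only changes `reconstruct`, since the GA treats the model as a black-box error function.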