Improved Feature Selection Model for Big Data Analytics

Although there have been many attempts to build an optimal model for feature selection in Big Data applications, the complex nature of processing such data keeps it a major challenge. Accordingly, the data mining process may be obstructed by the high dimensionality and complexity of huge data sets. To retain the most informative features and optimize classification accuracy, feature selection constitutes a mandatory pre-processing phase that reduces dataset dimensionality. An exhaustive search for the relevant features is time-consuming. In this paper, a new binary variant of a hybrid wrapper feature selection method combining grey wolf optimization and particle swarm optimization is proposed. The K-nearest neighbor classifier with the Euclidean distance metric is used to evaluate candidate solutions. A tent chaotic map helps keep the algorithm from becoming trapped in local optima. A sigmoid function converts the continuous search space into a binary one, as the feature selection problem requires. K-fold cross-validation is used to overcome overfitting. A variety of comparisons have been made with well-known algorithms, namely the particle swarm optimization algorithm and the grey wolf optimization algorithm. Twenty datasets are used for the experiments, and statistical analyses are conducted to confirm the performance and effectiveness of the proposed model on measures such as selected-features ratio, classification accuracy, and computation time. The cumulative number of features selected across the twenty datasets was 196 out of 773, as opposed to 393 and 336 for GWO and PSO, respectively. The overall accuracy is 90%, compared with the other algorithms' 81.6% and 86.8%. The total processing time for all datasets is 184.3 seconds, whereas GWO and PSO take 272 and 245.6 seconds, respectively.


Bibliographic Details
Main Authors: Ibrahim M. El-Hasnony, Sherif I. Barakat, Mohamed Elhoseny, Reham R. Mostafa
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9058715/
id doaj-7f1fecfba9c54fcaabac3d800b698715
record_format Article
spelling doaj-7f1fecfba9c54fcaabac3d800b698715 2021-03-30T03:12:22Z
Language: eng
Publisher: IEEE
Journal: IEEE Access, ISSN 2169-3536, 2020-01-01, vol. 8, pp. 66989-67004
DOI: 10.1109/ACCESS.2020.2986232 (IEEE article 9058715)
Title: Improved Feature Selection Model for Big Data Analytics
Authors:
Ibrahim M. El-Hasnony (https://orcid.org/0000-0002-9489-3449), Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt
Sherif I. Barakat, Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt
Mohamed Elhoseny (https://orcid.org/0000-0001-6347-8368), Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt
Reham R. Mostafa, Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt
Abstract: Although there have been many attempts to build an optimal model for feature selection in Big Data applications, the complex nature of processing such data keeps it a major challenge. Accordingly, the data mining process may be obstructed by the high dimensionality and complexity of huge data sets. To retain the most informative features and optimize classification accuracy, feature selection constitutes a mandatory pre-processing phase that reduces dataset dimensionality. An exhaustive search for the relevant features is time-consuming. In this paper, a new binary variant of a hybrid wrapper feature selection method combining grey wolf optimization and particle swarm optimization is proposed. The K-nearest neighbor classifier with the Euclidean distance metric is used to evaluate candidate solutions. A tent chaotic map helps keep the algorithm from becoming trapped in local optima. A sigmoid function converts the continuous search space into a binary one, as the feature selection problem requires. K-fold cross-validation is used to overcome overfitting. A variety of comparisons have been made with well-known algorithms, namely the particle swarm optimization algorithm and the grey wolf optimization algorithm. Twenty datasets are used for the experiments, and statistical analyses are conducted to confirm the performance and effectiveness of the proposed model on measures such as selected-features ratio, classification accuracy, and computation time. The cumulative number of features selected across the twenty datasets was 196 out of 773, as opposed to 393 and 336 for GWO and PSO, respectively. The overall accuracy is 90%, compared with the other algorithms' 81.6% and 86.8%. The total processing time for all datasets is 184.3 seconds, whereas GWO and PSO take 272 and 245.6 seconds, respectively.
Online Access: https://ieeexplore.ieee.org/document/9058715/
Topics: Particle swarm optimization (PSO); grey wolf optimization (GWO); data mining; big data analytics; feature selection
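The abstract's binarization step maps the optimizer's continuous positions onto 0/1 feature masks through a sigmoid transfer function. A minimal sketch of that idea follows; the function names and the fixed demo threshold are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def sigmoid(x):
    # Logistic function: squashes any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng=random.random):
    # A feature is selected (1) when the sigmoid of its continuous
    # coordinate exceeds a random threshold, otherwise dropped (0).
    return [1 if sigmoid(x) > rng() else 0 for x in position]

# Deterministic demo with a fixed 0.5 threshold:
print(binarize([10.0, -10.0, 0.5], rng=lambda: 0.5))  # [1, 0, 1]
```

In a full wrapper method, the resulting mask would index the dataset's columns before the classifier evaluates that feature subset.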
collection DOAJ
language English
format Article
sources DOAJ
author Ibrahim M. El-Hasnony
Sherif I. Barakat
Mohamed Elhoseny
Reham R. Mostafa
spellingShingle Ibrahim M. El-Hasnony
Sherif I. Barakat
Mohamed Elhoseny
Reham R. Mostafa
Improved Feature Selection Model for Big Data Analytics
IEEE Access
Particle swarm optimization (PSO)
grey wolf optimization (GWO)
data mining
big data analytics
feature selection
author_facet Ibrahim M. El-Hasnony
Sherif I. Barakat
Mohamed Elhoseny
Reham R. Mostafa
author_sort Ibrahim M. El-Hasnony
title Improved Feature Selection Model for Big Data Analytics
title_short Improved Feature Selection Model for Big Data Analytics
title_full Improved Feature Selection Model for Big Data Analytics
title_fullStr Improved Feature Selection Model for Big Data Analytics
title_full_unstemmed Improved Feature Selection Model for Big Data Analytics
title_sort improved feature selection model for big data analytics
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Although there have been many attempts to build an optimal model for feature selection in Big Data applications, the complex nature of processing such data keeps it a major challenge. Accordingly, the data mining process may be obstructed by the high dimensionality and complexity of huge data sets. To retain the most informative features and optimize classification accuracy, feature selection constitutes a mandatory pre-processing phase that reduces dataset dimensionality. An exhaustive search for the relevant features is time-consuming. In this paper, a new binary variant of a hybrid wrapper feature selection method combining grey wolf optimization and particle swarm optimization is proposed. The K-nearest neighbor classifier with the Euclidean distance metric is used to evaluate candidate solutions. A tent chaotic map helps keep the algorithm from becoming trapped in local optima. A sigmoid function converts the continuous search space into a binary one, as the feature selection problem requires. K-fold cross-validation is used to overcome overfitting. A variety of comparisons have been made with well-known algorithms, namely the particle swarm optimization algorithm and the grey wolf optimization algorithm. Twenty datasets are used for the experiments, and statistical analyses are conducted to confirm the performance and effectiveness of the proposed model on measures such as selected-features ratio, classification accuracy, and computation time. The cumulative number of features selected across the twenty datasets was 196 out of 773, as opposed to 393 and 336 for GWO and PSO, respectively. The overall accuracy is 90%, compared with the other algorithms' 81.6% and 86.8%. The total processing time for all datasets is 184.3 seconds, whereas GWO and PSO take 272 and 245.6 seconds, respectively.
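The tent chaotic map mentioned in the description is commonly used in such metaheuristic hybrids to diversify the search and escape local optima. Below is a sketch of the classic tent map; where exactly the map plugs into the GWO-PSO update is an assumption here, and the paper should be consulted for the authors' formulation.

```python
def tent_map(x, mu=2.0):
    # Classic tent map on (0, 1); fully chaotic at mu = 2.
    return mu * x if x < 0.5 else mu * (1.0 - x)

def chaotic_sequence(seed, n):
    # Iterate the map to get n values in (0, 1), usable in place of
    # uniform random numbers when initializing or perturbing agents.
    values, x = [], seed
    for _ in range(n):
        x = tent_map(x)
        values.append(x)
    return values

print(chaotic_sequence(0.3, 4))
```

Because successive values are deterministic yet non-repeating, seeding agent positions from such a sequence tends to cover the search space more evenly than independent uniform draws.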
topic Particle swarm optimization (PSO)
grey wolf optimization (GWO)
data mining
big data analytics
feature selection
url https://ieeexplore.ieee.org/document/9058715/
work_keys_str_mv AT ibrahimmelhasnony improvedfeatureselectionmodelforbigdataanalytics
AT sherifibarakat improvedfeatureselectionmodelforbigdataanalytics
AT mohamedelhoseny improvedfeatureselectionmodelforbigdataanalytics
AT rehamrmostafa improvedfeatureselectionmodelforbigdataanalytics
_version_ 1724183878230343680