Parallel Distance-Based Instance Selection Algorithm for Feed-Forward Neural Network

Instance selection endeavors to decide which instances from the data set should be retained for further use during the learning process. It can result in better generalization of the learning model, a shorter learning process, or scaling up to large data sources. This paper presents a parallel distance-based instance selection approach for a feed-forward neural network (FFNN), which can utilize all available processing power to reduce the data set while obtaining levels of classification accuracy similar to those achieved with the original data set. The algorithm identifies the instances at the decision boundary between consecutive classes of data, which are essential for placing hyperplane decision surfaces, and retains these instances in the reduced data set (subset). Each identified instance, called a prototype, is one of the representatives of the decision boundary of its class that constitute the shape or distribution model of the data set. No feature or dimension is sacrificed in the reduction process. Regarding reduction capability, the algorithm obtains approximately 85% reduction on non-overlapping two-class synthetic data sets, 70% reduction on highly overlapping two-class synthetic data sets, and 77% reduction on multiclass real-world data sets. Regarding generalization, the reduced data sets yield classification accuracy similar to that of the original data set on both the FFNN and a support vector machine. Regarding execution time, the speedup of the parallel algorithm over the serial algorithm is proportional to the number of threads the processor can run concurrently.
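The record's abstract describes keeping only the instances near the decision boundary between classes (prototypes), with the scan over the data set split across threads. The paper's exact algorithm is not reproduced here; the following is a minimal sketch of that general idea, under the assumption that "near the boundary" can be approximated by the distance to the nearest instance of a different class (the names `boundary_mask`, `select_instances`, and the radius parameter are illustrative, not from the paper):

```python
# Hypothetical sketch of distance-based instance selection: keep an instance
# only if its nearest neighbor of a DIFFERENT class lies within a radius,
# i.e. it sits near the decision boundary. Chunks are scanned in parallel.
# This is not the paper's algorithm, only an illustration of the idea.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def boundary_mask(X, y, chunk, radius):
    """Flag each instance in `chunk` whose nearest other-class neighbor
    is within `radius` of it."""
    keep = np.zeros(len(chunk), dtype=bool)
    for i, idx in enumerate(chunk):
        other = y != y[idx]                              # other-class instances
        d = np.linalg.norm(X[other] - X[idx], axis=1)    # Euclidean distances
        keep[i] = d.min() <= radius
    return keep

def select_instances(X, y, radius=2.5, n_workers=4):
    """Reduce (X, y) to boundary prototypes, one index chunk per worker."""
    chunks = np.array_split(np.arange(len(X)), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        masks = pool.map(lambda c: boundary_mask(X, y, c, radius), chunks)
    mask = np.concatenate(list(masks))
    return X[mask], y[mask]

# Two separated Gaussian clusters: mostly the points facing the gap survive.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(4, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
Xr, yr = select_instances(X, y, radius=2.5)
print(len(Xr), "of", len(X), "instances kept")
```

Threads suffice here because NumPy releases the GIL inside the distance computation; a process pool would be the analogous choice for pure-Python distance code.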

Bibliographic Details
Main Author: Fuangkhon Piyabute
Format: Article
Language: English
Published: De Gruyter, 2017-04-01
Series: Journal of Intelligent Systems
Online Access: https://doi.org/10.1515/jisys-2015-0039
Author Affiliation: Department of Business Information Systems, Assumption University, Samut Prakan 10540, Kingdom of Thailand
ISSN: 0334-1860 (print), 2191-026X (online)
Keywords: data mining; data reduction; neural network; parallel algorithm; support vector machine; 68T01