High-Performance Machine Learning for Large-Scale Data Classification considering Class Imbalance

Data classification is currently one of the most important ways to analyze data. However, with the development of data collection, transmission, and storage technologies, the scale of data has increased sharply. Additionally, due to multiple classes and imbalanced data distribution...

Bibliographic Details
Main Authors: Yang Liu, Xiang Li, Xianbang Chen, Xi Wang, Huaqiang Li
Format: Article
Language: English
Published: Hindawi Limited 2020-01-01
Series: Scientific Programming
Online Access: http://dx.doi.org/10.1155/2020/1953461
id doaj-80e0ae9824fa492a969de1f803a4f3a9
record_format Article
spelling doaj-80e0ae9824fa492a969de1f803a4f3a9
2021-07-02T19:47:42Z
eng
Hindawi Limited
Scientific Programming
1058-9244
1875-919X
2020-01-01
2020
10.1155/2020/1953461
1953461
High-Performance Machine Learning for Large-Scale Data Classification considering Class Imbalance
Yang Liu (College of Electrical Engineering, Sichuan University, Chengdu 610065, China)
Xiang Li (College of Electrical Engineering, Sichuan University, Chengdu 610065, China)
Xianbang Chen (College of Electrical Engineering, Sichuan University, Chengdu 610065, China)
Xi Wang (State Grid Sichuan Economic Research Institute, Chengdu 610041, China)
Huaqiang Li (College of Electrical Engineering, Sichuan University, Chengdu 610065, China)
collection DOAJ
language English
format Article
sources DOAJ
author Yang Liu
Xiang Li
Xianbang Chen
Xi Wang
Huaqiang Li
title High-Performance Machine Learning for Large-Scale Data Classification considering Class Imbalance
publisher Hindawi Limited
series Scientific Programming
issn 1058-9244
1875-919X
publishDate 2020-01-01
description Data classification is currently one of the most important ways to analyze data. However, with the development of data collection, transmission, and storage technologies, the scale of data has increased sharply. In addition, because datasets often contain multiple classes with imbalanced distributions, the class imbalance problem has become increasingly prominent. Traditional machine learning algorithms lack the ability to handle these issues, so classification efficiency and precision may be significantly degraded. This paper therefore presents an improved artificial neural network that enables high-performance classification of imbalanced, large-volume data. First, the Borderline-SMOTE (synthetic minority oversampling technique) algorithm is employed to balance the training dataset, with the aim of improving the training of the back propagation neural network (BPNN); then zero-mean, batch normalization, and rectified linear unit (ReLU) techniques are employed to optimize the input layer and hidden layers of the BPNN. Finally, an ensemble learning-based parallelization of the improved BPNN is implemented using the Hadoop framework. The experimental results support the following conclusions. Borderline-SMOTE balances the imbalanced training dataset, which improves training performance and classification accuracy. The improvements to the input layer and hidden layers enhance training performance in terms of convergence. The parallelization and ensemble learning techniques enable the BPNN to perform high-performance classification of large-scale data. Overall, the experimental results show the effectiveness of the presented classification algorithm.
url http://dx.doi.org/10.1155/2020/1953461
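
The first step named in the description, balancing the training set with Borderline-SMOTE, can be illustrated with a minimal sketch. It follows the commonly described Borderline-SMOTE1 idea: only minority samples whose neighbourhood is at least half (but not entirely) majority are used as seeds, and synthetic points are interpolated between a seed and one of its minority neighbours. The function names `nearest` and `borderline_smote`, the parameters `m`, `k`, and `per_seed`, and the toy data are assumptions made for illustration; this record does not state which variant or parameter values the authors used.

```python
import math
import random


def nearest(idx, pool, X, count):
    """Indices of the `count` points in `pool` closest to X[idx], excluding idx itself."""
    others = (j for j in pool if j != idx)
    return sorted(others, key=lambda j: math.dist(X[idx], X[j]))[:count]


def borderline_smote(X, y, minority_label, m=5, k=5, per_seed=1, seed=0):
    """Return synthetic minority points generated from borderline ("danger") seeds.

    Hypothetical parameters: m = neighbours used to judge borderline status,
    k = minority neighbours used for interpolation, per_seed = points per seed.
    """
    rng = random.Random(seed)
    everyone = range(len(X))
    minority = [i for i in everyone if y[i] == minority_label]

    # Step 1: a minority sample is "in danger" when at least half, but not all,
    # of its m nearest neighbours belong to other classes.
    danger = []
    for i in minority:
        neighbours = nearest(i, everyone, X, m)
        majority_count = sum(1 for j in neighbours if y[j] != minority_label)
        if neighbours and len(neighbours) / 2 <= majority_count < len(neighbours):
            danger.append(i)

    # Step 2: interpolate between each danger sample and a random minority neighbour.
    synthetic = []
    for i in danger:
        mates = nearest(i, minority, X, k)
        if not mates:
            continue
        for _ in range(per_seed):
            j = rng.choice(mates)
            gap = rng.random()  # position along the segment from X[i] to X[j]
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data: four minority points (label 1), five majority points (label 0).
X = [[0, 0], [1, 0], [9, 0], [10, 0], [11, 0], [12, 0], [13, 0], [50, 0], [60, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0]
new_points = borderline_smote(X, y, minority_label=1, m=3, k=2, per_seed=2)
print(len(new_points), "synthetic minority points")
```

In this toy set only the two minority points next to the majority cluster qualify as seeds, so oversampling concentrates near the class boundary rather than across the whole minority class. The synthetic points would then be appended to the training set with the minority label before training the classifier.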