A New Big Data Model Using Distributed Cluster-Based Resampling for Class-Imbalance Problem
The class imbalance problem, one of the common data irregularities, causes the development of under-represented models. To resolve this issue, the present study proposes a new cluster-based MapReduce design, entitled Distributed Cluster-based Resampling for Imbalanced Big Data (DIBID). The design ai...
Main Authors: | Terzi Duygu Sinanc, Sagiroglu Seref |
---|---|
Format: | Article |
Language: | English |
Published: | Sciendo, 2019-12-01 |
Series: | Applied Computer Systems |
Subjects: | big data; cluster-based resampling; imbalanced big data classification; imbalanced data |
Online Access: | https://doi.org/10.2478/acss-2019-0013 |
id |
doaj-ae20c2d8188045fd89404a54d6e09f61 |
---|---|
record_format |
Article |
spelling |
doaj-ae20c2d8188045fd89404a54d6e09f61; 2021-09-06T19:41:00Z; eng; Sciendo; Applied Computer Systems; 2255-8691; 2019-12-01; 24; 2; 104; 110; 10.2478/acss-2019-0013; acss-2019-0013; A New Big Data Model Using Distributed Cluster-Based Resampling for Class-Imbalance Problem; Terzi Duygu Sinanc (0); Sagiroglu Seref (1); Department of Computer Engineering, Gazi University, Ankara, Turkey; Department of Computer Engineering, Gazi University, Ankara, Turkey; The class imbalance problem, one of the common data irregularities, causes the development of under-represented models. To resolve this issue, the present study proposes a new cluster-based MapReduce design, entitled Distributed Cluster-based Resampling for Imbalanced Big Data (DIBID). The design aims at modifying the existing dataset to increase the classification success. Within the study, DIBID has been implemented on public datasets under two strategies. The first strategy has been designed to present the success of the model on data sets with different imbalanced ratios. The second strategy has been designed to compare the success of the model with other imbalanced big data solutions in the literature. According to the results, DIBID outperformed other imbalanced big data solutions in the literature and increased area under the curve values between 10 % and 24 % through the case study.; https://doi.org/10.2478/acss-2019-0013; big data; cluster-based resampling; imbalanced big data classification; imbalanced data |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Terzi Duygu Sinanc, Sagiroglu Seref |
spellingShingle |
Terzi Duygu Sinanc Sagiroglu Seref A New Big Data Model Using Distributed Cluster-Based Resampling for Class-Imbalance Problem Applied Computer Systems big data cluster-based resampling imbalanced big data classification imbalanced data |
author_facet |
Terzi Duygu Sinanc, Sagiroglu Seref |
author_sort |
Terzi Duygu Sinanc |
title |
A New Big Data Model Using Distributed Cluster-Based Resampling for Class-Imbalance Problem |
title_sort |
new big data model using distributed cluster-based resampling for class-imbalance problem |
publisher |
Sciendo |
series |
Applied Computer Systems |
issn |
2255-8691 |
publishDate |
2019-12-01 |
description |
The class imbalance problem, one of the common data irregularities, causes the development of under-represented models. To resolve this issue, the present study proposes a new cluster-based MapReduce design, entitled Distributed Cluster-based Resampling for Imbalanced Big Data (DIBID). The design aims at modifying the existing dataset to increase the classification success. Within the study, DIBID has been implemented on public datasets under two strategies. The first strategy has been designed to present the success of the model on data sets with different imbalanced ratios. The second strategy has been designed to compare the success of the model with other imbalanced big data solutions in the literature. According to the results, DIBID outperformed other imbalanced big data solutions in the literature and increased area under the curve values between 10 % and 24 % through the case study. |
topic |
big data; cluster-based resampling; imbalanced big data classification; imbalanced data |
url |
https://doi.org/10.2478/acss-2019-0013 |