An Efficient Big Data Anonymization Algorithm Based on Chaos and Perturbation Techniques

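Big data has attracted increasing interest in recent years. Its emergence creates new difficulties for the protection models used for data privacy, which is necessary for sharing and processing data. Protecting individuals’ sensitive information while maintaining the usability of the published data set is the most important challenge in privacy preservation. Data anonymization methods are therefore used to protect data against identity disclosure and linking attacks. This study proposes a novel data anonymization algorithm based on chaos and perturbation that preserves both privacy and utility in big data. The performance of the proposed algorithm is evaluated in terms of Kullback–Leibler divergence, probabilistic anonymity, classification accuracy, F-measure and execution time. The experimental results show that the proposed algorithm is efficient and performs better in terms of Kullback–Leibler divergence, classification accuracy and F-measure than most existing algorithms evaluated on the same data set. Because it applies chaos to perturb data, the algorithm is promising for use in privacy-preserving data mining and data publishing.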
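The record's description names chaos-based perturbation as the anonymization idea and Kullback–Leibler divergence as one of the utility measures. The sketch below illustrates both concepts on a toy numeric column. It is not the authors' algorithm: the logistic map, its parameter `r`, the seed `x0`, the noise `scale`, the bin layout, the add-one smoothing and the sample `ages` values are all assumptions chosen only for illustration.

```python
import math
from collections import Counter


def logistic_map(x0, r=3.99, n=1):
    """Yield n successive values of the logistic map x -> r * x * (1 - x)."""
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        yield x


def perturb(values, x0=0.3141, scale=2.0):
    """Add chaos-driven noise in (-scale, scale) to each value (illustrative only)."""
    noise = logistic_map(x0, n=len(values))
    return [v + scale * (2.0 * z - 1.0) for v, z in zip(values, noise)]


def histogram(values, lo, hi, bins):
    """Return smoothed relative frequencies over equal-width bins."""
    width = (hi - lo) / bins
    # Clamp out-of-range values into the first or last bin.
    counts = Counter(min(bins - 1, max(0, int((v - lo) / width))) for v in values)
    total = len(values) + bins  # add-one smoothing keeps every bin nonzero
    return [(counts.get(b, 0) + 1) / total for b in range(bins)]


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), measured in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical quasi-identifier column (ages); not taken from the paper's data set.
ages = [23, 35, 35, 41, 52, 29, 35, 60, 47, 23, 38, 41]
masked = perturb(ages)

p = histogram(ages, lo=15, hi=70, bins=11)
q = histogram(masked, lo=15, hi=70, bins=11)
print(f"KL divergence between original and perturbed: {kl_divergence(p, q):.4f}")
```

A lower divergence means the perturbed column keeps a distribution closer to the original, which is the sense in which the measure reflects utility. The same seed `x0` always reproduces the same noise sequence, so in this toy setup the seed would have to be kept secret.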


Bibliographic Details
Main Authors: Can Eyupoglu, Muhammed Ali Aydin, Abdul Halim Zaim, Ahmet Sertbas
Format: Article
Language: English
Published: MDPI AG 2018-05-01
Series: Entropy
Subjects: big data; chaos; data anonymization; data perturbation; privacy preserving
Online Access: http://www.mdpi.com/1099-4300/20/5/373
id doaj-7d8f3dfa8cf04a7cb737a370757a2611
record_format Article
spelling doaj-7d8f3dfa8cf04a7cb737a370757a2611
2020-11-24T21:03:00Z
eng
MDPI AG
Entropy
ISSN 1099-4300
2018-05-01
volume 20, issue 5, article 373
DOI 10.3390/e20050373
e20050373
An Efficient Big Data Anonymization Algorithm Based on Chaos and Perturbation Techniques
Can Eyupoglu (Department of Computer Engineering, Istanbul Commerce University, Istanbul 34840, Turkey)
Muhammed Ali Aydin (Department of Computer Engineering, Istanbul University, Istanbul 34320, Turkey)
Abdul Halim Zaim (Department of Computer Engineering, Istanbul Commerce University, Istanbul 34840, Turkey)
Ahmet Sertbas (Department of Computer Engineering, Istanbul University, Istanbul 34320, Turkey)
http://www.mdpi.com/1099-4300/20/5/373
big data
chaos
data anonymization
data perturbation
privacy preserving
collection DOAJ
language English
format Article
sources DOAJ
author Can Eyupoglu
Muhammed Ali Aydin
Abdul Halim Zaim
Ahmet Sertbas
title An Efficient Big Data Anonymization Algorithm Based on Chaos and Perturbation Techniques
publisher MDPI AG
series Entropy
issn 1099-4300
publishDate 2018-05-01
description Big data has attracted increasing interest in recent years. Its emergence creates new difficulties for the protection models used for data privacy, which is necessary for sharing and processing data. Protecting individuals’ sensitive information while maintaining the usability of the published data set is the most important challenge in privacy preservation. Data anonymization methods are therefore used to protect data against identity disclosure and linking attacks. This study proposes a novel data anonymization algorithm based on chaos and perturbation that preserves both privacy and utility in big data. The performance of the proposed algorithm is evaluated in terms of Kullback–Leibler divergence, probabilistic anonymity, classification accuracy, F-measure and execution time. The experimental results show that the proposed algorithm is efficient and performs better in terms of Kullback–Leibler divergence, classification accuracy and F-measure than most existing algorithms evaluated on the same data set. Because it applies chaos to perturb data, the algorithm is promising for use in privacy-preserving data mining and data publishing.
topic big data
chaos
data anonymization
data perturbation
privacy preserving
url http://www.mdpi.com/1099-4300/20/5/373