A Tree Based (m,k)-Anonymity Privacy Preserving Technique For Tabular Data
Master's thesis === National Chung Hsing University === Department of Electrical Engineering === 107 === Data publishing contributes to the advancement of data science and to the application of knowledge-based decision making. However, data publishing faces the problem of privacy leakage. Once data is published, sensitive information may be extracted, resulting...
Main Authors: | Hsu-Heng Chou, 周盱衡 |
---|---|
Other Authors: | Hsiao-Ping Tsai |
Format: | Others |
Language: | en_US |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/646dv4 |
id |
ndltd-TW-107NCHU5441004 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-107NCHU5441004 2019-05-30T03:57:16Z http://ndltd.ncl.edu.tw/handle/646dv4 A Tree Based (m,k)-Anonymity Privacy Preserving Technique For Tabular Data 適用於表格資料發佈之樹狀(m,k)隱匿技術 Hsu-Heng Chou 周盱衡 Master's thesis National Chung Hsing University Department of Electrical Engineering 107 Hsiao-Ping Tsai 蔡曉萍 2019 thesis 50 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Chung Hsing University === Department of Electrical Engineering === 107 === Data publishing contributes to the advancement of data science and to the application of knowledge-based decision making. However, data publishing faces the problem of privacy leakage. Once data is published, sensitive information may be extracted from it, resulting in virtual or physical threats and attacks. For example, an anonymous user may be re-identified, revealing information that he or she is reluctant to disclose. More seriously, revealing a person's physical location could endanger his or her life. Therefore, data should be carefully examined and put through a privacy protection process before being released.
Nowadays, k-anonymity is still one of the most frequently used privacy-preserving models, and generalization and perturbation are the common anonymization techniques. However, most generalization or perturbation techniques do not take the characteristics of high-dimensional data into account, which leads to low data utility. To handle the privacy-preserving problem for high-dimensional data, we observe that it is not easy for adversaries to obtain many data attributes with which to mount privacy attacks. At the same time, it is not easy to determine which attributes are quasi-identifiers. Instead of making all attributes k-anonymous, ensuring that any m sub-dimensions of the data attributes conform to the k-anonymity condition is a plausible compromise that trades off privacy preservation against data utility. Therefore, to handle the (m,k)-anonymity problem for tabular data, we propose an (m,k)-anonymity algorithm with a Combination-Tree (C-Tree). The (m,k)-anonymity algorithm searches the C-Tree in a greedy, top-down manner to generalize the attributes of unqualified data records. The C-Tree is built based on the Pascal Theorem to summarize the data, making it easy to search for unqualified data records and to identify the equivalence classes for local generalization. We also propose the Taxonomy Index Support (TIS) to speed up the generalization process.
To validate our methods, we conduct experiments on real-world data to study the key factors that influence Information Loss and utility. According to the experimental results, our method outperforms previous methods, achieving k-anonymity with lower Information Loss. However, the experimental results also show a long computing time, which is due to the high computational complexity. Future work includes designing more efficient data structures or algorithms to make the technique practical.
|
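The (m,k)-anonymity condition described in the abstract can be illustrated with a brute-force check. The following is only a minimal sketch, not the thesis's C-Tree algorithm: it assumes that "any m sub-dimensions" means every combination of m attributes, and that a combination conforms when each value tuple it produces occurs in at least k records. The function names and the toy table are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mk_violations(records, m, k):
    """Map each m-attribute subset to the value tuples seen fewer than k times.

    Assumption: every record is a dict with the same attribute names.
    An empty result means every projection onto m attributes is k-anonymous.
    """
    if not records:
        return {}
    attributes = sorted(records[0])
    violations = {}
    # Enumerate every combination of m attributes (C(n, m) of them).
    for subset in combinations(attributes, m):
        counts = Counter(tuple(r[a] for a in subset) for r in records)
        rare = [values for values, n in counts.items() if n < k]
        if rare:
            violations[subset] = rare
    return violations


def is_mk_anonymous(records, m, k):
    """True when no m-attribute projection contains a group smaller than k."""
    return not mk_violations(records, m, k)


# Hypothetical toy table with already-generalized values.
toy = [
    {"age": "20-29", "zip": "402**", "job": "engineer"},
    {"age": "20-29", "zip": "402**", "job": "teacher"},
    {"age": "30-39", "zip": "402**", "job": "engineer"},
    {"age": "30-39", "zip": "402**", "job": "teacher"},
]

print(is_mk_anonymous(toy, 1, 2))
print(mk_violations(toy, 2, 2))
```

On this toy table, each single attribute is 2-anonymous, but the pair (age, job) is not, so the records in that pair would be the "unqualified" ones that need further generalization. The number of subsets grows as C(n, m), which is consistent with the long computing time the abstract reports for high-dimensional data.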
author2 |
Hsiao-Ping Tsai |
author_facet |
Hsiao-Ping Tsai Hsu-Heng Chou 周盱衡 |
author |
Hsu-Heng Chou 周盱衡 |
spellingShingle |
Hsu-Heng Chou 周盱衡 A Tree Based (m,k)-Anonymity Privacy Preserving Technique For Tabular Data |
author_sort |
Hsu-Heng Chou |
title |
A Tree Based (m,k)-Anonymity Privacy Preserving Technique For Tabular Data |
title_short |
A Tree Based (m,k)-Anonymity Privacy Preserving Technique For Tabular Data |
title_full |
A Tree Based (m,k)-Anonymity Privacy Preserving Technique For Tabular Data |
title_fullStr |
A Tree Based (m,k)-Anonymity Privacy Preserving Technique For Tabular Data |
title_full_unstemmed |
A Tree Based (m,k)-Anonymity Privacy Preserving Technique For Tabular Data |
title_sort |
tree based (m,k)-anonymity privacy preserving technique for tabular data |
publishDate |
2019 |
url |
http://ndltd.ncl.edu.tw/handle/646dv4 |
work_keys_str_mv |
AT hsuhengchou atreebasedmkanonymityprivacypreservingtechniquefortabulardata AT zhōuxūhéng atreebasedmkanonymityprivacypreservingtechniquefortabulardata AT hsuhengchou shìyòngyúbiǎogézīliàofābùzhīshùzhuàngmkyǐnnìjìshù AT zhōuxūhéng shìyòngyúbiǎogézīliàofābùzhīshùzhuàngmkyǐnnìjìshù AT hsuhengchou treebasedmkanonymityprivacypreservingtechniquefortabulardata AT zhōuxūhéng treebasedmkanonymityprivacypreservingtechniquefortabulardata |
_version_ |
1719196518180716544 |