Summary: | Data anonymization is a relevant and important field within data privacy, which tries to find a good balance between utility and privacy in data. The field is especially relevant since the GDPR came into force, because the GDPR does not regulate anonymous data. This thesis focuses on anonymization of directory-structured data, which means data structured into a tree of directories. In the thesis, four of the most common models for anonymization of tabular data, k-anonymity, ℓ-diversity, t-closeness and differential privacy, are adapted for anonymization of directory-structured data. This adaptation is done by creating three different approaches for anonymizing directory-structured data: SingleTable, DirectoryWise and RecursiveDirectoryWise. These models and approaches are compared and evaluated using five metrics and three attack scenarios. The results show that there is always a trade-off between utility and privacy when anonymizing data. Especially it was concluded that the differential privacy model when using the RecursiveDirectoryWise approach gives the highest privacy, but also the highest information loss. On the contrary, the k-anonymity model when using the SingleTable approach or the t-closeness model when using the DirectoryWise approach gives the lowest information loss, but also the lowest privacy. The differential privacy model and the RecursiveDirectoryWise approach were also shown to give best protection against the chosen attacks. Finally, it was concluded that the differential privacy model when using the RecursiveDirectoryWise approach, was the most suitable combination to use when trying to follow the GDPR when anonymizing directory-structured data.
|