Differentially-Private Remote Software Profiling

Bibliographic Details
Main Author: Zhang, Hailong
Language: English
Published: The Ohio State University / OhioLINK 2020
Subjects: Computer Science
Online Access: http://rave.ohiolink.edu/etdc/view?acc_num=osu1595270049924409
id ndltd-OhioLink-oai-etd.ohiolink.edu-osu1595270049924409
record_format oai_dc
collection NDLTD
language English
sources NDLTD
topic Computer Science
author Zhang, Hailong
title Differentially-Private Remote Software Profiling
publisher The Ohio State University / OhioLINK
publishDate 2020
url http://rave.ohiolink.edu/etdc/view?acc_num=osu1595270049924409
work_keys_str_mv AT zhanghailong differentiallyprivateremotesoftwareprofiling
_version_ 1719457600299335680
spelling ndltd-OhioLink-oai-etd.ohiolink.edu-osu1595270049924409 2021-08-03T07:15:42Z Differentially-Private Remote Software Profiling Zhang, Hailong Computer Science

Remote profiling of deployed software has been studied in many contexts. In remote profiling, data is collected locally and then sent to a remote server where it is analyzed by the developers of the software and analysts working with them. There are significant privacy concerns about the collection and use of such data. For example, this sensitive data could potentially be misused due to rogue employees, legal proceedings, unethical business practices, or security breaches. The goal of this dissertation is to introduce principled privacy protection guarantees in the data collection process, after the profiling data has been collected locally but before sending it to the remote server. To provide a privacy-by-design solution with well-defined privacy properties, we employ differential privacy (DP), a powerful technique that allows meaningful statistics to be collected for a population without revealing “too much” information about any individual member of the population.

The first contribution of this dissertation is a parameterized randomization approach for run-time event frequency profiling that achieves DP with respect to event traces. This approach introduces random noise to the local profile at each user in a way that prevents any adversary from inferring the actual event trace during data gathering. To compute useful statistics, software developers post-process the aggregated profile to account for the randomization. Using program analysis techniques, we extract a priori knowledge about relationships between events in the run-time profile and incorporate these relationships in the post-processing step. This produces frequency estimates that are consistent with the structure of the true event frequencies and reduces the error of the resulting estimates.
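
The randomize-then-post-process pattern described above can be illustrated with classic per-bit randomized response. This is a minimal sketch and not the dissertation's mechanism: it assumes each user's profile is reduced to a 0/1 vector (one bit per event), and the function names `randomize_bits` and `estimate_counts` are hypothetical. The sketch applies the randomization to each bit independently and does not address the privacy guarantee for a whole vector or trace.

```python
import math
import random


def randomize_bits(bits, epsilon, rng):
    """Classic randomized response (an assumed stand-in, not the thesis design).

    Each bit is kept with probability p = e^eps / (1 + e^eps) and flipped
    otherwise. Requires epsilon > 0.
    """
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_counts(reports, epsilon):
    """Server-side post-processing that accounts for the randomization.

    For a true count c among n users, E[column sum] = c*p + (n - c)*(1 - p).
    Solving for c gives the unbiased estimate returned here.
    """
    n = len(reports)
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    sums = [sum(col) for col in zip(*reports)]
    return [(s - n * (1.0 - p)) / (2.0 * p - 1.0) for s in sums]
```

A server would call `estimate_counts` on the list of randomized vectors received from all users. The estimates can be negative or exceed the number of users, which is one reason a further consistency step, such as the structure-aware post-processing the abstract mentions, can reduce error.
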
We perform a study of method call traces from Android apps and show that well-designed solutions can achieve both high accuracy and principled privacy-by-design for the fundamental problem of event frequency profiling.

As another setting for software frequency profiling, the profile at each user could be a frequency vector instead of an event trace. We propose an approach that reports the user-level frequency information with the addition of random noise drawn from the Laplace distribution. This approach achieves significantly higher accuracy of profiling results without reducing privacy protections in any substantial way, compared to our first approach introduced earlier. In addition, we propose a novel linear programming formulation to compute the magnitude of random noise that should be added to achieve meaningful privacy protections under certain linear constraints. These constraints are due to the intrinsic static structure of the underlying software and are commonly observed in software systems. To the best of our knowledge, no prior work has incorporated such domain-derived data constraints in the design of a DP analysis. Our experimental analysis shows that the privacy protection can be significantly weakened if the DP design does not take into account these constraints.

The third focus for this dissertation is a profiling problem related to control-flow node coverage, where the constraints are with respect to control-flow graph (CFG) nodes. In such data collection, events are represented as nodes in a CFG and the edges in the CFG represent the transitions between events. Every execution of the software corresponds to a subgraph of the CFG that is covered at run time. We propose a novel definition of graph neighbors for a particular run-time covered subgraph, in order to account for the strong correlations between CFG nodes. We demonstrate that such correlations are captured by the notion of dominators, which is traditionally used in compiler optimizations.
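
To make the dominator notion concrete: a node d dominates a node n if every path from the CFG entry to n passes through d, so whenever n is covered in a run, all of its dominators are covered in that run too. The following is a minimal sketch of the standard iterative dominator computation on a small, made-up CFG; the graph, the node names, and the function name `dominators` are illustrative assumptions, and the sketch does not reproduce how the dissertation defines graph neighbors.

```python
def dominators(cfg, entry):
    """Standard iterative dataflow: dom(n) = {n} | intersection of dom(p) over preds p.

    Assumes every node appears as a key of `cfg` and is reachable from `entry`.
    """
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)

    # Start from the most conservative answer and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if not preds[n]:
                continue  # no predecessors: outside the stated assumption
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Hypothetical diamond-shaped CFG: entry -> a -> {b, c} -> d
cfg = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
dom = dominators(cfg, "entry")
# dom["d"] lists the nodes that must be covered whenever d is covered.
```

In this example, observing coverage of `d` reveals that `entry` and `a` were covered as well, but says nothing certain about `b` or `c`. Correlations of this kind are what an independent per-node randomization would overlook.
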
We use this insight to define the privacy guarantees that need to be achieved by any DP solution for control-flow node coverage analysis, and then propose an analysis to achieve these guarantees by randomizing the coverage information. Our experimental results demonstrate that the proposed analysis can achieve practical accuracy while providing principled DP guarantees.

Overall, this dissertation presents several approaches targeting various profiling tasks with the goal of providing principled privacy guarantees for individual users' profile data, while still allowing developers to learn useful statistical results across the whole user population. These approaches are promising advances in the larger landscape of privacy-preserving software analysis. We believe that applying similar techniques based on DP to other software analysis problems will be a fruitful direction for future work.

2020 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1595270049924409 unrestricted
This thesis or dissertation is protected by copyright: some rights reserved. It is licensed for use under a Creative Commons license. Specific terms and permissions are available from this document's record in the OhioLINK ETD Center.