Privacy-preserving logistic regression training

Abstract. Background: Logistic regression is a popular technique used in machine learning to construct classification models. Since the construction of such models is based on computing with large datasets, it is an appealing idea to outsource this computation to a cloud service. The privacy-sensitive...


Bibliographic Details
Main Authors: Charlotte Bonte, Frederik Vercauteren
Format: Article
Language: English
Published: BMC 2018-10-01
Series: BMC Medical Genomics
Subjects: Homomorphic encryption; Logistic regression; Privacy; Fixed Hessian
Online Access: http://link.springer.com/article/10.1186/s12920-018-0398-y
id doaj-75e1a52d2f884f788ef7ebcb8645ab0a
record_format Article
spelling doaj-75e1a52d2f884f788ef7ebcb8645ab0a
2021-04-02T08:40:39Z
eng
BMC
BMC Medical Genomics
1755-8794
2018-10-01
11S41321
10.1186/s12920-018-0398-y
Privacy-preserving logistic regression training
Charlotte Bonte (0)
Frederik Vercauteren (1)
imec-Cosic, Dept. Electrical Engineering, KU Leuven
imec-Cosic, Dept. Electrical Engineering, KU Leuven
Abstract. Background: Logistic regression is a popular technique used in machine learning to construct classification models. Since the construction of such models is based on computing with large datasets, it is an appealing idea to outsource this computation to a cloud service. The privacy-sensitive nature of the input data requires appropriate privacy preserving measures before outsourcing it. Homomorphic encryption enables one to compute on encrypted data directly, without decryption and can be used to mitigate the privacy concerns raised by using a cloud service. Methods: In this paper, we propose an algorithm (and its implementation) to train a logistic regression model on a homomorphically encrypted dataset. The core of our algorithm consists of a new iterative method that can be seen as a simplified form of the fixed Hessian method, but with a much lower multiplicative complexity. Results: We test the new method on two interesting real life applications: the first application is in medicine and constructs a model to predict the probability for a patient to have cancer, given genomic data as input; the second application is in finance and the model predicts the probability of a credit card transaction to be fraudulent. The method produces accurate results for both applications, comparable to running standard algorithms on plaintext data. Conclusions: This article introduces a new simple iterative algorithm to train a logistic regression model that is tailored to be applied on a homomorphically encrypted dataset. This algorithm can be used as a privacy-preserving technique to build a binary classification model and can be applied in a wide range of problems that can be modelled with logistic regression. Our implementation results show that our method can handle the large datasets used in logistic regression training.
http://link.springer.com/article/10.1186/s12920-018-0398-y
Homomorphic encryption
Logistic regression
Privacy
Fixed Hessian
collection DOAJ
language English
format Article
sources DOAJ
author Charlotte Bonte
Frederik Vercauteren
spellingShingle Charlotte Bonte
Frederik Vercauteren
Privacy-preserving logistic regression training
BMC Medical Genomics
Homomorphic encryption
Logistic regression
Privacy
Fixed Hessian
author_facet Charlotte Bonte
Frederik Vercauteren
author_sort Charlotte Bonte
title Privacy-preserving logistic regression training
title_short Privacy-preserving logistic regression training
title_full Privacy-preserving logistic regression training
title_fullStr Privacy-preserving logistic regression training
title_full_unstemmed Privacy-preserving logistic regression training
title_sort privacy-preserving logistic regression training
publisher BMC
series BMC Medical Genomics
issn 1755-8794
publishDate 2018-10-01
description Abstract. Background: Logistic regression is a popular technique used in machine learning to construct classification models. Since the construction of such models is based on computing with large datasets, it is an appealing idea to outsource this computation to a cloud service. The privacy-sensitive nature of the input data requires appropriate privacy preserving measures before outsourcing it. Homomorphic encryption enables one to compute on encrypted data directly, without decryption and can be used to mitigate the privacy concerns raised by using a cloud service. Methods: In this paper, we propose an algorithm (and its implementation) to train a logistic regression model on a homomorphically encrypted dataset. The core of our algorithm consists of a new iterative method that can be seen as a simplified form of the fixed Hessian method, but with a much lower multiplicative complexity. Results: We test the new method on two interesting real life applications: the first application is in medicine and constructs a model to predict the probability for a patient to have cancer, given genomic data as input; the second application is in finance and the model predicts the probability of a credit card transaction to be fraudulent. The method produces accurate results for both applications, comparable to running standard algorithms on plaintext data. Conclusions: This article introduces a new simple iterative algorithm to train a logistic regression model that is tailored to be applied on a homomorphically encrypted dataset. This algorithm can be used as a privacy-preserving technique to build a binary classification model and can be applied in a wide range of problems that can be modelled with logistic regression. Our implementation results show that our method can handle the large datasets used in logistic regression training.
topic Homomorphic encryption
Logistic regression
Privacy
Fixed Hessian
url http://link.springer.com/article/10.1186/s12920-018-0398-y
work_keys_str_mv AT charlottebonte privacypreservinglogisticregressiontraining
AT frederikvercauteren privacypreservinglogisticregressiontraining
_version_ 1724170243264217088
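
Illustrative note on the description field: the abstract mentions "a simplified form of the fixed Hessian method" but this record does not spell the method out. The sketch below is therefore only one possible reading, run on plaintext with no encryption involved. It assumes labels in {0, 1}, nonnegative features with a leading 1 for the intercept, and replaces the fixed curvature bound (1/4) X^T X by a diagonal matrix built from its row sums. The function names, the toy data, and the choice of diagonal surrogate are assumptions made for illustration, not details taken from the article.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(X, y, beta):
    # Sum over samples of y*z - log(1 + exp(z)), with z = <beta, x>.
    total = 0.0
    for row, label in zip(X, y):
        z = sum(b * x for b, x in zip(beta, row))
        total += label * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def train(X, y, iterations):
    # Hypothetical helper: diagonal fixed-curvature ascent on plaintext data.
    n, d = len(X), len(X[0])
    # Row sums of (1/4) X^T X, computed without forming the matrix.
    # Assumption: every feature is >= 0, so each row sum is positive and
    # the diagonal matrix dominates (1/4) X^T X.
    row_totals = [sum(row) for row in X]
    h = [0.25 * sum(X[i][k] * row_totals[i] for i in range(n)) for k in range(d)]
    beta = [0.0] * d
    for _ in range(iterations):
        residual = [
            y[i] - sigmoid(sum(b * x for b, x in zip(beta, X[i])))
            for i in range(n)
        ]
        # Gradient of the log-likelihood: X^T (y - p).
        grad = [sum(residual[i] * X[i][k] for i in range(n)) for k in range(d)]
        # The curvature h is computed once, so each step is a division by a
        # constant rather than a fresh matrix inversion.
        beta = [b + g / hk for b, g, hk in zip(beta, grad, h)]
    return beta


# Toy data (made up): intercept column plus one feature in [0, 1].
X = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.8], [1.0, 0.9], [1.0, 0.3], [1.0, 0.7]]
y = [0, 0, 1, 1, 1, 0]
beta = train(X, y, 25)
print(beta, log_likelihood(X, y, beta))
```

Because the curvature term is fixed before the loop, the per-iteration work is a gradient evaluation plus one scaling per coefficient, which is the kind of structure that keeps the number of multiplications per step low. A homomorphic version would additionally need a polynomial stand-in for the sigmoid, which this sketch does not attempt.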