Centralized and distributed learning methods for predictive health analytics

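The U.S. health care system is considered costly and highly inefficient, devoting substantial resources to the treatment of acute conditions in a hospital setting rather than focusing on prevention and keeping patients out of the hospital. The potential for cost savings is large: in the U.S., more than $30 billion is spent each year on hospitalizations deemed preventable, 31% of which is attributed to heart disease and 20% to diabetes. Motivated by this, our work focuses on developing centralized and distributed learning methods to predict future heart- or diabetes-related hospitalizations based on patient Electronic Health Records (EHRs). We explore a variety of supervised classification methods and present a novel likelihood-ratio-based method (K-LRT) that predicts hospitalizations and offers interpretability by identifying the K most significant features that lead to a positive prediction for each patient. Next, assuming that the positive class consists of multiple clusters (patients hospitalized for different reasons), while the negative class is drawn from a single cluster (non-hospitalized patients healthy in every aspect), we present an alternating optimization approach that jointly discovers the clusters in the positive class and optimizes the classifiers that separate each positive cluster from the negative samples. We establish the convergence of the method and characterize its VC dimension. Last, we develop a decentralized cluster Primal-Dual Splitting (cPDS) method for large-scale problems, which is computationally efficient and privacy-aware. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the agents to collaborate while keeping every participant's data private. cPDS is proved to have an improved convergence rate compared to existing centralized and decentralized methods. We test all methods on real EHR data from the Boston Medical Center and compare results in terms of prediction accuracy and interpretability.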
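The abstract describes K-LRT only at a high level. The Python sketch below is one hypothetical reading of a top-K likelihood-ratio classifier, not the thesis's implementation. It assumes that features are already discretized into integer bins and treated as independent, that class-conditional probabilities are estimated with Laplace smoothing, and that a patient is scored by the sum of the K largest per-feature log-likelihood ratios. The names fit_pmfs and k_lrt_predict, the smoothing constant alpha, the zero threshold and the toy data are all illustrative assumptions.

```python
import numpy as np


def fit_pmfs(X, y, n_bins, alpha=1.0):
    """Estimate P(x_j = b | class) for every feature j and bin b.

    Hypothetical helper: assumes X holds integer bin indices in [0, n_bins)
    and y holds 0 (not hospitalized) / 1 (hospitalized) labels.
    Laplace smoothing (alpha) keeps every probability strictly positive.
    """
    n_features = X.shape[1]
    pmfs = {}
    for label in (0, 1):
        Xc = X[y == label]
        counts = np.zeros((n_features, n_bins))
        for j in range(n_features):
            counts[j] = np.bincount(Xc[:, j], minlength=n_bins)
        # Each row of counts sums to len(Xc), so this normalizes every feature's PMF.
        pmfs[label] = (counts + alpha) / (len(Xc) + alpha * n_bins)
    return pmfs


def k_lrt_predict(x, pmfs, k, threshold=0.0):
    """Score one patient with the K largest per-feature log-likelihood ratios.

    Assumption: features are treated independently, and the decision statistic
    is the sum of the top-K log ratios compared against a threshold.
    Returns (prediction, indices of the K features used, score).
    """
    idx = np.arange(len(x))
    llr = np.log(pmfs[1][idx, x]) - np.log(pmfs[0][idx, x])
    top_k = np.argsort(-llr)[:k]  # features pushing most strongly toward "hospitalized"
    score = llr[top_k].sum()
    return int(score > threshold), top_k, score


# Toy example: feature 0 separates the classes, features 1 and 2 carry no signal.
X = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]])
y = np.array([1, 1, 1, 0, 0, 0])
pmfs = fit_pmfs(X, y, n_bins=2)
print(k_lrt_predict(np.array([1, 0, 0]), pmfs, k=1))
```

Returning top_k alongside the prediction mirrors the interpretability goal stated in the abstract: the indices name the features that contributed most to a positive call for that patient.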

Bibliographic Details
Main Author: Brisimi, Theodora
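Language: en_US
Published: 2018
Subjects: Computer science; Centralized and distributed methods; Data analytics; Diabetes hospitalizations; Heart hospitalizations; Machine learning; Predictive health analytics
Online Access: https://hdl.handle.net/2144/27007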
id ndltd-bu.edu-oai-open.bu.edu-2144-27007
type Thesis/Dissertation
date 2019-04-02T06:54:49Z; 2018-02-13T16:27:28Z; 2017; 2017-11-02T22:14:40Z
rights Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/
collection NDLTD
language en_US
sources NDLTD
topic Computer science
Centralized and distributed methods
Data analytics
Diabetes hospitalizations
Heart hospitalizations
Machine learning
Predictive health analytics
description The U.S. health care system is considered costly and highly inefficient, devoting substantial resources to the treatment of acute conditions in a hospital setting rather than focusing on prevention and keeping patients out of the hospital. The potential for cost savings is large: in the U.S., more than $30 billion is spent each year on hospitalizations deemed preventable, 31% of which is attributed to heart disease and 20% to diabetes. Motivated by this, our work focuses on developing centralized and distributed learning methods to predict future heart- or diabetes-related hospitalizations based on patient Electronic Health Records (EHRs). We explore a variety of supervised classification methods and present a novel likelihood-ratio-based method (K-LRT) that predicts hospitalizations and offers interpretability by identifying the K most significant features that lead to a positive prediction for each patient. Next, assuming that the positive class consists of multiple clusters (patients hospitalized for different reasons), while the negative class is drawn from a single cluster (non-hospitalized patients healthy in every aspect), we present an alternating optimization approach that jointly discovers the clusters in the positive class and optimizes the classifiers that separate each positive cluster from the negative samples. We establish the convergence of the method and characterize its VC dimension. Last, we develop a decentralized cluster Primal-Dual Splitting (cPDS) method for large-scale problems, which is computationally efficient and privacy-aware. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the agents to collaborate while keeping every participant's data private. cPDS is proved to have an improved convergence rate compared to existing centralized and decentralized methods. We test all methods on real EHR data from the Boston Medical Center and compare results in terms of prediction accuracy and interpretability.
author Brisimi, Theodora
title Centralized and distributed learning methods for predictive health analytics
publishDate 2018
url https://hdl.handle.net/2144/27007