Propensity score synthetic augmentation matching using generative adversarial networks (PSSAM-GAN)

Understanding causality is of crucial importance in biomedical sciences, where developing prediction models is insufficient because the models need to be actionable. However, data sources, such as electronic health records, are observational and often plagued with various types of biases, e.g. confo...


Bibliographic Details
Main Authors:	Shantanu Ghosh, Christina Boucher, Jiang Bian, Mattia Prosperi
Format:	Article
Language:	English
Published:	Elsevier 2021-01-01
Series:	Computer Methods and Programs in Biomedicine Update
Subjects:	Causal AI; Causal inference; Deep learning; Biomedical informatics; Generative adversarial networks; Propensity score
Online Access:	http://www.sciencedirect.com/science/article/pii/S2666990021000197

id	doaj-47778e8511a14e03a7bab56af5f06214
record_format	Article
spelling	doaj-47778e8511a14e03a7bab56af5f06214; 2021-07-23T04:50:51Z; eng; Elsevier; Computer Methods and Programs in Biomedicine Update; 2666-9900; 2021-01-01; 1; 100020; Propensity score synthetic augmentation matching using generative adversarial networks (PSSAM-GAN)
	Shantanu Ghosh (0): Corresponding author; Department of Computer and Information Science and Engineering, University of Florida, Florida 32611, USA
	Christina Boucher (1): Department of Computer and Information Science and Engineering, University of Florida, Florida 32611, USA
	Jiang Bian (2): Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Florida 32610, USA
	Mattia Prosperi (3): Department of Epidemiology, College of Public Health and Health Professions & College of Medicine, University of Florida, Florida 32610, USA
collection	DOAJ
sources	DOAJ
issn	2666-9900
description	Understanding causality is of crucial importance in biomedical sciences, where developing prediction models is insufficient because the models need to be actionable. However, data sources, such as electronic health records, are observational and often plagued with various types of biases, e.g. confounding. Although randomized controlled trials are the gold standard to estimate the causal effects of treatment interventions on health outcomes, they are not always possible. Propensity score matching (PSM) is a popular statistical technique for observational data that aims at balancing the characteristics of the population assigned either to a treatment or to a control group, making treatment assignment and outcome independent upon these characteristics. However, matching subjects can reduce the sample size. Inverse probability weighting (IPW) maintains the sample size, but extreme values can lead to instability. While PSM and IPW have been historically used in conjunction with linear regression, machine learning methods –including deep learning with propensity dropout– have been proposed to account for nonlinear treatment assignments. In this work, we propose a novel deep learning approach –the Propensity Score Synthetic Augmentation Matching using Generative Adversarial Networks (PSSAM-GAN)– that aims at keeping the sample size, without IPW, by generating synthetic matches. PSSAM-GAN can be used in conjunction with any other prediction method to estimate treatment effects. Experiments performed on both semi-synthetic (perinatal interventions) and real-world observational data (antibiotic treatments, and job interventions) show that the PSSAM-GAN approach effectively creates balanced datasets, relaxing the weighting/dropout needs for downstream methods, and providing competitive performance in effects estimation as compared to simple GAN and in conjunction with other deep counterfactual learning architectures, e.g. TARNet.


Similar Items