Propensity score synthetic augmentation matching using generative adversarial networks (PSSAM-GAN)

Understanding causality is of crucial importance in biomedical sciences, where developing prediction models is insufficient because the models need to be actionable. However, data sources, such as electronic health records, are observational and often plagued with various types of biases, e.g. confo...

Bibliographic Details
Main Authors: Shantanu Ghosh, Christina Boucher, Jiang Bian, Mattia Prosperi
Format: Article
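Language: English
Published: Elsevier 2021-01-01
Series: Computer Methods and Programs in Biomedicine Update
Subjects: Causal AI; Causal inference; Deep learning; Biomedical informatics; Generative adversarial networks; Propensity score
Online Access: http://www.sciencedirect.com/science/article/pii/S2666990021000197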
id doaj-47778e8511a14e03a7bab56af5f06214
record_format Article
spelling doaj-47778e8511a14e03a7bab56af5f06214
2021-07-23T04:50:51Z
eng
Elsevier
Computer Methods and Programs in Biomedicine Update
2666-9900
2021-01-01
volume 1, article 100020
Propensity score synthetic augmentation matching using generative adversarial networks (PSSAM-GAN)
Shantanu Ghosh (corresponding author): Department of Computer and Information Science and Engineering, University of Florida, Florida 32611, USA
Christina Boucher: Department of Computer and Information Science and Engineering, University of Florida, Florida 32611, USA
Jiang Bian: Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Florida 32610, USA
Mattia Prosperi: Department of Epidemiology, College of Public Health and Health Professions & College of Medicine, University of Florida, Florida 32610, USA
Understanding causality is of crucial importance in biomedical sciences, where developing prediction models is insufficient because the models need to be actionable. However, data sources, such as electronic health records, are observational and often plagued with various types of biases, e.g. confounding. Although randomized controlled trials are the gold standard to estimate the causal effects of treatment interventions on health outcomes, they are not always possible. Propensity score matching (PSM) is a popular statistical technique for observational data that aims at balancing the characteristics of the population assigned either to a treatment or to a control group, making treatment assignment and outcome independent upon these characteristics. However, matching subjects can reduce the sample size. Inverse probability weighting (IPW) maintains the sample size, but extreme values can lead to instability. While PSM and IPW have been historically used in conjunction with linear regression, machine learning methods –including deep learning with propensity dropout– have been proposed to account for nonlinear treatment assignments. In this work, we propose a novel deep learning approach –the Propensity Score Synthetic Augmentation Matching using Generative Adversarial Networks (PSSAM-GAN)– that aims at keeping the sample size, without IPW, by generating synthetic matches. PSSAM-GAN can be used in conjunction with any other prediction method to estimate treatment effects. Experiments performed on both semi-synthetic (perinatal interventions) and real-world observational data (antibiotic treatments, and job interventions) show that the PSSAM-GAN approach effectively creates balanced datasets, relaxing the weighting/dropout needs for downstream methods, and providing competitive performance in effects estimation as compared to simple GAN and in conjunction with other deep counterfactual learning architectures, e.g. TARNet.
http://www.sciencedirect.com/science/article/pii/S2666990021000197
Causal AI
Causal inference
Deep learning
Biomedical informatics
Generative adversarial networks
Propensity score
collection DOAJ
language English
format Article
sources DOAJ
author Shantanu Ghosh
Christina Boucher
Jiang Bian
Mattia Prosperi
title Propensity score synthetic augmentation matching using generative adversarial networks (PSSAM-GAN)
publisher Elsevier
series Computer Methods and Programs in Biomedicine Update
issn 2666-9900
publishDate 2021-01-01
description Understanding causality is of crucial importance in biomedical sciences, where developing prediction models is insufficient because the models need to be actionable. However, data sources, such as electronic health records, are observational and often plagued with various types of biases, e.g. confounding. Although randomized controlled trials are the gold standard to estimate the causal effects of treatment interventions on health outcomes, they are not always possible. Propensity score matching (PSM) is a popular statistical technique for observational data that aims at balancing the characteristics of the population assigned either to a treatment or to a control group, making treatment assignment and outcome independent upon these characteristics. However, matching subjects can reduce the sample size. Inverse probability weighting (IPW) maintains the sample size, but extreme values can lead to instability. While PSM and IPW have been historically used in conjunction with linear regression, machine learning methods –including deep learning with propensity dropout– have been proposed to account for nonlinear treatment assignments. In this work, we propose a novel deep learning approach –the Propensity Score Synthetic Augmentation Matching using Generative Adversarial Networks (PSSAM-GAN)– that aims at keeping the sample size, without IPW, by generating synthetic matches. PSSAM-GAN can be used in conjunction with any other prediction method to estimate treatment effects. Experiments performed on both semi-synthetic (perinatal interventions) and real-world observational data (antibiotic treatments, and job interventions) show that the PSSAM-GAN approach effectively creates balanced datasets, relaxing the weighting/dropout needs for downstream methods, and providing competitive performance in effects estimation as compared to simple GAN and in conjunction with other deep counterfactual learning architectures, e.g. TARNet.
topic Causal AI
Causal inference
Deep learning
Biomedical informatics
Generative adversarial networks
Propensity score
url http://www.sciencedirect.com/science/article/pii/S2666990021000197