Propensity score synthetic augmentation matching using generative adversarial networks (PSSAM-GAN)
Understanding causality is of crucial importance in biomedical sciences, where developing prediction models is insufficient because the models need to be actionable. However, data sources, such as electronic health records, are observational and often plagued with various types of biases, e.g. confo...
Main Authors: | Shantanu Ghosh, Christina Boucher, Jiang Bian, Mattia Prosperi |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2021-01-01 |
Series: | Computer Methods and Programs in Biomedicine Update |
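Subjects: | Causal AI, Causal inference, Deep learning, Biomedical informatics, Generative adversarial networks, Propensity score |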
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666990021000197 |
id | doaj-47778e8511a14e03a7bab56af5f06214 |
---|---|
record_format | Article |
spelling | doaj-47778e8511a14e03a7bab56af5f06214; 2021-07-23T04:50:51Z; eng; Elsevier; Computer Methods and Programs in Biomedicine Update; 2666-9900; 2021-01-01; 1; 100020; Propensity score synthetic augmentation matching using generative adversarial networks (PSSAM-GAN); Shantanu Ghosh (0), Christina Boucher (1), Jiang Bian (2), Mattia Prosperi (3); (0) Corresponding author. Department of Computer and Information Science and Engineering, University of Florida, Florida 32611, USA; (1) Department of Computer and Information Science and Engineering, University of Florida, Florida 32611, USA; (2) Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Florida 32610, USA; (3) Department of Epidemiology, College of Public Health and Health Professions & College of Medicine, University of Florida, Florida 32610, USA; http://www.sciencedirect.com/science/article/pii/S2666990021000197; Causal AI, Causal inference, Deep learning, Biomedical informatics, Generative adversarial networks, Propensity score |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Shantanu Ghosh, Christina Boucher, Jiang Bian, Mattia Prosperi |
title | Propensity score synthetic augmentation matching using generative adversarial networks (PSSAM-GAN) |
publisher | Elsevier |
series | Computer Methods and Programs in Biomedicine Update |
issn | 2666-9900 |
publishDate | 2021-01-01 |
description | Understanding causality is of crucial importance in biomedical sciences, where developing prediction models is insufficient because the models need to be actionable. However, data sources, such as electronic health records, are observational and often plagued with various types of biases, e.g. confounding. Although randomized controlled trials are the gold standard to estimate the causal effects of treatment interventions on health outcomes, they are not always possible. Propensity score matching (PSM) is a popular statistical technique for observational data that aims at balancing the characteristics of the population assigned either to a treatment or to a control group, making treatment assignment and outcome independent upon these characteristics. However, matching subjects can reduce the sample size. Inverse probability weighting (IPW) maintains the sample size, but extreme values can lead to instability. While PSM and IPW have been historically used in conjunction with linear regression, machine learning methods –including deep learning with propensity dropout– have been proposed to account for nonlinear treatment assignments. In this work, we propose a novel deep learning approach –the Propensity Score Synthetic Augmentation Matching using Generative Adversarial Networks (PSSAM-GAN)– that aims at keeping the sample size, without IPW, by generating synthetic matches. PSSAM-GAN can be used in conjunction with any other prediction method to estimate treatment effects. Experiments performed on both semi-synthetic (perinatal interventions) and real-world observational data (antibiotic treatments, and job interventions) show that the PSSAM-GAN approach effectively creates balanced datasets, relaxing the weighting/dropout needs for downstream methods, and providing competitive performance in effects estimation as compared to simple GAN and in conjunction with other deep counterfactual learning architectures, e.g. TARNet. |
topic | Causal AI, Causal inference, Deep learning, Biomedical informatics, Generative adversarial networks, Propensity score |
url | http://www.sciencedirect.com/science/article/pii/S2666990021000197 |