A Study on Generative Adversarial Networks Exacerbating Social Data Bias

abstract: Generative Adversarial Networks are designed, in theory, to replicate the distribution of the data they are trained on. Under real-world limitations, such as finite network capacity and training set size, they inevitably suffer an as-yet-unavoidable technical failure: mode collapse. GAN-generated data is not nearly as diverse as the real-world data the network is trained on; this work shows that the effect is especially drastic when the training data is highly non-uniform. Specifically, GANs learn to exacerbate the social biases that exist in the training set along sensitive axes such as gender and race. In an age when many datasets are curated from web and social media data (which are almost never balanced), this has dangerous implications for downstream tasks that use GAN-generated synthetic data, such as data augmentation for classification. This thesis presents an empirical demonstration of this phenomenon and illustrates its real-world ramifications. It starts by showing that when asked to sample images from an illustrative dataset of engineering faculty headshots from 47 U.S. universities, unfortunately skewed toward white males, a DCGAN's generator "imagines" faces with light skin colors and masculine features. In addition, this work verifies that the generated distribution diverges more from the real-world distribution when the training data is non-uniform than when it is uniform. This work also shows that a conditional variant of GAN is not immune to exacerbating sensitive social biases. Finally, this work contributes a preliminary case study on Snapchat's explosively popular GAN-enabled "My Twin" selfie lens, which consistently lightens the skin tone of women of color in an attempt to make faces more feminine. The results and discussion of the study are meant to caution machine learning practitioners who may unwittingly increase the biases in their applications.
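To make the divergence claim concrete, here is a minimal sketch of comparing a sensitive-attribute distribution in training data against GAN output. It is illustrative only: the two-class attribute, the proportions, and the choice of KL divergence as the measure are assumptions for the example, not details taken from the thesis.

# Illustrative sketch (hypothetical numbers, not from the thesis):
# compare the distribution of a sensitive attribute in real training
# data against GAN-generated samples via KL divergence.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions given as probability arrays.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical attribute proportions over two classes [majority, minority].
real_skewed = [0.80, 0.20]   # non-uniform training set
gan_skewed  = [0.93, 0.07]   # mode collapse amplifies the majority mode

real_uniform = [0.50, 0.50]  # balanced training set
gan_uniform  = [0.55, 0.45]  # generated distribution stays closer to real

print(kl_divergence(gan_skewed, real_skewed))    # ~0.067, larger divergence
print(kl_divergence(gan_uniform, real_uniform))  # ~0.005, smaller divergence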


Bibliographic Details
Other Authors: Jain, Niharika (Author)
Format: Dissertation
Language: English
Published: 2020
Subjects:
Online Access:http://hdl.handle.net/2286/R.I.57433
id ndltd-asu.edu-item-57433
record_format oai_dc
spelling ndltd-asu.edu-item-57433 2020-06-02T03:01:32Z
contributors Jain, Niharika (Author); Kambhampati, Subbarao (Advisor); Liu, Huan (Committee member); Manikonda, Lydia (Committee member); Arizona State University (Publisher)
language eng
physical 53 pages
type Masters Thesis, Computer Science, 2020
url http://hdl.handle.net/2286/R.I.57433
rights http://rightsstatements.org/vocab/InC/1.0/
collection NDLTD
language English
format Dissertation
sources NDLTD
topic Artificial intelligence
Computer science
Ethics
bias
data augmentation
generative adversarial network
machine learning
society
author2 Jain, Niharika (Author)
author_facet Jain, Niharika (Author)
title A Study on Generative Adversarial Networks Exacerbating Social Data Bias
title_sort study on generative adversarial networks exacerbating social data bias
publishDate 2020
url http://hdl.handle.net/2286/R.I.57433
_version_ 1719315880357134336