An empirical study identifying bias in Yelp dataset

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2021 === Cataloged from the official PDF of thesis. === Includes bibliographical references (pages 45-47). === Online review platforms have become an essential element of the...

Bibliographic Details
Main Author:	Choi, Seri, M. Eng. Massachusetts Institute of Technology.
Other Authors:	Alex Pentland.
Format:	Others
Language:	English
Published:	Massachusetts Institute of Technology 2021
Subjects:	Electrical Engineering and Computer Science.
Online Access:	https://hdl.handle.net/1721.1/130685

id	ndltd-MIT-oai-dspace.mit.edu-1721.1-130685
record_format	oai_dc
spelling	ndltd-MIT-oai-dspace.mit.edu-1721.1-130685 2021-05-28T05:20:01Z An empirical study identifying bias in Yelp dataset Choi, Seri, M. Eng. Massachusetts Institute of Technology. Alex Pentland. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Electrical Engineering and Computer Science. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2021 Cataloged from the official PDF of thesis. Includes bibliographical references (pages 45-47). Online review platforms have become an essential element of the business industry, providing users in-depth information on businesses and other users' experiences. The purpose of this study is to examine possible bias or discriminatory behaviors in users' rating habits in the Yelp dataset. The Surprise recommender system is utilized to produce expected ratings for the test set, training the model with 75% of the original dataset to learn the rating trends. Then, the ordinary least squares (OLS) linear regression is applied to identify which factors affected the percent change and which categories or locations show more bias than the others. This paper can provide insights into ways that bias can manifest within a dataset due to non-experimental factors such as social psychology; future research into this topic can therefore take these non-experimental factors, such as the discriminatory bias found in Yelp reviews, into consideration in order to reduce bias when utilizing machine learning algorithms. by Seri Choi. M. Eng. M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science 2021-05-24T19:40:22Z 2021-05-24T19:40:22Z 2021 2021 Thesis https://hdl.handle.net/1721.1/130685 1251779073 eng MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582 47 pages application/pdf Massachusetts Institute of Technology
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Electrical Engineering and Computer Science.
spellingShingle	Electrical Engineering and Computer Science. Choi, Seri, M. Eng. Massachusetts Institute of Technology. An empirical study identifying bias in Yelp dataset
description	Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2021 === Cataloged from the official PDF of thesis. === Includes bibliographical references (pages 45-47). === Online review platforms have become an essential element of the business industry, providing users in-depth information on businesses and other users' experiences. The purpose of this study is to examine possible bias or discriminatory behaviors in users' rating habits in the Yelp dataset. The Surprise recommender system is utilized to produce expected ratings for the test set, training the model with 75% of the original dataset to learn the rating trends. Then, the ordinary least squares (OLS) linear regression is applied to identify which factors affected the percent change and which categories or locations show more bias than the others. This paper can provide insights into ways that bias can manifest within a dataset due to non-experimental factors such as social psychology; future research into this topic can therefore take these non-experimental factors, such as the discriminatory bias found in Yelp reviews, into consideration in order to reduce bias when utilizing machine learning algorithms. === by Seri Choi. === M. Eng. === M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
author2	Alex Pentland.
author_facet	Alex Pentland. Choi, Seri, M. Eng. Massachusetts Institute of Technology.
author	Choi, Seri, M. Eng. Massachusetts Institute of Technology.
author_sort	Choi, Seri, M. Eng. Massachusetts Institute of Technology.
title	An empirical study identifying bias in Yelp dataset
title_short	An empirical study identifying bias in Yelp dataset
title_full	An empirical study identifying bias in Yelp dataset
title_fullStr	An empirical study identifying bias in Yelp dataset
title_full_unstemmed	An empirical study identifying bias in Yelp dataset
title_sort	empirical study identifying bias in yelp dataset
publisher	Massachusetts Institute of Technology
publishDate	2021
url	https://hdl.handle.net/1721.1/130685
work_keys_str_mv	AT choiserimengmassachusettsinstituteoftechnology anempiricalstudyidentifyingbiasinyelpdataset AT choiserimengmassachusettsinstituteoftechnology empiricalstudyidentifyingbiasinyelpdataset
_version_	1719407324269903872

Similar Items