Language technologies for understanding law, politics, and public policy

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-su...

Bibliographic Details
Main Author:	Li, William (William Pui Lum)
Other Authors:	Andrew W. Lo.
Format:	Others
Language:	English
Published:	Massachusetts Institute of Technology 2016
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/103673

id	ndltd-MIT-oai-dspace.mit.edu-1721.1-103673
record_format	oai_dc
spelling	ndltd-MIT-oai-dspace.mit.edu-1721.1-103673 2019-05-02T15:45:43Z Language technologies for understanding law, politics, and public policy Li, William (William Pui Lum) Andrew W. Lo. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. by William P. Li. Ph. D. 2016-07-18T19:11:42Z 2016-07-18T19:11:42Z 2016 2016 Thesis http://hdl.handle.net/1721.1/103673 953524878 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 209 pages application/pdf Massachusetts Institute of Technology
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Electrical Engineering and Computer Science.
spellingShingle	Electrical Engineering and Computer Science. Li, William (William Pui Lum) Language technologies for understanding law, politics, and public policy
description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 205-209).
This thesis focuses on the development of machine learning and natural language processing methods and their application to large, text-based open government datasets. We focus on models that uncover patterns and insights by inferring the origins of legal and political texts, with a particular emphasis on identifying text reuse and text similarity in these document collections. First, we present an authorship attribution model on unsigned U.S. Supreme Court opinions, offering insights into the authorship of important cases and the dynamics of Supreme Court decision-making. Second, we apply software engineering metrics to analyze the complexity of the United States Code of Laws, thereby illustrating the structure and evolution of the U.S. Code over the past century. Third, we trace policy trajectories of legislative bills in the United States Congress, enabling us to visualize the contents of four key bills during the Financial Crisis.
These applications on diverse open government datasets reveal that text reuse occurs widely in legal and political texts: similar ideas often repeat in the same corpus, different historical versions of documents are usually quite similar, or legitimate reasons for copying or borrowing text may exist. Motivated by this observation, we present a novel statistical text model, Probabilistic Text Reuse (PTR), for finding repeated passages of text in large document collections. We illustrate the utility of PTR by finding template ideas, less-common voices, and insights into document structure in a large collection of public comments on regulations proposed by the U.S. Federal Communications Commission (FCC) on net neutrality. These techniques aim to help citizens better understand political processes and help governments better understand political speech.
by William P. Li. Ph. D.
author2	Andrew W. Lo.
author_facet	Andrew W. Lo. Li, William (William Pui Lum)
author	Li, William (William Pui Lum)
author_sort	Li, William (William Pui Lum)
title	Language technologies for understanding law, politics, and public policy
title_short	Language technologies for understanding law, politics, and public policy
title_full	Language technologies for understanding law, politics, and public policy
title_fullStr	Language technologies for understanding law, politics, and public policy
title_full_unstemmed	Language technologies for understanding law, politics, and public policy
title_sort	language technologies for understanding law, politics, and public policy
publisher	Massachusetts Institute of Technology
publishDate	2016
url	http://hdl.handle.net/1721.1/103673
work_keys_str_mv	AT liwilliamwilliampuilum languagetechnologiesforunderstandinglawpoliticsandpublicpolicy
_version_	1719027535168143360

Similar Items