Interpretability for Deep Learning Text Classifiers

The ubiquitous presence of automated decision-making systems whose performance is comparable to that of humans has drawn attention to the need for interpretable predictions. Whether the goal is predicting the system's behavior when the input changes, building user trust, or assisting experts in improving machine learning methods, interpretability is paramount when the problem is not sufficiently validated in real applications and when unacceptable results lead to significant consequences. While there are no standard interpretations for the decisions humans make, the complexity of systems with advanced information-processing capacities conceals the detailed explanations for individual predictions, encapsulating them under layers of abstraction and complex mathematical operations. Interpretability for deep learning classifiers thus becomes a challenging research topic, where the ambiguity of the problem statement allows for multiple exploratory paths. Our work focuses on generating natural language interpretations for individual predictions of deep learning text classifiers. We propose a framework for identifying, through unsupervised key phrase extraction and neural predictions, the phrases of the training corpus that most influence the prediction confidence. We assess the contribution that an added justification makes when the deep learning model predicts the class probability of a text instance, by introducing a contribution metric that quantifies the fidelity of the explanation to the model. We evaluate both the performance impact of the proposed approach on the classification task (quantitative analysis) and the quality of the generated justifications (extensive qualitative and error analysis). This methodology captures the most influential phrases of the training corpus as explanations that reveal the linguistic features used for individual test predictions, allowing humans to predict the behavior of the deep learning classifier.
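The abstract names two technical ingredients: unsupervised key phrase extraction over the training corpus, and a contribution metric that scores how much an added justification phrase shifts the classifier's confidence. The record does not include the thesis's code, so the following is a minimal illustrative sketch only, assuming a generic predict_proba-style text classifier and a simple TF-IDF phrase ranker; every name here (extract_key_phrases, contribution, classifier) is a hypothetical placeholder, not the thesis's actual API.

```python
# Illustrative sketch (not from the thesis) of the two ideas in the abstract:
# (1) unsupervised key-phrase extraction from a training corpus, and
# (2) a "contribution" score: the confidence shift caused by appending
#     a candidate justification phrase to an input text.
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_key_phrases(corpus, top_k=10):
    """Rank candidate 1-3 gram phrases by mean TF-IDF weight.
    This is one simple unsupervised extractor; the thesis may use another."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
    tfidf = vectorizer.fit_transform(corpus)
    mean_weights = tfidf.mean(axis=0).A1          # average weight per phrase
    phrases = vectorizer.get_feature_names_out()
    ranked = sorted(zip(phrases, mean_weights), key=lambda p: -p[1])
    return [phrase for phrase, _ in ranked[:top_k]]

def contribution(classifier, text, phrase, target_class):
    """Confidence shift for `target_class` when `phrase` is appended to
    `text`: a rough fidelity signal for the explanation. `classifier` is
    assumed to expose predict_proba(texts) -> [[p_class0, p_class1, ...]]."""
    p_base = classifier.predict_proba([text])[0][target_class]
    p_with = classifier.predict_proba([text + " " + phrase])[0][target_class]
    return p_with - p_base
```

Under these assumptions, phrases with the largest positive contribution for the model's predicted class would serve as candidate natural-language justifications for that prediction.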


Bibliographic Details
Main Author: Lucaci, Diana
Other Authors: Inkpen, Diana
Format: Thesis
Language: en
Published: Université d'Ottawa / University of Ottawa, 2020
Subjects: Interpretability; Deep Learning; Text Classifiers; Natural Language Processing; Text Mining
Online Access: http://hdl.handle.net/10393/41564
http://dx.doi.org/10.20381/ruor-25786