Towards Debiasing Fact Verification Models

© 2019 Association for Computational Linguistics Fact verification requires validating a claim in the context of evidence. We show, however, that in the popular FEVER dataset this might not necessarily be the case. Claim-only classifiers perform competitively with top evidence-aware models. In this paper, we investigate the cause of this phenomenon, identifying strong cues for predicting labels solely based on the claim, without considering any evidence. We create an evaluation set that avoids those idiosyncrasies. The performance of FEVER-trained models significantly drops when evaluated on this test set. Therefore, we introduce a regularization method which alleviates the effect of bias in the training data, obtaining improvements on the newly created test set. This work is a step towards a more sound evaluation of reasoning capabilities in fact verification models.


Bibliographic Details
Main Authors: Schuster, Tal (Author), Shah, Darsh J (Author), Yeo, Yun Jie Serene (Author), Filizzola, Daniel (Author), Santus, Enrico (Author), Barzilay, Regina (Author)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: Association for Computational Linguistics, 2021-11-15T15:59:15Z.
LEADER 01991 am a22002533u 4500
001 137401.2
042 |a dc 
100 1 0 |a Schuster, Tal  |e author 
100 1 0 |a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
700 1 0 |a Shah, Darsh J  |e author 
700 1 0 |a Yeo, Yun Jie Serene  |e author 
700 1 0 |a Filizzola, Daniel  |e author 
700 1 0 |a Santus, Enrico  |e author 
700 1 0 |a Barzilay, Regina  |e author 
245 0 0 |a Towards Debiasing Fact Verification Models 
260 |b Association for Computational Linguistics,   |c 2021-11-15T15:59:15Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/137401.2 
520 |a © 2019 Association for Computational Linguistics Fact verification requires validating a claim in the context of evidence. We show, however, that in the popular FEVER dataset this might not necessarily be the case. Claim-only classifiers perform competitively with top evidence-aware models. In this paper, we investigate the cause of this phenomenon, identifying strong cues for predicting labels solely based on the claim, without considering any evidence. We create an evaluation set that avoids those idiosyncrasies. The performance of FEVER-trained models significantly drops when evaluated on this test set. Therefore, we introduce a regularization method which alleviates the effect of bias in the training data, obtaining improvements on the newly created test set. This work is a step towards a more sound evaluation of reasoning capabilities in fact verification models. 
520 |a DSO (Grant DSOCL18002) 
546 |a en 
655 7 |a Article 
773 |t 10.18653/V1/D19-1341 
773 |t EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference