MovieQA: Understanding Stories in Movies through Question-Answering

We introduce the MovieQA dataset which aims to evaluate automatic story comprehension from both video and text. The dataset consists of 14,944 questions about 408 movies with high semantic diversity. The questions range from simpler "Who" did "What" to "Whom", to "Why" and "How" certain events occurred. Each question comes with a set of five possible answers, a correct one and four deceiving answers provided by human annotators. Our dataset is unique in that it contains multiple sources of information - video clips, plots, subtitles, scripts, and DVS. We analyze our data through various statistics and methods. We further extend existing QA techniques to show that question-answering with such open-ended semantics is hard. We make this data set public along with an evaluation benchmark to encourage inspiring work in this challenging domain.
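
The abstract describes the benchmark's basic unit (a question, five candidate answers with one correct, and several story sources) and the task of picking the right answer. This record does not specify how the released data is serialized, so the following is only a minimal Python sketch of what one such multiple-choice item and its evaluation metric might look like; all field and function names here are hypothetical illustrations, not the dataset's actual schema or API.

# Hypothetical sketch of a MovieQA-style multiple-choice item; field names are
# illustrative assumptions, not the published dataset's actual schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class MovieQAItem:
    movie_id: str                  # one of the 408 movies
    question: str                  # e.g. a "Who", "Why", or "How" question about the story
    answers: List[str]             # five candidates: one correct, four deceiving answers
    correct_index: int             # position (0-4) of the correct answer
    sources: List[str] = field(default_factory=list)  # e.g. ["plot", "subtitles", "script", "DVS", "video clips"]

def accuracy(items: List[MovieQAItem], predictions: List[int]) -> float:
    # Multiple-choice accuracy: fraction of questions answered with the correct index.
    correct = sum(pred == item.correct_index for pred, item in zip(predictions, items))
    return correct / len(items)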

Bibliographic Details
Main Authors: Tapaswi, Makarand (Author), Zhu, Yukun (Author), Stiefelhagen, Rainer (Author), Torralba, Antonio (Author, Contributor), Urtasun, Raquel (Author), Fidler, Sanja (Author)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE), 2018-02-26T21:43:32Z.
Online Access: Get fulltext at http://hdl.handle.net/1721.1/113894
LEADER 01843 am a22002293u 4500
001 113894
042 |a dc 
100 1 0 |a Tapaswi, Makarand  |e author 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
100 1 0 |a Torralba, Antonio  |e contributor 
700 1 0 |a Zhu, Yukun  |e author 
700 1 0 |a Stiefelhagen, Rainer  |e author 
700 1 0 |a Torralba, Antonio  |e author 
700 1 0 |a Urtasun, Raquel  |e author 
700 1 0 |a Fidler, Sanja  |e author 
245 0 0 |a MovieQA: Understanding Stories in Movies through Question-Answering 
260 |b Institute of Electrical and Electronics Engineers (IEEE),   |c 2018-02-26T21:43:32Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/113894 
520 |a We introduce the MovieQA dataset which aims to evaluate automatic story comprehension from both video and text. The dataset consists of 14,944 questions about 408 movies with high semantic diversity. The questions range from simpler "Who" did "What" to "Whom", to "Why" and "How" certain events occurred. Each question comes with a set of five possible answers, a correct one and four deceiving answers provided by human annotators. Our dataset is unique in that it contains multiple sources of information - video clips, plots, subtitles, scripts, and DVS. We analyze our data through various statistics and methods. We further extend existing QA techniques to show that question-answering with such open-ended semantics is hard. We make this data set public along with an evaluation benchmark to encourage inspiring work in this challenging domain. Keywords: Motion pictures, Visualization, Semantics, Voltage control, Cognition, Natural languages, Computer vision 
546 |a en_US 
655 7 |a Article 
773 |t 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)