Multimodal News Summarization, Tracking and Annotation Incorporating Tensor Analysis of Memes

We demonstrate four novel multimodal methods for efficient video summarization and comprehensive cross-cultural news video understanding. First, for quick video browsing, we demonstrate a multimedia event recounting system. Based on nine people-oriented design principles, it summarizes YouTube...

Bibliographic Details
Main Author:	Tsai, Chun-Yu
Language:	English
Published:	2017
Subjects:	Computer science Broadcast journalism Memes Video recordings > Abstracting and indexing
Online Access:	https://doi.org/10.7916/D8FF44N7

id	ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8FF44N7
record_format	oai_dc
spelling	ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8FF44N7 2019-05-09T15:15:25Z
collection	NDLTD
language	English
sources	NDLTD
topic	Computer science Broadcast journalism Memes Video recordings--Abstracting and indexing
description	We demonstrate four novel multimodal methods for efficient video summarization and comprehensive cross-cultural news video understanding. First, for quick video browsing, we demonstrate a multimedia event recounting system. Based on nine people-oriented design principles, it summarizes YouTube-like videos into short visual segments (8-12 sec) and textual words (fewer than 10 terms). In the 2013 TRECVID Multimedia Event Recounting competition, this system placed first in recognition time efficiency while remaining above average in description accuracy. Secondly, we demonstrate the summarization of large amounts of online international news videos. To understand an international event such as the Ebola virus, AirAsia Flight 8501, or the Zika virus comprehensively, we present a novel and efficient constrained tensor factorization algorithm that first represents a video archive of multimedia news stories concerning a news event as a sparse tensor of order 4. The dimensions correspond to extracted visual memes, verbal tags, time periods, and cultures. The iterative algorithm approximately but accurately extracts coherent quad-clusters, each of which represents a significant summary of an important independent aspect of the news event. We give examples of quad-clusters extracted from tensors with at least 10^8 entries derived from international news coverage. We show that the method is fast, can be tuned to give preference to any subset of its four dimensions, and outperforms three existing methods. Thirdly, noting that the co-occurrence of visual memes and tags in our summarization result is sparse, we show how to model cross-cultural visual meme influence based on normalized PageRank, which more accurately captures the rates at which visual memes are reposted in a specified time period in a specified culture.
Lastly, we establish the correspondences between videos and text descriptions in different cultures by reliable visual cues, detect culture-specific tags for visual memes, and then annotate videos in a cultural setting. Starting with any video with little or no text in one culture (say, US), we select candidate annotations from the text of another culture (say, China) to annotate the US video. By analyzing the similarity of images annotated by those candidates, we can derive a set of proper tags from the viewpoint of the other culture (China). We illustrate culture-based annotation examples using segments of international news. We evaluate the generated tags by cross-cultural tag frequency, tag precision, and user studies.
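The order-4 sparse tensor mentioned in the description can be pictured with a small sketch. The following is only an illustration under stated assumptions, not the thesis's algorithm: the observation tuples, the function names build_sparse_tensor and mode_marginal, and the coordinate-dictionary storage are all hypothetical, and no factorization or quad-cluster extraction is attempted. It shows how (visual meme, tag, time period, culture) co-occurrences might be counted into a sparse tensor that stores only its nonzero entries.

```python
from collections import Counter

# Hypothetical observations, one per (visual meme, tag, time period, culture)
# co-occurrence. None of these values come from the thesis.
observations = [
    ("meme_01", "outbreak", "2014-W40", "US"),
    ("meme_01", "outbreak", "2014-W40", "US"),
    ("meme_01", "quarantine", "2014-W41", "China"),
    ("meme_02", "vaccine", "2014-W41", "US"),
]


def build_sparse_tensor(obs):
    """Index the labels of each of the four modes and count co-occurrences.

    Returns the sorted label list per mode and a dict that maps integer
    coordinates (meme, tag, time, culture) to a count. Only nonzero
    entries are stored, which keeps the representation sparse.
    """
    modes = [sorted({o[m] for o in obs}) for m in range(4)]
    index = [{label: i for i, label in enumerate(labels)} for labels in modes]
    entries = Counter(tuple(index[m][o[m]] for m in range(4)) for o in obs)
    return modes, dict(entries)


def mode_marginal(entries, mode, size):
    """Sum the tensor over all modes except `mode`."""
    totals = [0] * size
    for coords, value in entries.items():
        totals[coords[mode]] += value
    return totals


modes, tensor = build_sparse_tensor(observations)
shape = tuple(len(labels) for labels in modes)
culture_totals = mode_marginal(tensor, 3, shape[3])
```

Here shape gives the size of each mode and culture_totals gives the number of observations per culture, in the sorted order of modes[3]. A constrained factorization such as the one the description refers to would operate on a structure like this, but its constraints and update rules are not given in this record.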
author	Tsai, Chun-Yu
title	Multimodal News Summarization, Tracking and Annotation Incorporating Tensor Analysis of Memes
publishDate	2017
url	https://doi.org/10.7916/D8FF44N7

Similar Items