Multimodal News Summarization, Tracking and Annotation Incorporating Tensor Analysis of Memes

We demonstrate four novel multimodal methods for efficient video summarization and comprehensive cross-cultural news video understanding. First, for quick video browsing, we demonstrate a multimedia event recounting system. Based on nine people-oriented design principles, it summarizes YouTube...


Bibliographic Details
Main Author: Tsai, Chun-Yu
Language: English
Published: 2017
Subjects: Computer science; Broadcast journalism; Memes; Video recordings--Abstracting and indexing
Online Access: https://doi.org/10.7916/D8FF44N7
id ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8FF44N7
record_format oai_dc
spelling ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8FF44N7 2019-05-09T15:15:25Z
collection NDLTD
language English
sources NDLTD
topic Computer science
Broadcast journalism
Memes
Video recordings--Abstracting and indexing
description We demonstrate four novel multimodal methods for efficient video summarization and comprehensive cross-cultural news video understanding. First, for quick video browsing, we demonstrate a multimedia event recounting system. Based on nine people-oriented design principles, it summarizes YouTube-like videos into short visual segments (8-12 sec) and textual words (fewer than 10 terms). In the 2013 TRECVID Multimedia Event Recounting competition, this system placed first in recognition time efficiency while remaining above average in description accuracy. Secondly, we demonstrate the summarization of large amounts of online international news videos. To understand an international event such as the Ebola virus, AirAsia Flight 8501, or the Zika virus comprehensively, we present a novel and efficient constrained tensor factorization algorithm that first represents a video archive of multimedia news stories concerning a news event as a sparse tensor of order 4. The dimensions correspond to extracted visual memes, verbal tags, time periods, and cultures. The iterative algorithm approximately but accurately extracts coherent quad-clusters, each of which represents a significant summary of an important independent aspect of the news event. We give examples of quad-clusters extracted from tensors with at least 10^8 entries derived from international news coverage. We show that the method is fast, can be tuned to give preference to any subset of its four dimensions, and outperforms three existing methods. Thirdly, noting that the co-occurrence of visual memes and tags in our summarization result is sparse, we show how to model cross-cultural visual meme influence based on normalized PageRank, which more accurately captures the rates at which visual memes are reposted in a specified time period in a specified culture.
Lastly, we establish the correspondences between videos and text descriptions in different cultures by reliable visual cues, detect culture-specific tags for visual memes, and then annotate videos in a cultural setting. Starting with any video with little or no text in one culture (say, US), we select candidate annotations from the text of another culture (say, China) to annotate the US video. By analyzing the similarity of images annotated by those candidates, we can derive a set of proper tags from the viewpoint of another culture (China). We illustrate culture-based annotation examples using segments of international news. We evaluate the generated tags by cross-cultural tag frequency, tag precision, and user studies.
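The description above mentions storing a news archive as a sparse tensor of order 4 over visual memes, verbal tags, time periods, and cultures. The following is a minimal sketch of that data layout only, not the thesis's constrained factorization algorithm. The observation records, the function names build_sparse_tensor and mode_marginal, and the use of raw co-occurrence counts as entry values are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical observations: (visual_meme, tag, time_period, culture).
# These records are invented for illustration; they are not data from the thesis.
observations = [
    ("meme_a", "outbreak", "2014-W40", "US"),
    ("meme_a", "outbreak", "2014-W40", "US"),
    ("meme_a", "quarantine", "2014-W41", "China"),
    ("meme_b", "vaccine", "2014-W41", "US"),
]


def build_sparse_tensor(records):
    """Return (entries, index) for an order-4 count tensor in coordinate form."""
    index = [{} for _ in range(4)]  # one label -> integer id map per mode
    entries = Counter()             # (i, j, k, l) -> count; zeros are never stored
    for record in records:
        coord = tuple(
            index[mode].setdefault(label, len(index[mode]))
            for mode, label in enumerate(record)
        )
        entries[coord] += 1
    return entries, index


def mode_marginal(entries, mode, size):
    """Sum the tensor over every mode except `mode`."""
    totals = [0] * size
    for coord, value in entries.items():
        totals[coord[mode]] += value
    return totals


entries, index = build_sparse_tensor(observations)
shape = tuple(len(labels) for labels in index)
culture_totals = mode_marginal(entries, 3, shape[3])  # mode 3 is the culture mode
```

Only the nonzero coordinates are kept, so memory grows with the number of observed (meme, tag, time, culture) combinations rather than with the product of the four mode sizes.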
author Tsai, Chun-Yu
title Multimodal News Summarization, Tracking and Annotation Incorporating Tensor Analysis of Memes
publishDate 2017
url https://doi.org/10.7916/D8FF44N7