Multi-Modal Learning over User-Contributed Content from Cross-Domain Social Media

Bibliographic Details
Main Authors: Wen-Yu Lee, 李文瑜
Other Authors: 徐宏民
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/98114821102121233466
Description
Summary: Ph.D. === National Taiwan University === Graduate Institute of Networking and Multimedia === 104 === Social media have changed the world and our lives. Every day, millions of media items are uploaded to social-sharing websites. The goal of this research is to discover and summarize large amounts of media data from emerging social media into information of interest. Our basic idea is to perform multi-modal learning on given data, leveraging user-contributed data from cross-domain social media. Specifically, given a photo, we intend to discover the geographical information, people's descriptions or comments, and events of interest closely related to that photo. This information can then be used for various purposes, for example as a real-time guide that improves the quality of tourism. Accordingly, this dissertation studies modern challenges in image location identification, image annotation, and event discovery, and presents promising ways to conquer them.

For image location identification, most previous works directly integrated the visual features and geo-tags of the given photos. The performance of these approaches, however, can be limited when the photos were taken indoors and/or their contents show a number of buildings in close proximity. As a solution, this dissertation unifies visual features, geo-tags, and check-in data, and further presents an image-cluster refinement approach for image location identification.

For image annotation, label propagation is widely used to annotate photos based on similarity graphs of photos, where most previous works focused on single-label propagation. Although multi-label propagation is expected to be more efficient than running single-label propagation several times, it may increase the computational complexity; moreover, image datasets keep growing in size, which further increases the problem scale. As a solution, this dissertation presents a scalable multi-label propagation that leverages the power of distributed computing.

For event discovery, most previous works investigated a single, specific media stream. Mining multiple media streams can potentially achieve better performance than mining one stream alone, but is also more challenging. As a solution, this dissertation presents a two-stage framework that combines a flow-based media dataset and a check-in-based media dataset to discover events of interest. Experimental results on real media datasets show the effectiveness of all of the proposed approaches. Finally, this dissertation provides some possible directions for future studies.
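The abstract does not include implementation details, but the graph-based label propagation it builds on can be illustrated with a minimal single-machine sketch. This is not the dissertation's distributed algorithm: it is the standard normalized-graph iteration F ← αSF + (1 − α)Y, carrying all labels at once as columns of F (which is what makes it "multi-label" in one pass). All names and the toy similarity data below are illustrative assumptions.

```python
import numpy as np

def propagate_labels(W, Y, alpha=0.9, iters=100):
    # W: (n, n) symmetric nonnegative photo-similarity matrix
    # Y: (n, k) initial label matrix, one column per label;
    #    propagating all k columns together handles multiple labels at once
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt          # symmetric normalization
    F = Y.astype(float)
    for _ in range(iters):                   # F <- alpha*S@F + (1-alpha)*Y
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F

# Toy example: photos 0 and 1 are similar, photos 2 and 3 are similar,
# with only a weak link bridging the two groups.
W = np.array([[0, 1,   0,   0],
              [1, 0,   0.1, 0],
              [0, 0.1, 0,   1],
              [0, 0,   1,   0]], dtype=float)
Y = np.array([[1, 0],   # photo 0 labeled "beach" (illustrative)
              [0, 0],   # photo 1 unlabeled
              [0, 0],   # photo 2 unlabeled
              [0, 1]])  # photo 3 labeled "temple" (illustrative)
F = propagate_labels(W, Y)
# Unlabeled photos inherit the label of their near neighbors:
# photo 1 scores highest on "beach", photo 2 on "temple".
```

In a distributed setting, the dominant cost, the sparse matrix product S @ F, partitions naturally by rows of S, which is one reason this formulation lends itself to the MapReduce-style scaling the abstract describes.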