Audio Information Based Wedding Video Indexing


Bibliographic Details
Main Authors: Shao-Yen Fang, 方劭彥
Other Authors: Ja-Ling Wu
Format: Others
Language: en_US
Published: 2008
Online Access: http://ndltd.ncl.edu.tw/handle/96223881295701340956
Description
Summary: Master's thesis === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 96 === People often use digital video recorders to capture their lives. A wedding, for example, is one of the most important ceremonies in a person's life, and people usually film it to commemorate the occasion. The videos are then typically put into storage and never watched again, because raw footage is hard to turn into a compelling video story. Video summarization is therefore needed. Traditional video summarization usually relies on visual information such as dominant color, motion, and scene changes, but these cues do not apply well to wedding videos. Audio information, on the other hand, is meaningful. Noise is hard to avoid in wedding videos, yet most audio processing methods in the literature, such as speech/music discrimination, are developed for clean environments and do not perform well enough in the presence of noise. We therefore develop noise-resistant speech/music discrimination and vocal/non-vocal discrimination. In addition, in contrast to other papers that apply low-level acoustic features, we combine the results of speaker change detection and clap detection with our wedding event matching procedure. Unlike other papers that focus on signal processing, we also apply a refinement algorithm that corrects mismatched events to improve the performance of the proposed work.

In this thesis, the given wedding videos are first divided into segments by the proposed noise-resistant speech/music discrimination and vocal/non-vocal discrimination. The resulting segments are then labeled with the associated wedding events, assisted by the proposed speaker change detection and clap detection. Finally, the labeled events are revised by the refinement algorithm, which tries to re-match mismatched events that do not fit the wedding structure.
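The segmentation step described in the summary can be illustrated with a minimal sketch. This is not the thesis's method. It assumes that some classifier has already produced one label per fixed-length audio frame, and it only shows how such frame labels might be merged into time segments. The function names, the one-second frame length, and the minimum segment duration are all hypothetical.

```python
from itertools import groupby


def frames_to_segments(labels, frame_sec=1.0):
    """Merge runs of identical frame labels into (start, end, label) segments.

    `labels` is assumed to hold one class label per fixed-length audio frame;
    `frame_sec` is a hypothetical frame duration in seconds.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start * frame_sec, (start + length) * frame_sec, label))
        start += length
    return segments


def absorb_short_segments(segments, min_sec=2.0):
    """Relabel segments shorter than `min_sec` with the preceding label.

    This is a simple smoothing heuristic chosen for illustration only;
    adjacent segments that end up with the same label are merged.
    """
    merged = []
    for start, end, label in segments:
        if merged and (end - start) < min_sec:
            label = merged[-1][2]  # treat a very short run as a likely misclassification
        if merged and merged[-1][2] == label:
            merged[-1] = (merged[-1][0], end, label)
        else:
            merged.append((start, end, label))
    return merged


# Hypothetical frame labels from a speech/music discriminator.
frame_labels = ["speech"] * 3 + ["music"] * 1 + ["speech"] * 2 + ["music"] * 4
raw_segments = frames_to_segments(frame_labels)
smooth_segments = absorb_short_segments(raw_segments)
```

In this sketch the isolated one-second "music" frame is absorbed into the surrounding speech segment, which is one simple way to keep brief misclassifications in noisy audio from fragmenting the segmentation.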