A Study of Query Modeling for Spoken Document Retrieval

Master's thesis === National Taiwan Normal University (國立臺灣師範大學) === Graduate Institute of Computer Science and Information Engineering (資訊工程研究所) === Academic year 99 (ROC calendar)


Bibliographic Details
Main Author: 陳珮寧
Other Authors: 陳柏琳
Format: Others
Language: zh-TW
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/45158839408333695806
Description
Summary: Spoken document retrieval (SDR) has recently attracted growing research interest owing to the increasing volume of publicly available multimedia content that contains speech. SDR faces three fundamental problems: 1) a query is often only a vague expression of the underlying information need; 2) a query and a spoken document may use different words even when they are topically related; and 3) imperfect speech recognition transcripts contain recognition errors and therefore only partially reflect the true theme of a spoken document. Many efforts have been devoted to developing elaborate indexing and modeling techniques for representing spoken documents, but few to improving query formulation so that it better represents users' information needs. In view of this, we present a novel language modeling framework that exploits both lexical- and topic-based relevance information to improve query effectiveness. We further explore various ways to glean both relevance and non-relevance information from the document collection so as to enhance the modeling of a given query in an unsupervised fashion. Experiments conducted on the TDT (Topic Detection and Tracking) SDR task demonstrate the performance merits of the methods derived from our retrieval framework compared with other existing retrieval methods.
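The abstract does not give the thesis's estimators, so the following is only a minimal sketch of a well-known baseline for the same general idea: unsupervised query modeling in the language modeling framework via pseudo-relevance feedback. It assumes the top-ranked first-pass documents are relevant, estimates a relevance model from them (RM1), interpolates it with the original query (RM3), and ranks documents by cross-entropy against Jelinek-Mercer smoothed document models. This is not the author's framework: the topic-based and non-relevance components are not modeled. The toy "transcripts", function names, and parameter values (`k`, `alpha`, `lam`, `n_terms`) are hypothetical.

```python
import math
from collections import Counter

def mle(tokens):
    """Maximum-likelihood unigram model P(w|D) from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def score(query_model, doc_model, background, lam=0.5):
    """Cross-entropy sum_w P(w|Q) log P(w|D), rank-equivalent to -KL(Q || D).

    P(w|D) is Jelinek-Mercer smoothed with the collection (background) model.
    """
    s = 0.0
    for w, pq in query_model.items():
        pd = lam * doc_model.get(w, 0.0) + (1.0 - lam) * background.get(w, 0.0)
        if pd > 0.0:  # a word absent from the whole collection shifts every score equally
            s += pq * math.log(pd)
    return s

def rank(query_model, doc_models, background, lam=0.5):
    """Return document ids sorted by descending score, plus the raw scores."""
    scores = {d: score(query_model, m, background, lam) for d, m in doc_models.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

def rm3_query_model(query_tokens, doc_models, background,
                    k=1, alpha=0.5, lam=0.5, n_terms=10):
    """Hypothetical helper: RM3 query model from the top-k first-pass documents."""
    q_model = mle(query_tokens)
    order, scores = rank(q_model, doc_models, background, lam)
    top = order[:k]  # assumed (pseudo-)relevant, no user judgments needed
    # |Q| * score = log P(Q|D); a softmax over the top-k gives P(D|Q).
    log_lik = {d: len(query_tokens) * scores[d] for d in top}
    m = max(log_lik.values())
    weights = {d: math.exp(v - m) for d, v in log_lik.items()}
    z = sum(weights.values())
    # RM1: P(w|R) is proportional to sum over feedback docs of P(w|D) P(D|Q).
    rel = Counter()
    for d in top:
        for w, p in doc_models[d].items():
            rel[w] += p * weights[d] / z
    # Keep the strongest expansion terms and renormalise them.
    kept = dict(sorted(rel.items(), key=lambda kv: kv[1], reverse=True)[:n_terms])
    norm = sum(kept.values())
    # RM3: interpolate the original query model with the feedback model.
    expanded = Counter({w: alpha * p for w, p in q_model.items()})
    for w, p in kept.items():
        expanded[w] += (1.0 - alpha) * p / norm
    return dict(expanded)

# Toy example (hypothetical data): d3 is on-topic but shares no word with the query.
docs = {
    "d1": "oil prices rise as crude supply tightens".split(),
    "d2": "football team wins the cup final".split(),
    "d3": "crude supply tightens in the gulf".split(),
}
doc_models = {d: mle(t) for d, t in docs.items()}
background = mle([w for t in docs.values() for w in t])
query = "oil prices".split()

first_pass, _ = rank(mle(query), doc_models, background)
expanded = rm3_query_model(query, doc_models, background, k=1)
second_pass, _ = rank(expanded, doc_models, background)
print(first_pass)   # ['d1', 'd2', 'd3']  (d2 and d3 tie: neither contains a query word)
print(second_pass)  # ['d1', 'd3', 'd2']
```

The toy run illustrates the word-usage mismatch problem named in the abstract: before expansion the on-topic d3 cannot be separated from the off-topic d2, while the feedback terms "crude", "supply", and "tightens" lift d3 above d2 without any user judgments. With real ASR transcripts, recognition errors in the top-ranked documents would also leak into the expanded query, which is one motivation for modeling non-relevance information as well.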