Exploiting Machine Learning Methods for Spoken Document Retrieval

Master's thesis === National Taiwan Normal University === Graduate Institute of Computer Science and Information Engineering === Academic year 97 === This thesis investigates the use of machine-learning approaches, namely learning-to-rank algorithms, for information retrieval (IR), with special emphasis on their theoretical foundations and the associated features that are used by them, such as the lexical fe...

Bibliographic Details
Main Author:	游斯涵
Other Authors:	Berlin Chen
Format:	Others
Language:	zh-TW
Published:	2009
Online Access:	http://ndltd.ncl.edu.tw/handle/z2j5u7

id	ndltd-TW-097NTNU5392003
record_format	oai_dc
spelling	ndltd-TW-097NTNU5392003 2019-05-29T03:43:27Z http://ndltd.ncl.edu.tw/handle/z2j5u7 Exploiting Machine Learning Methods for Spoken Document Retrieval 使用機器學習方法於語音文件檢索之研究 (A Study on Using Machine Learning Methods for Spoken Document Retrieval) 游斯涵 Master's thesis, National Taiwan Normal University, Graduate Institute of Computer Science and Information Engineering, academic year 97 Berlin Chen 陳柏琳 2009 thesis 134 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's thesis === National Taiwan Normal University === Graduate Institute of Computer Science and Information Engineering === Academic year 97 === This thesis investigates the use of machine-learning approaches, namely learning-to-rank algorithms, for information retrieval (IR), with special emphasis on their theoretical foundations and the features they use, such as lexical, proximity, and probabilistic features. We also consider the application of these approaches to spoken document retrieval (SDR). All experiments were conducted on the Topic Detection and Tracking corpora (specifically TDT-2 and TDT-3), benchmark collections widely adopted for SDR evaluation because they contain tens of hours of mainland-accented Chinese broadcast news documents with topic labels and orthographic transcripts. To discover more useful speech-related features for SDR and to analyze the problems caused by speech recognition errors, we built a large vocabulary continuous speech recognition (LVCSR) system that outputs, for each broadcast news document, a word lattice containing multiple recognition hypotheses. We also address the problem of training machine-learning retrieval models on imbalanced training data and propose a remedy for it. Finally, preliminary experimental results suggest that the RankNet-based retrieval model outperforms the support vector machine (SVM)-based retrieval model on the SDR task studied in this thesis.
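The abstract reports that a RankNet-based retrieval model outperformed an SVM-based one. As a rough illustration of what "RankNet-based" refers to, here is a minimal Python sketch of the standard RankNet pairwise objective: for one query, a document pair (x_pos, x_neg) in which x_pos is judged more relevant is scored by a function f, and training minimizes log(1 + exp(-(f(x_pos) - f(x_neg)))). This is not the thesis's implementation. Assumptions: f is linear (RankNet proper uses a neural network), training is plain stochastic gradient descent, and the function names (sigmoid, score, pair_loss, train_ranknet_linear) and the two-dimensional feature vectors are hypothetical stand-ins for the lexical and proximity features mentioned in the abstract. The sketch does not model the thesis's remedy for imbalanced training data, which this record does not describe.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """Hypothetical linear scorer f(x) = w . x (RankNet proper uses a neural net)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pair_loss(w, x_pos, x_neg):
    """RankNet cross-entropy for one pair whose target is 'x_pos ranks above x_neg'."""
    d = score(w, x_pos) - score(w, x_neg)
    # log(1 + exp(-d)), rewritten so it cannot overflow for large |d|
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def train_ranknet_linear(pairs, dim, lr=0.1, epochs=100, seed=0):
    """Stochastic gradient descent on the pairwise loss.

    pairs: list of (x_pos, x_neg) feature vectors for the same query,
           where x_pos was judged more relevant than x_neg.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(pairs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            x_pos, x_neg = pairs[k]
            d = score(w, x_pos) - score(w, x_neg)
            # dLoss/dd = sigmoid(d) - 1 = -sigmoid(-d); by the chain rule
            # dLoss/dw = -sigmoid(-d) * (x_pos - x_neg)
            g = -sigmoid(-d)
            for i in range(dim):
                w[i] -= lr * g * (x_pos[i] - x_neg[i])
    return w


if __name__ == "__main__":
    # Made-up 2-feature vectors (e.g. a lexical-match score and a proximity score)
    pairs = [
        ([0.9, 0.2], [0.1, 0.3]),
        ([0.8, 0.5], [0.2, 0.4]),
        ([0.7, 0.1], [0.3, 0.6]),
    ]
    w = train_ranknet_linear(pairs, dim=2)
    print(all(score(w, p) > score(w, n) for p, n in pairs))  # expected: True
```

Replacing the linear scorer with a small neural network recovers the original RankNet formulation; the pairwise loss and its gradient with respect to the score difference stay the same.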
author2	Berlin Chen
author	游斯涵
title	Exploiting Machine Learning Methods for Spoken Document Retrieval
publishDate	2009
url	http://ndltd.ncl.edu.tw/handle/z2j5u7

Similar Items