Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition


Bibliographic Details
Main Authors: Lo, Tien-Hong, 羅天宏
Other Authors: Chen, Berlin
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/88f353
id ndltd-TW-107NTNU5392006
record_format oai_dc
spelling ndltd-TW-107NTNU5392006 2019-05-16T01:45:07Z http://ndltd.ncl.edu.tw/handle/88f353 Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition 探討聲學模型化技術與半監督鑑別式訓練於語音辨識之研究 Lo, Tien-Hong 羅天宏 Master's thesis, National Taiwan Normal University, Department of Computer Science and Information Engineering, academic year 107 (abstract given in full under description) Chen, Berlin 陳柏琳 2019 degree thesis (學位論文) 102 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Taiwan Normal University === Department of Computer Science and Information Engineering === 107 === Recently, a novel objective function for discriminative acoustic model training, namely lattice-free maximum mutual information (LF-MMI), was proposed and achieved new state-of-the-art results in automatic speech recognition (ASR). Although LF-MMI performs excellently on various ASR tasks in supervised training settings, its performance often degrades significantly in semi-supervised settings. This is because LF-MMI shares a common deficiency of discriminative training criteria: it is sensitive to the accuracy of the transcripts of the training utterances. In view of this, the thesis explores two questions concerning LF-MMI in a semi-supervised training setting: first, how to improve the seed model, and second, how to exploit untranscribed training data. For the former, we investigate several transfer learning approaches (e.g., weight transfer and multitask learning) and model combination (e.g., hypothesis-level and frame-level combination); the distinction between these two families of methods is whether extra training data is used. For the latter, we introduce negative conditional entropy (NCE) and lattices for supervision, in conjunction with the LF-MMI objective function. A series of experiments were conducted on the Augmented Multi-Party Interaction (AMI) benchmark corpus. The results show that transfer learning using out-of-domain data (OOD) and model combination based on complementary diversity effectively improve the performance of the seed model, and that pairing NCE with lattice supervision improves both the word error rate (WER) and the WER recovery rate (WRR).
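For reference, the training criteria named in the abstract can be sketched as follows. These are the standard formulations from the ASR literature (with acoustic scale \kappa, acoustics O_u, and word sequence W_u as assumed notation); the thesis's exact formulation may differ. MMI maximizes the posterior probability of the reference transcript against all competing hypotheses:

    F_{MMI} = \sum_{u} \log \frac{ p(O_u \mid W_u)^{\kappa}\, P(W_u) }{ \sum_{W} p(O_u \mid W)^{\kappa}\, P(W) }

In LF-MMI, the denominator sum is computed exactly over a phone-level denominator graph rather than over word lattices. For untranscribed utterances, where no reference W_u exists, NCE replaces the reference with an expectation over decoded hypotheses, i.e., it maximizes the negative conditional entropy

    F_{NCE} = -H(W \mid O) = \sum_{u} \sum_{W} P(W \mid O_u) \log P(W \mid O_u)

so that confidently decoded utterances contribute most to training; in practice the posteriors P(W \mid O_u) are computed over decoding lattices, which is where lattice supervision enters. The WER recovery rate (WRR) reported in the experiments is conventionally defined as

    WRR = (WER_{baseline} - WER_{semi}) / (WER_{baseline} - WER_{oracle})

where the baseline is the seed model trained only on the transcribed subset and the oracle is the same model trained with all transcripts available.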
author2 Chen, Berlin
author_facet Chen, Berlin
Lo, Tien-Hong
羅天宏
author Lo, Tien-Hong
羅天宏
spellingShingle Lo, Tien-Hong
羅天宏
Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition
author_sort Lo, Tien-Hong
title Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition
title_short Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition
title_full Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition
title_fullStr Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition
title_full_unstemmed Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition
title_sort investigating acoustic modeling and semi-supervised discriminative training for speech recognition
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/88f353
work_keys_str_mv AT lotienhong investigatingacousticmodelingandsemisuperviseddiscriminativetrainingforspeechrecognition
AT luótiānhóng investigatingacousticmodelingandsemisuperviseddiscriminativetrainingforspeechrecognition
AT lotienhong tàntǎoshēngxuémóxínghuàjìshùyǔbànjiāndūjiànbiéshìxùnliànyúyǔyīnbiànshízhīyánjiū
AT luótiānhóng tàntǎoshēngxuémóxínghuàjìshùyǔbànjiāndūjiànbiéshìxùnliànyúyǔyīnbiànshízhīyánjiū
_version_ 1719179692528893952