Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition


Bibliographic Details
Main Authors: Lo, Tien-Hong, 羅天宏
Other Authors: Chen, Berlin
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/88f353
id ndltd-TW-107NTNU5392006
record_format oai_dc
spelling ndltd-TW-107NTNU5392006 2019-05-16T01:45:07Z http://ndltd.ncl.edu.tw/handle/88f353 Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition 探討聲學模型化技術與半監督鑑別式訓練於語音辨識之研究 Lo, Tien-Hong 羅天宏 Master's thesis, National Taiwan Normal University, Department of Computer Science and Information Engineering, academic year 107 (abstract given in full under description) Chen, Berlin 陳柏琳 2019 degree thesis (學位論文) 102 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Taiwan Normal University === Department of Computer Science and Information Engineering === 107 === Recently, a novel objective function for discriminative acoustic model training, namely lattice-free maximum mutual information (LF-MMI), was proposed and achieved new state-of-the-art results in automatic speech recognition (ASR). Although LF-MMI performs excellently on various ASR tasks in supervised training settings, its performance often degrades significantly in semi-supervised settings. This is because LF-MMI shares a common deficiency of discriminative training criteria: it is sensitive to the accuracy of the transcripts of the training utterances. In view of this, the thesis explores two questions concerning LF-MMI in a semi-supervised training setting: first, how to improve the seed model, and second, how to exploit untranscribed training data. For the former, we investigate several transfer learning approaches (e.g., weight transfer and multitask learning) and model combination (e.g., hypothesis-level and frame-level combination); the distinction between these two families of methods is whether extra training data is used. For the latter, we introduce negative conditional entropy (NCE) and lattices for supervision, in conjunction with the LF-MMI objective function. A series of experiments were conducted on the Augmented Multi-Party Interaction (AMI) benchmark corpus. The results show that transfer learning using out-of-domain data (OOD) and model combination based on complementary diversity effectively improve the performance of the seed model, and that pairing NCE with lattice supervision improves both the word error rate (WER) and the WER recovery rate (WRR).
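For reference, the training criteria named in the abstract can be sketched as follows. These are the standard formulations from the ASR literature (with acoustic scale \kappa, acoustics O_u, and word sequence W_u as assumed notation); the thesis's exact formulation may differ. MMI maximizes the posterior probability of the reference transcript against all competing hypotheses:

    F_{MMI} = \sum_{u} \log \frac{ p(O_u \mid W_u)^{\kappa}\, P(W_u) }{ \sum_{W} p(O_u \mid W)^{\kappa}\, P(W) }

In LF-MMI, the denominator sum is computed exactly over a phone-level denominator graph rather than over word lattices. For untranscribed utterances, where no reference W_u exists, NCE replaces the reference with an expectation over decoded hypotheses, i.e., it maximizes the negative conditional entropy

    F_{NCE} = -H(W \mid O) = \sum_{u} \sum_{W} P(W \mid O_u) \log P(W \mid O_u)

so that confidently decoded utterances contribute most to training; in practice the posteriors P(W \mid O_u) are computed over decoding lattices, which is where lattice supervision enters. The WER recovery rate (WRR) reported in the experiments is conventionally defined as

    WRR = (WER_{baseline} - WER_{semi}) / (WER_{baseline} - WER_{oracle})

where the baseline is the seed model trained only on the transcribed subset and the oracle is the same model trained with all transcripts available.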
author2 Chen, Berlin
author_facet Chen, Berlin
Lo, Tien-Hong
羅天宏
author Lo, Tien-Hong
羅天宏
spellingShingle Lo, Tien-Hong
羅天宏
Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition
author_sort Lo, Tien-Hong
title Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition
title_short Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition
title_full Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition
title_fullStr Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition
title_full_unstemmed Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition
title_sort investigating acoustic modeling and semi-supervised discriminative training for speech recognition
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/88f353
work_keys_str_mv AT lotienhong investigatingacousticmodelingandsemisuperviseddiscriminativetrainingforspeechrecognition
AT luótiānhóng investigatingacousticmodelingandsemisuperviseddiscriminativetrainingforspeechrecognition
AT lotienhong tàntǎoshēngxuémóxínghuàjìshùyǔbànjiāndūjiànbiéshìxùnliànyúyǔyīnbiànshízhīyánjiū
AT luótiānhóng tàntǎoshēngxuémóxínghuàjìshùyǔbànjiāndūjiànbiéshìxùnliànyúyǔyīnbiànshízhīyánjiū
_version_ 1719179692528893952