Improving Dialogue State Tracking by incorporating multiple Automatic Speech Recognition results

Bibliographic Details
Main Author: Yu-Cheng Hsiao (蕭又誠)
Other Authors: Tzong-Han Tsai (蔡宗翰)
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/6jpdk9
id ndltd-TW-106NCU05392016
record_format oai_dc
spelling ndltd-TW-106NCU053920162019-05-16T00:15:46Z http://ndltd.ncl.edu.tw/handle/6jpdk9 Improving Dialogue State Tracking by incorporating multiple Automatic Speech Recognition results 藉由加入多重語音辨識結果來改善對話狀態追蹤 Yu-Cheng Hsiao (蕭又誠) Master's === National Central University === Department of Computer Science and Information Engineering === 106 === Advisor: Tzong-Han Tsai (蔡宗翰) 2018 Degree thesis ; 39 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Central University === Department of Computer Science and Information Engineering === 106 === Nowadays, the development of dialogue systems has changed how humans and computers communicate. In the past, people used commands or instructions to ask computers to perform tasks; now we expect a computer to understand the user's intent in a dialogue and accomplish the user's goal. Unlike chit-chat bots, the purpose of a task-oriented dialogue system (TDS) is to accomplish specific tasks, such as booking restaurants, so a TDS is considerably more complex than a chit-chat bot. First, a TDS needs to understand the user's intent through language understanding (LU). Second, it requires dialogue management to perform dialogue state tracking (DST) and dialogue policy selection. Finally, the system generates a natural-language response to the user. Dialogue management is the most difficult part of the task-oriented dialogue system architecture, and our research focuses on dialogue state tracking. We use the Dialog State Tracking Challenge 2 (DSTC2) dataset in our experiments; according to the dataset statistics, the word error rate of the automatic speech recognition (ASR) results is 30%. Most studies use only the top ASR result as the input to their DST models. We propose to use multiple ASR results: we use reinforcement learning to select useful lower-ranked ASR hypotheses in addition to the top-1 result, apply a DST model to predict a dialogue state for each selected hypothesis, and finally aggregate all the predicted dialogue states into the system's output. Our method achieves an accuracy of 59.98% on the test set, outperforming the baseline that uses only the top ASR result as input. In the future, we plan to incorporate language-understanding information from the ASR results into our method.
author2 Tzong-Han Tsai
author Yu-Cheng Hsiao (蕭又誠)
title Improving Dialogue State Tracking by incorporating multiple Automatic Speech Recognition results
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/6jpdk9
_version_ 1719163737739362304
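
The method summarized in the description field — run a DST model over several ranked ASR hypotheses rather than only the top one, then merge the per-hypothesis dialogue states — can be illustrated with a small sketch. The Python code below is a hypothetical illustration, not the thesis's implementation: dst_predict is a toy keyword-matching stand-in for a trained DST model, select_hypotheses keeps the top-k hypotheses by ASR confidence (where the thesis instead learns the selection with reinforcement learning), and the confidence-weighted vote is one assumed way to aggregate states, since the abstract does not specify the aggregation function.

from collections import defaultdict

def dst_predict(hypothesis):
    """Toy stand-in for a trained DST model: map one ASR hypothesis to a
    {slot: value} dialogue-state estimate (keyword matching here, purely
    so the sketch runs end to end)."""
    state = {}
    if "cheap" in hypothesis:
        state["pricerange"] = "cheap"
    if "north" in hypothesis:
        state["area"] = "north"
    return state

def select_hypotheses(nbest, k=3):
    """Stand-in for the selection step: keep the k most confident
    hypotheses. (The thesis instead learns which ranked results to keep
    with a reinforcement-learning policy.)"""
    return sorted(nbest, key=lambda h: h[1], reverse=True)[:k]

def aggregate_states(nbest):
    """Run DST on each selected hypothesis, then choose each slot's value
    by a confidence-weighted vote across hypotheses — an assumed merge
    step, one plausible reading of 'aggregating all the dialog states'."""
    votes = defaultdict(lambda: defaultdict(float))
    for text, confidence in select_hypotheses(nbest):
        for slot, value in dst_predict(text).items():
            votes[slot][value] += confidence
    return {slot: max(values, key=values.get) for slot, values in votes.items()}

# DSTC2-style n-best list: (ASR hypothesis, ASR confidence score).
nbest = [
    ("keep restaurant in the north", 0.45),   # top hypothesis misrecognizes "cheap"
    ("cheap restaurant in the north", 0.35),
    ("cheap restaurant in the nought", 0.20),
]
print(aggregate_states(nbest))  # -> {'area': 'north', 'pricerange': 'cheap'}

On this toy n-best list, using only the top hypothesis would drop the pricerange slot entirely, while the vote across hypotheses recovers both slots — the failure mode the thesis targets by looking past the top-1 ASR result.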