Improving Dialogue State Tracking by incorporating multiple Automatic Speech Recognition results

Bibliographic Details
Main Author: Yu-Cheng Hsiao (蕭又誠)
Other Authors: Tzong-Han Tsai (蔡宗翰)
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/6jpdk9
id ndltd-TW-106NCU05392016
record_format oai_dc
spelling ndltd-TW-106NCU053920162019-05-16T00:15:46Z http://ndltd.ncl.edu.tw/handle/6jpdk9 Improving Dialogue State Tracking by incorporating multiple Automatic Speech Recognition results 藉由加入多重語音辨識結果來改善對話狀態追蹤 Yu-Cheng Hsiao (蕭又誠) Master's === National Central University === Department of Computer Science and Information Engineering === 106 === Advisor: Tzong-Han Tsai (蔡宗翰) 2018 Degree thesis ; 39 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Central University === Department of Computer Science and Information Engineering === 106 === Nowadays, the development of dialogue systems has changed how humans and computers communicate. In the past, people used commands or instructions to ask computers to perform tasks; now we expect a computer to understand the user's intent in a dialogue and accomplish the user's goal. Unlike chit-chat bots, the purpose of a task-oriented dialogue system (TDS) is to accomplish specific tasks, such as booking restaurants, so a TDS is considerably more complex than a chit-chat bot. First, a TDS needs to understand the user's intent through language understanding (LU). Second, it requires dialogue management to perform dialogue state tracking (DST) and dialogue policy selection. Finally, the system generates a natural-language response to the user. Dialogue management is the most difficult part of the task-oriented dialogue system architecture, and our research focuses on dialogue state tracking. We use the Dialog State Tracking Challenge 2 (DSTC2) dataset in our experiments; according to the dataset statistics, the word error rate of the automatic speech recognition (ASR) results is 30%. Most studies use only the top ASR result as the input to their DST models. We propose to use multiple ASR results: we use reinforcement learning to select useful lower-ranked ASR hypotheses in addition to the top-1 result, apply a DST model to predict a dialogue state for each selected hypothesis, and finally aggregate all the predicted dialogue states into the system's output. Our method achieves an accuracy of 59.98% on the test set, outperforming the baseline that uses only the top ASR result as input. In the future, we plan to incorporate language-understanding information from the ASR results into our method.
author2 Tzong-Han Tsai
author Yu-Cheng Hsiao (蕭又誠)
title Improving Dialogue State Tracking by incorporating multiple Automatic Speech Recognition results
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/6jpdk9
_version_ 1719163737739362304
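
The method summarized in the description field — run a DST model over several ranked ASR hypotheses rather than only the top one, then merge the per-hypothesis dialogue states — can be illustrated with a small sketch. The Python code below is a hypothetical illustration, not the thesis's implementation: dst_predict is a toy keyword-matching stand-in for a trained DST model, select_hypotheses keeps the top-k hypotheses by ASR confidence (where the thesis instead learns the selection with reinforcement learning), and the confidence-weighted vote is one assumed way to aggregate states, since the abstract does not specify the aggregation function.

from collections import defaultdict

def dst_predict(hypothesis):
    """Toy stand-in for a trained DST model: map one ASR hypothesis to a
    {slot: value} dialogue-state estimate (keyword matching here, purely
    so the sketch runs end to end)."""
    state = {}
    if "cheap" in hypothesis:
        state["pricerange"] = "cheap"
    if "north" in hypothesis:
        state["area"] = "north"
    return state

def select_hypotheses(nbest, k=3):
    """Stand-in for the selection step: keep the k most confident
    hypotheses. (The thesis instead learns which ranked results to keep
    with a reinforcement-learning policy.)"""
    return sorted(nbest, key=lambda h: h[1], reverse=True)[:k]

def aggregate_states(nbest):
    """Run DST on each selected hypothesis, then choose each slot's value
    by a confidence-weighted vote across hypotheses — an assumed merge
    step, one plausible reading of 'aggregating all the dialog states'."""
    votes = defaultdict(lambda: defaultdict(float))
    for text, confidence in select_hypotheses(nbest):
        for slot, value in dst_predict(text).items():
            votes[slot][value] += confidence
    return {slot: max(values, key=values.get) for slot, values in votes.items()}

# DSTC2-style n-best list: (ASR hypothesis, ASR confidence score).
nbest = [
    ("keep restaurant in the north", 0.45),   # top hypothesis misrecognizes "cheap"
    ("cheap restaurant in the north", 0.35),
    ("cheap restaurant in the nought", 0.20),
]
print(aggregate_states(nbest))  # -> {'area': 'north', 'pricerange': 'cheap'}

On this toy n-best list, using only the top hypothesis would drop the pricerange slot entirely, while the vote across hypotheses recovers both slots — the failure mode the thesis targets by looking past the top-1 ASR result.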