End-to-End-Based Tibetan Multitask Speech Recognition

To date, speech recognition technology for majority languages has been successfully applied in wireless communication devices. However, as a minority language, Tibetan has very limited resources for conventional automatic speech recognition. It lacks sufficient data, sub-word units, lexicons, and wor...

Bibliographic Details
Main Authors:	Yue Zhao, Jianjian Yue, Xiaona Xu, Licheng Wu, Xiali Li
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	End-to-end model; multitask speech recognition; Tibetan language; wavenet model
Online Access:	https://ieeexplore.ieee.org/document/8894432/

id	doaj-90fe08bc87f745689b50ebe605a9421b
record_format	Article
spelling	doaj-90fe08bc87f745689b50ebe605a9421b | 2021-03-30T00:53:48Z | eng | IEEE | IEEE Access | 2169-3536 | 2019-01-01 | 7 | 162519 | 162529 | 10.1109/ACCESS.2019.2952406 | 8894432 | End-to-End-Based Tibetan Multitask Speech Recognition | Yue Zhao (https://orcid.org/0000-0002-4007-7016), Jianjian Yue, Xiaona Xu, Licheng Wu, Xiali Li | School of Information and Engineering, Minzu University of China, Beijing, China
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Yue Zhao Jianjian Yue Xiaona Xu Licheng Wu Xiali Li
author_sort	Yue Zhao
title	End-to-End-Based Tibetan Multitask Speech Recognition
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2019-01-01
description	To date, speech recognition technology for majority languages has been successfully applied in wireless communication devices. However, as a minority language, Tibetan has very limited resources for conventional automatic speech recognition. It lacks sufficient data, sub-word units, lexicons, and word inventories for some dialects. In this paper, we present a multitask end-to-end model to perform simultaneous Tibetan speech content recognition, dialect identification, and speaker recognition. This model avoids processing the pronunciation dictionary and word segmentation for new dialects while allowing three tasks to be trained in a single model. We build the multitask recognition framework based on WaveNet-CTC. The dialect information and speaker ID are used in the output for training. The experimental results show that our method performs better than a task-specific model.
topic	End-to-end model; multitask speech recognition; Tibetan language; wavenet model
url	https://ieeexplore.ieee.org/document/8894432/

Similar Items