Turkish Speech Recognition Based On Deep Neural Networks

In this paper we develop a Turkish speech recognition (SR) system using deep neural networks and compare it with the previous state-of-the-art traditional Gaussian mixture model-hidden Markov model (GMM-HMM) method using the same Turkish speech dataset and the same large vocabulary Turkish corpus....

Bibliographic Details
Main Authors:	Ussen Abre KIMANUKA, Osman BUYUK
Format:	Article
Language:	English
Published:	Suleyman Demirel University 2018-10-01
Series:	Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi
Subjects:	Turkish speech recognition; Deep neural network; Gaussian mixture model; Hidden Markov model; GMM-HMM; DNN-HMM
Online Access:	http://dergipark.org.tr/sdufenbed/issue/39695/470071?publisher=sdu-1

id	doaj-741e2e06f0fc4892961c2ebeec3c4b3b
record_format	Article
spelling	doaj-741e2e06f0fc4892961c2ebeec3c4b3b; 2020-11-24T22:43:28Z; eng; Suleyman Demirel University; Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi; 1308-6529; 2018-10-01; 223193291113; Turkish Speech Recognition Based On Deep Neural Networks; Ussen Abre KIMANUKA; Osman BUYUK; http://dergipark.org.tr/sdufenbed/issue/39695/470071?publisher=sdu-1; Turkish speech recognition; Deep neural network; Gaussian mixture model; Hidden Markov model; GMM-HMM; DNN-HMM
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Ussen Abre KIMANUKA Osman BUYUK
spellingShingle	Ussen Abre KIMANUKA Osman BUYUK Turkish Speech Recognition Based On Deep Neural Networks Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi Turkish speech recognition Deep neural network; Gaussian mixture model; Hidden markov model; GMM-HMM; DNN-HMM
author_facet	Ussen Abre KIMANUKA Osman BUYUK
author_sort	Ussen Abre KIMANUKA
title	Turkish Speech Recognition Based On Deep Neural Networks
title_short	Turkish Speech Recognition Based On Deep Neural Networks
title_full	Turkish Speech Recognition Based On Deep Neural Networks
title_fullStr	Turkish Speech Recognition Based On Deep Neural Networks
title_full_unstemmed	Turkish Speech Recognition Based On Deep Neural Networks
title_sort	turkish speech recognition based on deep neural networks
publisher	Suleyman Demirel University
series	Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi
issn	1308-6529
publishDate	2018-10-01
description	In this paper we develop a Turkish speech recognition (SR) system using deep neural networks and compare it with the previous state-of-the-art Gaussian mixture model-hidden Markov model (GMM-HMM) method on the same Turkish speech dataset and the same large-vocabulary Turkish corpus. Most SR systems deployed today, worldwide and in Turkey in particular, use hidden Markov models (HMMs) to handle the temporal variability of speech. Gaussian mixture models are used to estimate how well each state of each HMM fits a short frame of coefficients that represents the acoustic input. A feed-forward deep neural network (DNN) is another way to estimate this fit: the network takes several frames of coefficients as input and outputs posterior probabilities over HMM states. Deep neural networks have been shown to outperform the traditional GMM-HMM in other languages such as English and German. The agglutinative nature of Turkish and the lack of large amounts of speech data complicate the design of a high-performing SR system. We expect deep neural networks to improve performance, but not to match the results obtained for English, because of the difference in the availability of speech data. We present various architectural and training techniques for the Turkish DNN-based models. The models are tested on a Turkish database collected from mobile devices. In the experiments, we observe that the Turkish DNN-HMM system decreases the word error rate by approximately 2.5% compared with the traditional GMM-HMM system.
topic	Turkish speech recognition; Deep neural network; Gaussian mixture model; Hidden Markov model; GMM-HMM; DNN-HMM
url	http://dergipark.org.tr/sdufenbed/issue/39695/470071?publisher=sdu-1
work_keys_str_mv	AT ussenabrekimanuka turkishspeechrecognitionbasedondeepneuralnetworks AT osmanbuyuk turkishspeechrecognitionbasedondeepneuralnetworks
_version_	1725695735771430912
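
The abstract describes two ideas that a short sketch may make more concrete: a feed-forward network that takes several frames of coefficients as input and outputs posterior probabilities over HMM states, and the word error rate used to compare the systems. The Python below is only an illustration under stated assumptions, not the authors' implementation. All function names, the 13-dimensional frames, the context of 5 frames on each side, the layer sizes, the 6 output states, the sigmoid activations, and the random untrained weights are hypothetical choices made for this example.

```python
import math
import random


def splice_frames(frames, context):
    """Stack each frame with `context` neighbours on each side (edge frames repeat)."""
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for k in range(t - context, t + context + 1):
            window.extend(frames[min(max(k, 0), n - 1)])
        spliced.append(window)
    return spliced


def affine(x, weights, bias):
    """Compute W x + b for one input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax, so the outputs sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def state_posteriors(x, layers):
    """Feed-forward pass: sigmoid hidden layers, softmax output over HMM states."""
    for weights, bias in layers[:-1]:
        x = [1.0 / (1.0 + math.exp(-v)) for v in affine(x, weights, bias)]
    weights, bias = layers[-1]
    return softmax(affine(x, weights, bias))


def random_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


rng = random.Random(0)
# Toy input: 20 frames of 13 coefficients each (assumed sizes, not from the paper).
frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
inputs = splice_frames(frames, context=5)  # 11 frames x 13 coefficients = 143 values
layers = [random_layer(143, 32, rng), random_layer(32, 32, rng), random_layer(32, 6, rng)]
posteriors = [state_posteriors(x, layers) for x in inputs]
```

Each entry of posteriors is a distribution over the 6 hypothetical HMM states for one frame. In a hybrid DNN-HMM system such outputs take the place of the GMM scores during decoding, and word_error_rate applied to reference and recognized transcripts gives the figure used to compare two systems.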

Similar Items