Measuring the performance of isolated spoken Malay speech recognition using Multi-layer Neural Networks

This paper describes speech signal modeling techniques suited to high-performance, robust isolated word recognition. In this study, a speech recognition system is presented: specifically, an isolated spoken Malay word recognizer that uses spontaneous and formal speech collected from...


Bibliographic Details
Main Authors: Bakar, N.A. (Author), Bakar, Z.A. (Author), Seman, N. (Author)
Format: Article
Language: English
Online Access: View Fulltext in Publisher; View in Scopus
LEADER 03420nas a2200529Ia 4500
001 10.1109-CSSR.2010.5773762
008 220112c20109999CNT?? ? 0 0und d
020 |a 9781424489862 (ISBN) 
245 1 0 |a Measuring the performance of isolated spoken Malay speech recognition using Multi-layer Neural Networks 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1109/CSSR.2010.5773762 
856 |z View in Scopus  |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-79959647601&doi=10.1109%2fCSSR.2010.5773762&partnerID=40&md5=4c8517b18be16768c7bdf680c55a24be 
520 3 |a This paper describes speech signal modeling techniques suited to high-performance, robust isolated word recognition. In this study, a speech recognition system is presented: specifically, an isolated spoken Malay word recognizer that uses spontaneous and formal speech collected from the Parliament of Malaysia. The vocabulary is currently limited to 25 words that are pronounced exactly as they are written and that control the distribution of the vocalic segments. Speech segmentation is performed using an energy-based parameter and a zero-crossing rate measure, modified to better locate the beginning and ending points of speech in the spoken words. Training and recognition are carried out with Multi-layer Perceptron (MLP) Neural Networks in two-layer configurations, trained with stochastic error back-propagation, which adjusts the weights and biases after the presentation of each training sample. Mel-frequency Cepstral Coefficients (MFCCs) were chosen as the feature extraction approach, applied to each segmented utterance to obtain the characteristic features for the word recognizer. Recognition results showed that the performance of the two-layer networks increased as the number of hidden neurons increased. The best network structure, the (150-25) configuration, achieved an average classification rate of 84.731%. Implementation results also showed that the conjugate gradient (CG) algorithm was more accurate and reliable than the Levenberg-Marquardt (LM) algorithm for the network complexities and data sizes considered in this study. © 2010 IEEE. 
650 0 4 |a Algorithms 
650 0 4 |a Backpropagation 
650 0 4 |a Back-propagation 
650 0 4 |a Classification rates 
650 0 4 |a Conjugate gradient algorithms 
650 0 4 |a Conjugate gradient method 
650 0 4 |a Data size 
650 0 4 |a Hidden Neuron 
650 0 4 |a Hidden neurons 
650 0 4 |a Isolated word recognition 
650 0 4 |a Levenberg-Marquardt algorithm 
650 0 4 |a Malaysia 
650 0 4 |a Mel-frequency cepstral coefficients 
650 0 4 |a Melfrequency Cepstral Coefficients 
650 0 4 |a Multi layer perceptron 
650 0 4 |a Multi-layer Perceptron 
650 0 4 |a Natural language processing systems 
650 0 4 |a Network complexity 
650 0 4 |a Network layers 
650 0 4 |a Network structures 
650 0 4 |a Neural networks 
650 0 4 |a Recognition process 
650 0 4 |a Speech extraction 
650 0 4 |a Speech recognition 
650 0 4 |a Speech recognition systems 
650 0 4 |a Speech segmentation 
650 0 4 |a Speech signal modeling 
650 0 4 |a Spoken words 
650 0 4 |a Stochastic errors 
650 0 4 |a Training data 
650 0 4 |a Two-layer network 
650 0 4 |a Zero crossing rate 
700 1 0 |a Bakar, N.A.  |e author 
700 1 0 |a Bakar, Z.A.  |e author 
700 1 0 |a Seman, N.  |e author
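As an illustration of the segmentation step summarized in the abstract, the following Python sketch shows a generic two-stage endpoint detector: short-time energy locates the loud core of the word, and the zero-crossing rate (ZCR) then extends the boundaries outward to catch weak, noisy onsets and offsets such as fricatives. The record does not describe the authors' modification, so the function names (`detect_endpoints` and its helpers) and every threshold (`energy_ratio`, `zcr_thresh`, `max_extend`) are hypothetical assumptions, not the paper's values.

```python
from typing import List, Optional, Sequence, Tuple


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split x into frames of frame_len samples, advancing hop samples each time."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared sample values in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (0.0 to 1.0).

    Assumes frame_len >= 2.
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(
    x: Sequence[float],
    frame_len: int = 256,
    hop: int = 128,
    energy_ratio: float = 0.1,  # hypothetical: energy threshold as a fraction of peak frame energy
    zcr_thresh: float = 0.25,   # hypothetical: ZCR above this suggests an unvoiced sound
    max_extend: int = 10,       # hypothetical: max frames the ZCR pass may add on each side
) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the detected word (end exclusive), or None."""
    frames = frame_signal(x, frame_len, hop)
    if not frames:
        return None
    energies = [short_time_energy(f) for f in frames]
    zcrs = [zero_crossing_rate(f) for f in frames]
    peak = max(energies)
    if peak == 0.0:
        return None  # pure digital silence: nothing to detect

    # Stage 1: energy gives a conservative estimate covering the loud (voiced) core.
    above = [i for i, e in enumerate(energies) if e >= energy_ratio * peak]
    if not above:
        return None
    start, end = above[0], above[-1]

    # Stage 2: ZCR extends the boundaries outward over low-energy, high-ZCR frames
    # (e.g. fricative onsets and offsets) that the energy threshold misses.
    limit = max(0, start - max_extend)
    while start > limit and zcrs[start - 1] > zcr_thresh:
        start -= 1
    limit = min(len(frames) - 1, end + max_extend)
    while end < limit and zcrs[end + 1] > zcr_thresh:
        end += 1

    return start * hop, end * hop + frame_len
```

For example, with `frame_len=100` and `hop=100`, a signal of 1000 silent samples, 300 low-amplitude alternating samples (a fricative-like onset), 1000 samples of a sinusoid, and 1000 more silent samples yields `(1000, 2300)`: energy alone places the start at sample 1300, and the ZCR pass moves it back to cover the fricative-like segment. Running energy first keeps random low-level noise from being mistaken for speech, while the bounded ZCR pass recovers quiet consonants at the word edges.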