A Design of Training Efficiency Improving Strategy for Mandarin Speech Recognition System - A Case Study on Business Name Querying System and Phrase Recognition System

Bibliographic Details
Main Authors: Chang-Hung Lee, 李常宏
Other Authors: Ben-Shung Chow
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/jkr9e6
Description
Summary: Master's thesis === National Sun Yat-sen University (國立中山大學) === Department of Electrical Engineering === 106 === Speech recognition technology is widely used in daily life; voice navigation systems for vehicles and intelligent personal assistants for mobile phones are products of its application. As the number of speech recognition users increases dramatically, improving training efficiency has become an objective that system developers must constantly pursue. This thesis investigates a strategy for improving the training efficiency of a Mandarin speech recognition system. Phonetically balanced and chained word selection is proposed to reduce the training set from 2,699 to 1,449 two-syllable words. A case study on a business name querying system and a phrase recognition system is used to demonstrate the effectiveness of this strategy. The experimental results indicate that the correct rates do not decrease significantly compared with those of the 2,699-word training method. Mandarin Chinese is a monosyllabic language. Its syllables can be categorized into 415 classes when tone is ignored and 1,340 classes when tone is included. In this thesis, Mel-frequency cepstral coefficients and linear predictive cepstral coefficients are used to extract bi-parametric speech features. A hidden Markov model is then applied to estimate the probabilistic properties of each syllable and to build the recognition system. A four-tone classifier based on two-dimensional pitch statistics of the tones is further designed to improve system accuracy. Two databases of 194,512 Mandarin phrases and 303,971 business names are collected for system evaluation. The training time of the speech system using the proposed word set is reduced by 51.17%. On an Intel Core i5 notebook with a 2.3 GHz CPU running macOS Sierra, correct rates of 93.59% and 95.11% are achieved on the two databases, respectively.
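The summary does not describe how the reduced training word set is chosen. As a rough illustration only, the sketch below shows one common way to shrink a word list while keeping every syllable represented: a greedy coverage selection. The function name `select_covering_words` and the toy lexicon are hypothetical, and this is not claimed to be the thesis's phonetically balanced and chained selection procedure.

```python
def select_covering_words(words):
    """Greedy coverage: pick words until every syllable in `words` is covered.

    `words` maps a word label to a tuple of its toneless syllables.
    Assumption: this is a generic set-cover heuristic, not the thesis's method.
    """
    uncovered = {s for syllables in words.values() for s in syllables}
    selected = []
    while uncovered:
        # Choose the word that covers the most still-uncovered syllables;
        # iterating over sorted labels makes tie-breaking deterministic.
        best = max(sorted(words), key=lambda w: len(uncovered & set(words[w])))
        selected.append(best)
        uncovered -= set(words[best])
    return selected


# Hypothetical toy lexicon (pinyin without tones), not data from the thesis.
toy_lexicon = {
    "gongsi": ("gong", "si"),
    "siji": ("si", "ji"),
    "jichang": ("ji", "chang"),
    "changge": ("chang", "ge"),
    "gongchang": ("gong", "chang"),
}

chosen = select_covering_words(toy_lexicon)
print(chosen)
```

In this toy case three of the five words are enough to cover all five syllables. A real selection would also need to balance how often each syllable appears, which this sketch does not attempt.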