Taiwanese Tone Recognition

碩士 === 國立清華大學 === 統計學研究所 === 102 === This thesis explores Taiwanese tone automatic recognition. Using DeZongGing (地藏經) Taiwanese speech corpus and the Hidden Markov Model Toolkit (HTK), we first segment a speech waveform into syllable segments. Then for each syllable segment, short time speech analy...

Full description

Bibliographic Details
Main Author: 林駿羽
Other Authors: 江永進
Format: Others
Language:zh-TW
Published: 2014
Online Access:http://ndltd.ncl.edu.tw/handle/4qpzjy
Description
Summary:碩士 === 國立清華大學 === 統計學研究所 === 102 === This thesis explores Taiwanese tone automatic recognition. Using DeZongGing (地藏經) Taiwanese speech corpus and the Hidden Markov Model Toolkit (HTK), we first segment a speech waveform into syllable segments. Then for each syllable segment, short time speech analysis is performed using acf/amdf (autocorrelation function divided by absolute mean difference function). Using these as basic features, we then explore two kinds of classifiers for Taiwanese tones. For the first kind, we further reduce the basic features into the coefficients of third order polynomial fit on the pitch tracks; pitch tracks can be obtained in a different number of ways, and we use the ridge of the largest island size in the acf/amdf map. With now four coefficients for each syllable, we then classify the syllables for their tones using LDA (linear discriminant analysis), QDA (quadratic discriminant analysis). Under cross validation, the accuracies of these classifiers range from 52% to 59%. For the second kind, we treat the basic features as a gray level picture, normalized them into size 28×28, and then use the Deep Belief Networks(DBN) for classification, as in the recognition case of hand written digits. The cross validation accuracies can go upto 72%, with or without noise perturbations.