A Study on the Generation and Adjustment of Prosodic Information for CELP-based Text-to-Speech Conversion

Bibliographic Details
Main Authors: Chuang, Hsin-Chung, 莊欣中
Other Authors: Chung-Hsien Wu
Format: Others
Language: zh-TW
Published: 1995
Online Access: http://ndltd.ncl.edu.tw/handle/83645301219524983534
Description
Summary: Master's === National Cheng Kung University === Institute of Information and Electronic Engineering === 83 === This thesis presents a CELP-based text-to-speech conversion system that uses 1410 Mandarin Chinese monosyllables as its basic synthesis units. The Code Excited Linear Prediction (CELP) algorithm is applied in the speech synthesizer to obtain a high compression rate and good speech quality. To improve the naturalness of the synthetic speech, a prosodic modification method is proposed to replace the traditional rule-based approach to pronunciation. First, a total of 12 representative pitch contour patterns are defined to describe the behavior of the four lexical tones and the neutral tone in Mandarin Chinese. Observation shows that the acoustic properties of a syllable can be affected by the conditions under which it is concatenated with other syllables in a sentence. Consequently, a Bayesian network is employed to model the relation between pitch contour fluctuation and linguistic features. The network is trained on a set of sentence utterances and provides the prosodic information used to adjust the synthetic speech during synthesis. The synthetic speech was evaluated by 20 subjects. The results indicate an average correct rate of 96.65% for intelligibility, and 84.31% of the mean opinion scores for naturalness were above the "fair" level.
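The idea of choosing a pitch contour pattern from linguistic context can be illustrated with a small sketch. The abstract does not give the structure of the Bayesian network, the linguistic features, or the 12 patterns, so everything below is an assumption: a categorical naive Bayes classifier (the simplest Bayesian network) stands in for the thesis's model, the features are the tones of the previous, current, and next syllable (0 marks a sentence boundary), and the pattern IDs and training tuples are invented for illustration only.

```python
from collections import Counter, defaultdict
import math


class NaiveBayesPitchSelector:
    """Toy categorical naive Bayes, NOT the thesis's network.

    Scores each pattern as log P(pattern) + sum_i log P(feature_i | pattern),
    with Laplace smoothing so unseen feature values keep a nonzero probability.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                           # smoothing constant (assumed)
        self.pattern_counts = Counter()              # pattern -> occurrences
        self.feature_counts = defaultdict(Counter)   # (pattern, index) -> value counts
        self.feature_values = defaultdict(set)       # index -> values seen in training

    def fit(self, samples):
        for features, pattern in samples:
            self.pattern_counts[pattern] += 1
            for i, value in enumerate(features):
                self.feature_counts[(pattern, i)][value] += 1
                self.feature_values[i].add(value)

    def predict(self, features):
        total = sum(self.pattern_counts.values())
        best_pattern, best_score = None, -math.inf
        for pattern, count in self.pattern_counts.items():
            score = math.log(count / total)
            for i, value in enumerate(features):
                seen = self.feature_counts[(pattern, i)][value]
                vocab = len(self.feature_values[i])
                score += math.log((seen + self.alpha) / (count + self.alpha * vocab))
            if score > best_score:
                best_pattern, best_score = pattern, score
        return best_pattern


# Hypothetical training data: ((previous tone, current tone, next tone), pattern ID).
training = [
    ((0, 3, 3), 7),
    ((0, 3, 3), 7),
    ((3, 3, 0), 2),
    ((1, 3, 0), 2),
    ((0, 1, 4), 0),
    ((4, 1, 0), 0),
]

selector = NaiveBayesPitchSelector()
selector.fit(training)
chosen = selector.predict((0, 3, 3))  # pattern for a sentence-initial third tone before another third tone
```

In a real system the selected pattern would then drive pitch adjustment of the CELP-coded syllable; that step depends on details the abstract does not state and is left out here.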