Very Low Bit-Rate Speech Codec System

Master's === National Chung Cheng University === Department of Electrical Engineering === 85 === Intelligible speech is very important in human communication. We use speech to express meaning, thought, and emotion. Therefore, speech is the most natural way to communicate. In order to utilize spee...

Bibliographic Details
Main Authors: Tsou, Kai-Ming, 鄒開明
Other Authors: Oscal T.-C. Chen
Format: Others
Language: zh-TW
Published: 1997
Online Access: http://ndltd.ncl.edu.tw/handle/70285674174504945569
id ndltd-TW-085CCU00442041
record_format oai_dc
spelling ndltd-TW-085CCU00442041 2015-10-13T12:14:44Z http://ndltd.ncl.edu.tw/handle/70285674174504945569 Very Low Bit-Rate Speech Codec System 超低位元率之語音編解碼系統 Tsou, Kai-Ming 鄒開明 Master's National Chung Cheng University Department of Electrical Engineering 85 Intelligible speech is central to human communication: we use it to express meaning, thought, and emotion, which makes it the most natural way to communicate. Applications such as digital mobile phones and voice mail must record speech in digital form. Without compression, storing speech or transmitting it over a network demands too much memory and bandwidth to be affordable, so algorithms that compress human speech are needed. We propose a speech coding algorithm with four features: (1) a linear predictive filter, (2) mixed 4-state excitations, (3) adaptive filter bands, and (4) a perceptual weighting filter. The linear predictive filter accurately models the human vocal tract. The four excitation states (voiced, unvoiced, onset, and offset) eliminate the buzzy quality produced by a single excitation. The adaptive filter bands analyze the LPC residual signal with better frequency resolution. The perceptual weighting filter sharpens the formant resonances. In our tests, the proposed algorithm reaches a mean opinion score (MOS) of 3.0 for clean female speech and 2.9 for clean male speech. For noisy speech at a signal-to-noise ratio of 0 dB, it achieves 2.3 MOS for both female and male speech, where the original speech scores about 2.6 MOS. The proposed algorithm yields intelligible quality for clean and noisy speech at a bit rate of 1.52 kbit/s.
A cost-effective programmable Harvard architecture has been developed to run the proposed 4-state mixed-excitation codec. The processor features much simpler instruction decoding, optimized control-unit bandwidths, and a small instruction memory. Minimizing the bandwidths and complexities of the functional units also reduces power consumption. At a system clock above 20 MHz, the processor provides real-time speech codec processing for various speech applications. Oscal T.-C. Chen 陳自強 1997 thesis 52 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
author2 Oscal T.-C. Chen
author_facet Oscal T.-C. Chen
Tsou, Kai-Ming
鄒開明
author Tsou, Kai-Ming
鄒開明
spellingShingle Tsou, Kai-Ming
鄒開明
Very Low Bit-Rate Speech Codec System
author_sort Tsou, Kai-Ming
title Very Low Bit-Rate Speech Codec System
title_short Very Low Bit-Rate Speech Codec System
title_full Very Low Bit-Rate Speech Codec System
title_fullStr Very Low Bit-Rate Speech Codec System
title_full_unstemmed Very Low Bit-Rate Speech Codec System
title_sort very low bit-rate speech codec system
publishDate 1997
url http://ndltd.ncl.edu.tw/handle/70285674174504945569
work_keys_str_mv AT tsoukaiming verylowbitratespeechcodecsystem
AT zōukāimíng verylowbitratespeechcodecsystem
AT tsoukaiming chāodīwèiyuánlǜzhīyǔyīnbiānjiěmǎxìtǒng
AT zōukāimíng chāodīwèiyuánlǜzhīyǔyīnbiānjiěmǎxìtǒng
_version_ 1716855199008555008
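The abstract names a linear predictive filter as the first stage of the codec and says the later stages work on the LPC residual. The record gives no implementation details, so the following is only a minimal sketch of textbook LPC analysis (autocorrelation method with the Levinson-Durbin recursion), not the thesis's actual method. The function names, the predictor order of 2, and the synthetic test signal are all assumptions chosen for illustration.

```python
# Illustrative sketch only: textbook LPC analysis, not the thesis's code.
# All names, the order, and the test signal below are assumptions.

def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err) with a[0] == 1, where the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order and err is the
    final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining taps at zero
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err  # reflection coefficient for stage m
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a, err

def residual(frame, a):
    """Filter the frame through A(z) to obtain the LPC residual."""
    order = len(a) - 1
    out = []
    for n in range(len(frame)):
        e = frame[n]
        for k in range(1, min(order, n) + 1):
            e += a[k] * frame[n - k]
        out.append(e)
    return out

def synth_ar2_impulse_response(n, c1=1.3, c2=-0.6):
    """Hypothetical test signal: a two-pole 'vocal tract' excited by one pulse."""
    h = []
    for i in range(n):
        x = 1.0 if i == 0 else 0.0
        if i >= 1:
            x += c1 * h[i - 1]
        if i >= 2:
            x += c2 * h[i - 2]
        h.append(x)
    return h

if __name__ == "__main__":
    frame = synth_ar2_impulse_response(256)
    coeffs, energy = levinson_durbin(autocorrelation(frame, 2), 2)
    print("A(z) coefficients:", coeffs)
    print("residual energy:", energy)
```

On this synthetic frame the recovered A(z) is the inverse of the two-pole filter, so the residual collapses back to the single exciting pulse. A real coder would use a higher order on windowed speech frames and would quantize the coefficients; none of those choices are stated in this record.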