Very Low Bit-Rate Speech Codec System

Master's === National Chung Cheng University === Department of Electrical Engineering === 85 === Intelligible speech is very important in human communication. We use speech to express meaning, thought, and emotion. Therefore, speech is the most natural way to communicate. In order to utilize spee...

Bibliographic Details
Main Authors:	Tsou, Kai-Ming, 鄒開明
Other Authors:	Oscal T.-C. Chen
Format:	Others
Language:	zh-TW
Published:	1997
Online Access:	http://ndltd.ncl.edu.tw/handle/70285674174504945569

id	ndltd-TW-085CCU00442041
record_format	oai_dc
spelling	ndltd-TW-085CCU00442041 2015-10-13T12:14:44Z http://ndltd.ncl.edu.tw/handle/70285674174504945569 Very Low Bit-Rate Speech Codec System 超低位元率之語音編解碼系統 Tsou, Kai-Ming 鄒開明 Master's National Chung Cheng University Department of Electrical Engineering 85 Oscal T.-C. Chen 陳自強 1997 thesis 52 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
author2	Oscal T.-C. Chen
author_facet	Oscal T.-C. Chen Tsou, Kai-Ming 鄒開明
author	Tsou, Kai-Ming 鄒開明
spellingShingle	Tsou, Kai-Ming 鄒開明 Very Low Bit-Rate Speech Codec System
author_sort	Tsou, Kai-Ming
title	Very Low Bit-Rate Speech Codec System
title_short	Very Low Bit-Rate Speech Codec System
title_full	Very Low Bit-Rate Speech Codec System
title_fullStr	Very Low Bit-Rate Speech Codec System
title_full_unstemmed	Very Low Bit-Rate Speech Codec System
title_sort	very low bit-rate speech codec system
publishDate	1997
url	http://ndltd.ncl.edu.tw/handle/70285674174504945569
work_keys_str_mv	AT tsoukaiming verylowbitratespeechcodecsystem AT zōukāimíng verylowbitratespeechcodecsystem AT tsoukaiming chāodīwèiyuánlǜzhīyǔyīnbiānjiěmǎxìtǒng AT zōukāimíng chāodīwèiyuánlǜzhīyǔyīnbiānjiěmǎxìtǒng
_version_	1716855199008555008
description	Master's === National Chung Cheng University === Department of Electrical Engineering === 85 === Intelligible speech is essential to human communication. We use speech to express meaning, thought, and emotion, which makes it the most natural way to communicate. To use speech in applications such as digital mobile phones and voice mail, it must be stored digitally. Without compression, storing speech or transmitting it over a network requires too much memory and transmission cost, so algorithms that compress human speech are needed. We propose a speech coding algorithm with four features: (1) a linear predictive filter, (2) mixed 4-state excitations, (3) adaptive filter bands, and (4) a perceptual weighting filter. The linear predictive filter accurately models the human vocal tract. The four excitation states (voiced, unvoiced, onset, and offset) eliminate the buzzy quality associated with a single excitation. The adaptive filter bands analyze the LPC residual signal with better frequency resolution. The perceptual weighting filter sharpens the formant resonances. In our tests, the proposed algorithm reaches 3.0 MOS for clean female speech and 2.9 MOS for clean male speech. For noisy speech at a signal-to-noise ratio of 0 dB, it achieves 2.3 MOS for both female and male speech, where the original speech scores about 2.6 MOS. The proposed algorithm yields intelligible quality for clean and noisy speech at a bit rate of 1.52 kbit/s. A cost-effective programmable Harvard architecture has been developed to run the proposed 4-state mixed-excitation codec. The processor offers much simpler instruction decoding, optimized control-unit bandwidths, and a small instruction memory. Minimizing the bandwidths and complexities of the functional units also reduces power consumption. At a system clock above 20 MHz, the processor provides real-time speech codec processing for various speech applications.
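
The linear predictive step named in the abstract can be illustrated with a minimal sketch. This record does not give the thesis's analysis method, filter order, or frame size, so the autocorrelation method with the Levinson-Durbin recursion, and all function names below, are assumptions chosen for illustration rather than the author's implementation.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations with the Levinson-Durbin recursion.

    Returns (a, err) where a[0] == 1.0 and the inverse filter is
    A(z) = sum(a[k] * z**-k for k in 0..order); err is the final
    prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        acc = r[i] + sum(a[k] * r[i - k] for k in range(1, i))
        k_i = -acc / err  # reflection coefficient for this stage
        prev = a[:]
        for k in range(1, i):
            a[k] = prev[k] + k_i * prev[i - k]
        a[i] = k_i
        err *= 1.0 - k_i * k_i
    return a, err


def residual(frame, a):
    """Inverse-filter the frame with A(z) to obtain the LPC residual."""
    order = len(a) - 1
    return [sum(a[k] * frame[n - k] for k in range(order + 1) if n - k >= 0)
            for n in range(len(frame))]


if __name__ == "__main__":
    # Hypothetical test signal: a decaying exponential, which a
    # first-order predictor models almost exactly.
    frame = [0.9 ** n for n in range(200)]
    coeffs, energy = levinson_durbin(autocorrelation(frame, 2), 2)
    print("LPC coefficients:", coeffs)
    print("prediction-error energy:", energy)
```

The residual returned by `residual` is the signal that a codec of this kind would go on to analyze and represent with an excitation model; how the thesis does that is not specified in this record.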

Similar Items