Very Low Bit-Rate Speech Codec System
Master's thesis === National Chung Cheng University === Department of Electrical Engineering === 85 === Intelligible speech is very important in human communication. We use speech to express meaning, thought, and emotion. Speech is therefore the most natural means of communication. In order to utilize spee...
Main Authors: | Tsou, Kai-Ming, 鄒開明 |
---|---|
Other Authors: | Oscal T.-C. Chen |
Format: | Others |
Language: | zh-TW |
Published: | 1997 |
Online Access: | http://ndltd.ncl.edu.tw/handle/70285674174504945569 |
id |
ndltd-TW-085CCU00442041 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-085CCU00442041 2015-10-13T12:14:44Z http://ndltd.ncl.edu.tw/handle/70285674174504945569 Very Low Bit-Rate Speech Codec System 超低位元率之語音編解碼系統 Tsou, Kai-Ming 鄒開明 Master's thesis, National Chung Cheng University, Department of Electrical Engineering, 85. Intelligible speech is very important in human communication. We use speech to express meaning, thought, and emotion, which makes it the most natural means of communication. For applications such as digital mobile phones and voice mail, speech must be stored in memory in digital form. Without compression, storing speech or transmitting it over a network would be too expensive in both memory and transmission cost. We therefore need algorithms that compress human speech. We propose a speech coding algorithm with the following features: (1) a linear predictive filter, (2) mixed 4-state excitations, (3) adaptive filter bands, and (4) a perceptual weighting filter. The linear predictive filter accurately models the human vocal tract. The 4-state mixed excitation, whose states are voiced, unvoiced, onset, and offset, eliminates the buzzy quality produced by a single excitation. The adaptive filter bands analyze the LPC residual signal with better frequency resolution. The perceptual weighting filter sharpens the formant resonances. In our tests, the proposed algorithm reaches a mean opinion score (MOS) of 3.0 for clean female speech and 2.9 for clean male speech. For noisy speech with a signal-to-noise ratio of 0 dB, the algorithm achieves an MOS of 2.3 for both female and male speech, where the MOS of the original speech is about 2.6. The proposed algorithm yields intelligible quality for clean and noisy speech at a bit rate of 1.52 kbit/s. A cost-effective programmable Harvard architecture has been developed to run the proposed 4-state mixed excitation codec. The proposed processor features much simpler instruction decoding, optimized control-unit bandwidths, and a small instruction memory. Because the bandwidths and complexities of the functional units are minimized, power consumption is also reduced. At a system clock above 20 MHz, the processor provides real-time speech codec processing for various speech applications. Oscal T.-C. Chen 陳自強 1997 學位論文 ; thesis 52 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
author2 |
Oscal T.-C. Chen |
author_facet |
Oscal T.-C. Chen Tsou, Kai-Ming 鄒開明 |
author |
Tsou, Kai-Ming 鄒開明 |
spellingShingle |
Tsou, Kai-Ming 鄒開明 Very Low Bit-Rate Speech Codec System |
author_sort |
Tsou, Kai-Ming |
title |
Very Low Bit-Rate Speech Codec System |
title_short |
Very Low Bit-Rate Speech Codec System |
title_full |
Very Low Bit-Rate Speech Codec System |
title_fullStr |
Very Low Bit-Rate Speech Codec System |
title_full_unstemmed |
Very Low Bit-Rate Speech Codec System |
title_sort |
very low bit-rate speech codec system |
publishDate |
1997 |
url |
http://ndltd.ncl.edu.tw/handle/70285674174504945569 |
work_keys_str_mv |
AT tsoukaiming verylowbitratespeechcodecsystem AT zōukāimíng verylowbitratespeechcodecsystem AT tsoukaiming chāodīwèiyuánlǜzhīyǔyīnbiānjiěmǎxìtǒng AT zōukāimíng chāodīwèiyuánlǜzhīyǔyīnbiānjiěmǎxìtǒng |
_version_ |
1716855199008555008 |
description |
Master's thesis === National Chung Cheng University === Department of Electrical Engineering === 85 ===
Intelligible speech is very important in human communication. We
use speech to express meaning, thought, and emotion, which makes
it the most natural means of communication. For applications
such as digital mobile phones and voice mail, speech must be
stored in memory in digital form. Without compression, storing
speech or transmitting it over a network would be too expensive
in both memory and transmission cost. We therefore need
algorithms that compress human speech. We propose a speech
coding algorithm with the following features: (1) a linear
predictive filter, (2) mixed 4-state excitations, (3) adaptive
filter bands, and (4) a perceptual weighting filter. The linear
predictive filter accurately models the human vocal tract. The
4-state mixed excitation, whose states are voiced, unvoiced,
onset, and offset, eliminates the buzzy quality produced by a
single excitation. The adaptive filter bands analyze the LPC
residual signal with better frequency resolution. The perceptual
weighting filter sharpens the formant resonances. In our tests,
the proposed algorithm reaches a mean opinion score (MOS) of 3.0
for clean female speech and 2.9 for clean male speech. For noisy
speech with a signal-to-noise ratio of 0 dB, the algorithm
achieves an MOS of 2.3 for both female and male speech, where
the MOS of the original speech is about 2.6. The proposed
algorithm yields intelligible quality for clean and noisy speech
at a bit rate of 1.52 kbit/s. A cost-effective programmable
Harvard architecture has been developed to run the proposed
4-state mixed excitation codec. The proposed processor features
much simpler instruction decoding, optimized control-unit
bandwidths, and a small instruction memory. Because the
bandwidths and complexities of the functional units are
minimized, power consumption is also reduced. At a system clock
above 20 MHz, the processor provides real-time speech codec
processing for various speech applications.
|
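
The abstract above names a linear predictive filter and the LPC residual signal but gives no procedure. As a rough illustration only, the sketch below estimates predictor coefficients for one frame with the textbook autocorrelation method and Levinson-Durbin recursion, then computes the prediction residual. The record does not say which LPC estimation method, filter order, or frame length the thesis uses, so the function names (`autocorrelation`, `levinson_durbin`, `lpc_residual`) and the order used in the example are assumptions made for illustration.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the autocorrelation of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by Levinson-Durbin recursion.

    Returns (a, err), where the predictor is
    s_hat[n] = sum_k a[k] * s[n - 1 - k] and err is the final
    prediction error energy. Hypothetical helper, not from the thesis.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # silent or degenerate frame: stop refining
        # Reflection coefficient for model order i + 1.
        acc = r[i + 1] - sum(a[k] * r[i - k] for k in range(i))
        k_i = acc / err
        new_a = a[:]
        new_a[i] = k_i
        for k in range(i):
            new_a[k] = a[k] - k_i * a[i - 1 - k]
        a = new_a
        err *= (1.0 - k_i * k_i)
    return a, err


def lpc_residual(frame, a):
    """Prediction residual e[n] = s[n] - s_hat[n] (zero history before the frame)."""
    out = []
    for n in range(len(frame)):
        pred = sum(a[k] * frame[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        out.append(frame[n] - pred)
    return out


if __name__ == "__main__":
    # Toy frame: a decaying exponential, which a first-order predictor fits.
    frame = [0.9 ** n for n in range(200)]
    r = autocorrelation(frame, 2)
    coeffs, err = levinson_durbin(r, 2)
    print(coeffs, err)
```

In a coder of this kind, the residual is the part of the signal that the excitation model has to represent; how the thesis classifies frames into voiced, unvoiced, onset, and offset states is not described in this record, so it is not sketched here.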