A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition
Main Authors: | Chih-HsaingPeng 彭志祥 |
---|---|
Other Authors: | Jhing-Fa Wang 王駿發 |
Format: | Others |
Language: | en_US |
Published: | 2015 |
Online Access: | http://ndltd.ncl.edu.tw/handle/cwj2k3 |
id | ndltd-TW-103NCKU5442026 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-103NCKU5442026 2019-05-15T21:59:10Z http://ndltd.ncl.edu.tw/handle/cwj2k3 A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition 語者語音辨識之可重組式多核心積體電路架構設計之研究 Chih-HsaingPeng 彭志祥 Doctorate, National Cheng Kung University, Department of Electrical Engineering, academic year 103. Jhing-Fa Wang 王駿發 2015 Thesis; 99; en_US |
collection | NDLTD |
language | en_US |
format | Others |
sources | NDLTD |
description |
Doctorate === National Cheng Kung University === Department of Electrical Engineering === 103 === In multimedia applications, audio processing usually requires large amounts of high-dimensional computation. As intelligent applications grow, automatic speaker-speech recognition (ASSR), which requires extraction, recognition, and learning (ERL) functions, is becoming more and more popular. Conventional VLSI designs focus on enhancing a single component; in addition, most chip solutions are high-cost designs that are not specialized for ASSR. This dissertation proposes a novel reconfigurable multi-core architecture with five self-reconfigurable modes and four pre-configurable modes for low cost and high efficiency. Based on this architecture, this work designs a chip for ASSR with high-dimensional processing ability and real-time performance.
In the first part of this dissertation, a hardware/software co-design approach is used to analyze the bottlenecks of the whole ERL system. In the second part, because the training phase is the bottleneck, learning is realized through hardware acceleration with tri-mode reconfigurability. Compared with the baseline, the proposed design achieves higher hardware utilization; therefore, even under a low-cost constraint, it still attains a high speedup. The design is implemented with a standard-cell library in 0.18 μm CMOS technology. The chip occupies a die area of 8.6 mm² and consumes 77.33 mW, achieving a 31% lower gate count and a 16-fold improvement in learning speed.
In the third part of this dissertation, to improve the performance of the whole ERL system, extraction and recognition hardware are integrated into the previous design. Because the bottleneck of the testing phase lies in the extraction part, hardware acceleration of extraction is realized, while the recognition hardware is designed for low cost. The design is fabricated with a standard-cell library in 90 nm CMOS technology. The chip occupies a die area of 4.3 mm² and consumes 8.9 mW, achieving a 3-fold improvement in extraction speed with a 26% increase in gate count. The subsequent work efficiently integrates the extraction and recognition architectures into the reconfigurable architecture, which then combines five self-reconfigurable modes and four pre-configurable modes. Simulation results show that the 3-fold improvement in extraction speed is achieved with a 17% decrease in gate count.
In the fourth part of this dissertation, the goal is to achieve lower cost. Accordingly, this work presents a novel algorithm, binary halved clustering (BHC), to replace the conventional training method, sequential minimal optimization (SMO). Compared with the popular K-means algorithm (see the clustering sketch below), the proposed algorithm reduces the computational load by 87% while maintaining an average accuracy of 92.7%. The system is applied in a case study of automatic speaker-speech recognition and achieves both low computation time and high accuracy. This design is also fabricated with a standard-cell library in 90 nm CMOS technology. The chip occupies a die area of 2.2 mm² and consumes 8.74 mW.
|
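The abstract names K-means as the comparison baseline for the proposed BHC algorithm but defines neither in detail. As a minimal sketch of that baseline only, assuming standard Euclidean K-means with hypothetical variable names and an assumed feature dimensionality (the dissertation's actual BHC procedure and feature set are not specified in this record), the following Python snippet illustrates the kind of clustering workload being compared:

```python
# Minimal K-means sketch (the baseline algorithm named in the abstract).
# Illustration only; this is NOT the dissertation's BHC algorithm, whose
# details are not given in this record. All names here are hypothetical.
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Cluster feature vectors (n_samples x n_dims) into k groups."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct random samples.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each sample goes to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned samples.
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Example: cluster 200 random 39-dimensional feature vectors
# (39 is a common speech-feature dimensionality, assumed here for illustration).
if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(200, 39))
    centers, assignment = kmeans(data, k=4)
    print(centers.shape, np.bincount(assignment, minlength=4))
```

This sketch only makes the clustering workload concrete; it does not represent the dissertation's BHC training procedure, which this record does not describe.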
author2 | Jhing-Fa Wang |
author_facet | Jhing-Fa Wang Chih-HsaingPeng 彭志祥 |
author | Chih-HsaingPeng 彭志祥 |
spellingShingle | Chih-HsaingPeng 彭志祥 A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition |
author_sort | Chih-HsaingPeng |
title | A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition |
title_short | A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition |
title_full | A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition |
title_fullStr | A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition |
title_full_unstemmed | A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition |
title_sort | study of reconfigurable multi-core vlsi architecture design for speaker-speech recognition |
publishDate | 2015 |
url | http://ndltd.ncl.edu.tw/handle/cwj2k3 |
work_keys_str_mv | AT chihhsaingpeng astudyofreconfigurablemulticorevlsiarchitecturedesignforspeakerspeechrecognition AT péngzhìxiáng astudyofreconfigurablemulticorevlsiarchitecturedesignforspeakerspeechrecognition AT chihhsaingpeng yǔzhěyǔyīnbiànshízhīkězhòngzǔshìduōhéxīnjītǐdiànlùjiàgòushèjìzhīyánjiū AT péngzhìxiáng yǔzhěyǔyīnbiànshízhīkězhòngzǔshìduōhéxīnjītǐdiànlùjiàgòushèjìzhīyánjiū AT chihhsaingpeng studyofreconfigurablemulticorevlsiarchitecturedesignforspeakerspeechrecognition AT péngzhìxiáng studyofreconfigurablemulticorevlsiarchitecturedesignforspeakerspeechrecognition |
_version_ | 1719122123622973440 |