A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition

Ph.D. === National Cheng Kung University === Department of Electrical Engineering === 103 === In multimedia applications, audio processing usually requires large amounts of high-dimensional computation. As intelligent applications grow, automatic speaker-speech recognition (ASSR), which requires extraction, recognition, and learning (ERL) functions, is becoming increasingly popular. Conventional VLSI designs focus on enhancing a single component; moreover, most chip solutions are high-cost and not specifically designed for ASSR. This dissertation proposes a novel reconfigurable multi-core architecture with five self-reconfigurable modes and four pre-configurable modes for low cost and high efficiency. Based on this architecture, this work targets ASSR to design a chip with high-dimensional processing ability and real-time performance. The first part of this dissertation uses hardware/software co-design to analyze the bottlenecks of the whole ERL system. In the second part, to address the bottleneck of the training phase, learning is realized by hardware acceleration with tri-mode reconfigurability. Compared with the baseline, the proposed design achieves higher hardware utilization; therefore, under a low-cost constraint, it still attains a high speedup. This design is implemented with a standard-cell library in 0.18 µm CMOS technology. The chip requires a die size of 8.6 mm² and a power consumption of 77.33 mW, achieving 31% less gate count and a 16-fold improvement in learning speed. In the third part, to improve the performance of the whole ERL system, extraction and recognition hardware are integrated into the previous design. Because the bottleneck of the testing phase shifts to the extraction part, hardware acceleration of extraction is realized, while the recognition hardware is designed for low cost. This design is manufactured with a standard-cell library in 90 nm CMOS technology. The chip requires a die size of 4.3 mm² and a power consumption of 8.9 mW, achieving a 3-fold improvement in extraction speed with a 26% increase in gate count. The next step integrates the extraction and recognition architectures into the reconfigurable architecture efficiently, yielding a mix of five self-reconfigurable modes and four pre-configurable modes. Simulation results show a 3-fold improvement in extraction speed with a 17% decrease in gate count. In the fourth part, the goal is even lower cost. Accordingly, this work presents a novel algorithm, binary halved clustering (BHC), to replace the conventional training method, sequential minimal optimization (SMO). Compared with the popular K-means algorithm, the proposed algorithm reduces computation by 87% while achieving an average accuracy of 92.7%. The system is applied in a case study of automatic speaker-speech recognition, achieving both low computation time and high accuracy. This design is also manufactured with a standard-cell library in 90 nm CMOS technology. The chip requires a die size of 2.2 mm² and a power consumption of 8.74 mW.
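The abstract names binary halved clustering (BHC) but does not define it in this record. Purely as an illustrative sketch — assuming, without confirmation from the dissertation itself, that BHC works by recursively bisecting the largest cluster, in the spirit of bisecting K-means — a toy 1-D version might look like this; every function name and detail below is hypothetical:

```python
# Hypothetical sketch of a "halved clustering" scheme: repeatedly split the
# largest cluster in two until k clusters exist. NOT the dissertation's BHC,
# whose exact procedure is not given in this record.

def split_in_two(points, iters=10):
    """2-means on one cluster of 1-D points: returns two sub-clusters."""
    a, b = min(points), max(points)          # initial centroids at the extremes
    for _ in range(iters):
        left  = [p for p in points if abs(p - a) <= abs(p - b)]
        right = [p for p in points if abs(p - a) >  abs(p - b)]
        if left:  a = sum(left) / len(left)   # recompute centroids
        if right: b = sum(right) / len(right)
    return left, right

def binary_halved_clustering(points, k):
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)               # split the largest cluster next
        biggest = clusters.pop()
        left, right = split_in_two(biggest)
        clusters += [c for c in (left, right) if c]
    return clusters

data = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.0, 9.2]
clusters = binary_halved_clustering(data, 3)
print(sorted(len(c) for c in clusters))
```

Each halving step only compares points against two centroids, which is one plausible way such a scheme could need less computation than full K-means with k centroids per pass, as the abstract's 87% figure suggests.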


Bibliographic Details
Main Authors: Chih-HsaingPeng, 彭志祥
Other Authors: Jhing-Fa Wang
Format: Others
Language: en_US
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/cwj2k3
id ndltd-TW-103NCKU5442026
record_format oai_dc
spelling ndltd-TW-103NCKU54420262019-05-15T21:59:10Z http://ndltd.ncl.edu.tw/handle/cwj2k3 A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition 語者語音辨識之可重組式多核心積體電路架構設計之研究 Chih-HsaingPeng 彭志祥 博士 國立成功大學 電機工程學系 103 Jhing-Fa Wang 王駿發 2015 學位論文 ; thesis 99 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Ph.D. === National Cheng Kung University === Department of Electrical Engineering === 103 === In multimedia applications, audio processing usually requires large amounts of high-dimensional computation. As intelligent applications grow, automatic speaker-speech recognition (ASSR), which requires extraction, recognition, and learning (ERL) functions, is becoming increasingly popular. Conventional VLSI designs focus on enhancing a single component; moreover, most chip solutions are high-cost and not specifically designed for ASSR. This dissertation proposes a novel reconfigurable multi-core architecture with five self-reconfigurable modes and four pre-configurable modes for low cost and high efficiency. Based on this architecture, this work targets ASSR to design a chip with high-dimensional processing ability and real-time performance. The first part of this dissertation uses hardware/software co-design to analyze the bottlenecks of the whole ERL system. In the second part, to address the bottleneck of the training phase, learning is realized by hardware acceleration with tri-mode reconfigurability. Compared with the baseline, the proposed design achieves higher hardware utilization; therefore, under a low-cost constraint, it still attains a high speedup. This design is implemented with a standard-cell library in 0.18 µm CMOS technology. The chip requires a die size of 8.6 mm² and a power consumption of 77.33 mW, achieving 31% less gate count and a 16-fold improvement in learning speed. In the third part, to improve the performance of the whole ERL system, extraction and recognition hardware are integrated into the previous design. Because the bottleneck of the testing phase shifts to the extraction part, hardware acceleration of extraction is realized, while the recognition hardware is designed for low cost. This design is manufactured with a standard-cell library in 90 nm CMOS technology. The chip requires a die size of 4.3 mm² and a power consumption of 8.9 mW, achieving a 3-fold improvement in extraction speed with a 26% increase in gate count. The next step integrates the extraction and recognition architectures into the reconfigurable architecture efficiently, yielding a mix of five self-reconfigurable modes and four pre-configurable modes. Simulation results show a 3-fold improvement in extraction speed with a 17% decrease in gate count. In the fourth part, the goal is even lower cost. Accordingly, this work presents a novel algorithm, binary halved clustering (BHC), to replace the conventional training method, sequential minimal optimization (SMO). Compared with the popular K-means algorithm, the proposed algorithm reduces computation by 87% while achieving an average accuracy of 92.7%. The system is applied in a case study of automatic speaker-speech recognition, achieving both low computation time and high accuracy. This design is also manufactured with a standard-cell library in 90 nm CMOS technology. The chip requires a die size of 2.2 mm² and a power consumption of 8.74 mW.
author2 Jhing-Fa Wang
author_facet Jhing-Fa Wang
Chih-HsaingPeng
彭志祥
author Chih-HsaingPeng
彭志祥
spellingShingle Chih-HsaingPeng
彭志祥
A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition
author_sort Chih-HsaingPeng
title A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition
title_short A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition
title_full A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition
title_fullStr A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition
title_full_unstemmed A Study of Reconfigurable Multi-Core VLSI Architecture Design for Speaker-Speech Recognition
title_sort study of reconfigurable multi-core vlsi architecture design for speaker-speech recognition
publishDate 2015
url http://ndltd.ncl.edu.tw/handle/cwj2k3
work_keys_str_mv AT chihhsaingpeng astudyofreconfigurablemulticorevlsiarchitecturedesignforspeakerspeechrecognition
AT péngzhìxiáng astudyofreconfigurablemulticorevlsiarchitecturedesignforspeakerspeechrecognition
AT chihhsaingpeng yǔzhěyǔyīnbiànshízhīkězhòngzǔshìduōhéxīnjītǐdiànlùjiàgòushèjìzhīyánjiū
AT péngzhìxiáng yǔzhěyǔyīnbiànshízhīkězhòngzǔshìduōhéxīnjītǐdiànlùjiàgòushèjìzhīyánjiū
AT chihhsaingpeng studyofreconfigurablemulticorevlsiarchitecturedesignforspeakerspeechrecognition
AT péngzhìxiáng studyofreconfigurablemulticorevlsiarchitecturedesignforspeakerspeechrecognition
_version_ 1719122123622973440