Data and Hardware Efficient Design for Convolutional Neural Network

Master's thesis === National Chiao Tung University === Institute of Electronics === 105 === Deep convolutional neural networks (CNNs) have achieved state-of-the-art accuracy in recognition, detection, and other computer vision tasks. However, their hardware design faces the challenges of high computational complexity and data bandwidth as well as huge div...

Bibliographic Details
Main Authors:	Lin, Yue-Jin, 林岳縉
Other Authors:	Chang, Tian-Sheuan
Format:	Others
Language:	en_US
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/53096384893402910541

id	ndltd-TW-105NCTU5428020
record_format	oai_dc
spelling	ndltd-TW-105NCTU5428020 2017-09-06T04:22:26Z http://ndltd.ncl.edu.tw/handle/53096384893402910541 Data and Hardware Efficient Design for Convolutional Neural Network 適用於卷積類神經網路之高效率硬體加速器設計 Lin, Yue-Jin 林岳縉 Master's thesis, National Chiao Tung University, Institute of Electronics, 105 Deep convolutional neural networks (CNNs) have achieved state-of-the-art accuracy in recognition, detection, and other computer vision tasks. However, their hardware design faces the challenges of high computational complexity and data bandwidth, as well as large differences between CNN layers. In particular, the throughput of the convolutional layers is bounded by the available hardware resources, while the throughput of the fully connected layers is bounded by the available data bandwidth. A highly flexible design is therefore needed. This thesis presents an end-to-end CNN accelerator that maximizes hardware utilization to 100% through run-time configuration for multiple kernel sizes, and minimizes data bandwidth with an output-first strategy that improves data reuse in the convolutional layers by up to 300X~600X compared to the non-reused case. The complete CNN implementation of the target network is generated to optimize both hardware and data efficiency under design resource constraints, and it is reconfigured at run time with layer-optimized parameters to achieve real-time, end-to-end CNN acceleration. An implementation example for AlexNet uses 1.783M gates for 216 MACs and a 142.64 KB internal buffer in a TSMC 40nm process, and achieves 99.7 fps for the convolutional layers and 61.6 fps for all layers of AlexNet at a 454 MHz clock frequency. Chang, Tian-Sheuan 張添烜 2016 Thesis 87 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's thesis === National Chiao Tung University === Institute of Electronics === 105 === Deep convolutional neural networks (CNNs) have achieved state-of-the-art accuracy in recognition, detection, and other computer vision tasks. However, their hardware design faces the challenges of high computational complexity and data bandwidth, as well as large differences between CNN layers. In particular, the throughput of the convolutional layers is bounded by the available hardware resources, while the throughput of the fully connected layers is bounded by the available data bandwidth. A highly flexible design is therefore needed. This thesis presents an end-to-end CNN accelerator that maximizes hardware utilization to 100% through run-time configuration for multiple kernel sizes, and minimizes data bandwidth with an output-first strategy that improves data reuse in the convolutional layers by up to 300X~600X compared to the non-reused case. The complete CNN implementation of the target network is generated to optimize both hardware and data efficiency under design resource constraints, and it is reconfigured at run time with layer-optimized parameters to achieve real-time, end-to-end CNN acceleration. An implementation example for AlexNet uses 1.783M gates for 216 MACs and a 142.64 KB internal buffer in a TSMC 40nm process, and achieves 99.7 fps for the convolutional layers and 61.6 fps for all layers of AlexNet at a 454 MHz clock frequency.
author2	Chang, Tian-Sheuan
author_facet	Chang, Tian-Sheuan Lin, Yue-Jin 林岳縉
author	Lin, Yue-Jin 林岳縉
spellingShingle	Lin, Yue-Jin 林岳縉 Data and Hardware Efficient Design for Convolutional Neural Network
author_sort	Lin, Yue-Jin
title	Data and Hardware Efficient Design for Convolutional Neural Network
title_short	Data and Hardware Efficient Design for Convolutional Neural Network
title_full	Data and Hardware Efficient Design for Convolutional Neural Network
title_fullStr	Data and Hardware Efficient Design for Convolutional Neural Network
title_full_unstemmed	Data and Hardware Efficient Design for Convolutional Neural Network
title_sort	data and hardware efficient design for convolutional neural network
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/53096384893402910541
work_keys_str_mv	AT linyuejin dataandhardwareefficientdesignforconvolutionalneuralnetwork AT línyuèjìn dataandhardwareefficientdesignforconvolutionalneuralnetwork AT linyuejin shìyòngyújuǎnjīlèishénjīngwǎnglùzhīgāoxiàolǜyìngtǐjiāsùqìshèjì AT línyuèjìn shìyòngyújuǎnjīlèishénjīngwǎnglùzhīgāoxiàolǜyìngtǐjiāsùqìshèjì
_version_	1718527850465722368
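
The implementation figures quoted in the abstract (216 MACs, 454 MHz, 99.7 fps and 61.6 fps) can be related to each other with a short calculation. The sketch below is illustrative only and is not taken from the thesis: it assumes each MAC unit completes one multiply-accumulate per clock cycle, and the function names are hypothetical. Under that assumption it gives the peak MAC rate and the upper bound on MAC operations available per frame at each reported frame rate.

```python
# Figures quoted in the abstract of this record.
NUM_MACS = 216            # number of MAC units
CLOCK_HZ = 454_000_000    # 454 MHz clock frequency
FPS_CONV_LAYERS = 99.7    # frame rate, convolutional layers only
FPS_ALL_LAYERS = 61.6     # frame rate, all AlexNet layers


def peak_macs_per_second(num_macs: int, clock_hz: int) -> int:
    """Peak MAC rate, ASSUMING one MAC operation per unit per cycle."""
    return num_macs * clock_hz


def mac_slots_per_frame(num_macs: int, clock_hz: int, fps: float) -> float:
    """Upper bound on MAC operations available per frame at a given rate."""
    return peak_macs_per_second(num_macs, clock_hz) / fps


peak = peak_macs_per_second(NUM_MACS, CLOCK_HZ)
slots_conv = mac_slots_per_frame(NUM_MACS, CLOCK_HZ, FPS_CONV_LAYERS)
slots_all = mac_slots_per_frame(NUM_MACS, CLOCK_HZ, FPS_ALL_LAYERS)

print(f"peak rate: {peak / 1e9:.3f} GMAC/s")
print(f"MAC slots per frame (conv layers): {slots_conv / 1e6:.1f} M")
print(f"MAC slots per frame (all layers):  {slots_all / 1e6:.1f} M")
```

These values are ceilings implied by the quoted numbers, not measured operation counts; the actual per-layer workload and utilization are reported in the thesis itself.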

Similar Items