Data and Hardware Efficient Design for Convolutional Neural Network

Master's === National Chiao Tung University === Institute of Electronics === 105 === Deep convolutional neural networks (CNNs) have achieved state-of-the-art accuracy in recognition, detection, and other computer vision tasks. However, their hardware design faces the challenges of high computational complexity and data bandwidth as well as huge div...


Bibliographic Details
Main Authors: Lin, Yue-Jin, 林岳縉
Other Authors: Chang, Tian-Sheuan
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/53096384893402910541
id ndltd-TW-105NCTU5428020
record_format oai_dc
spelling ndltd-TW-105NCTU5428020 2017-09-06T04:22:26Z http://ndltd.ncl.edu.tw/handle/53096384893402910541 Data and Hardware Efficient Design for Convolutional Neural Network 適用於卷積類神經網路之高效率硬體加速器設計 Lin, Yue-Jin 林岳縉 Master's National Chiao Tung University Institute of Electronics 105 Chang, Tian-Sheuan 張添烜 2016 thesis 87 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Chiao Tung University === Institute of Electronics === 105 === Deep convolutional neural networks (CNNs) have achieved state-of-the-art accuracy in recognition, detection, and other computer vision tasks. However, their hardware design faces the challenges of high computational complexity and data bandwidth, as well as large variation across CNN layers. In particular, the throughput of the convolutional layers is bounded by the available hardware resources, while the throughput of the fully connected layers is bounded by the available data bandwidth. A highly flexible design is therefore needed to meet both constraints. This thesis presents an end-to-end CNN accelerator that raises hardware utilization to 100% through run-time configuration of multiple kernel sizes, and minimizes data bandwidth with an output-first strategy that improves data reuse in the convolutional layers by up to 300X to 600X compared to the non-reused case. The complete implementation of the target network is generated to optimize both hardware and data efficiency under design resource constraints, and is reconfigured at run time with layer-optimized parameters to achieve real-time, end-to-end CNN acceleration. An example implementation for AlexNet uses 1.783M gates for 216 MACs and a 142.64 KB internal buffer in a TSMC 40nm process, and at a 454 MHz clock frequency achieves 99.7 fps for the convolutional layers and 61.6 fps for all layers of AlexNet.
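The figures quoted in the description can be related to one another with some simple arithmetic. The Python sketch below is an illustration only and is not taken from the thesis: it assumes each MAC unit completes one multiply-accumulate per clock cycle, and that the layers of a frame are processed one after another, so that the gap between the two frame times approximates the time spent outside the convolutional layers. The function names are hypothetical.

```python
# Figures quoted in the abstract.
NUM_MACS = 216      # multiply-accumulate units
CLOCK_HZ = 454e6    # 454 MHz clock
CONV_FPS = 99.7     # frame rate, convolutional layers only
ALL_FPS = 61.6      # frame rate, all AlexNet layers


def peak_macs_per_second(num_macs: int, clock_hz: float) -> float:
    """Upper bound on throughput, ASSUMING one MAC per unit per cycle."""
    return num_macs * clock_hz


def mac_slots_per_frame(num_macs: int, clock_hz: float, fps: float) -> float:
    """MAC slots (unit-cycles) available while one frame is processed."""
    return peak_macs_per_second(num_macs, clock_hz) / fps


def frame_time_ms(fps: float) -> float:
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    peak = peak_macs_per_second(NUM_MACS, CLOCK_HZ)
    print(f"peak throughput: {peak / 1e9:.2f} GMAC/s")

    slots = mac_slots_per_frame(NUM_MACS, CLOCK_HZ, CONV_FPS)
    print(f"MAC slots per frame (conv layers): {slots / 1e6:.1f} M")

    conv_ms = frame_time_ms(CONV_FPS)
    all_ms = frame_time_ms(ALL_FPS)
    # ASSUMPTION: layers run sequentially, so the difference is the
    # time taken by the remaining (non-convolutional) layers.
    print(f"conv layers: {conv_ms:.2f} ms, all layers: {all_ms:.2f} ms, "
          f"remainder: {all_ms - conv_ms:.2f} ms per frame")
```

Under these assumptions the peak rate is an upper bound on what the array can deliver. It says nothing about how many MAC operations AlexNet actually requires per frame, which the record does not state.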
author2 Chang, Tian-Sheuan
author_facet Chang, Tian-Sheuan
Lin, Yue-Jin
林岳縉
author Lin, Yue-Jin
林岳縉
spellingShingle Lin, Yue-Jin
林岳縉
Data and Hardware Efficient Design for Convolutional Neural Network
author_sort Lin, Yue-Jin
title Data and Hardware Efficient Design for Convolutional Neural Network
title_short Data and Hardware Efficient Design for Convolutional Neural Network
title_full Data and Hardware Efficient Design for Convolutional Neural Network
title_fullStr Data and Hardware Efficient Design for Convolutional Neural Network
title_full_unstemmed Data and Hardware Efficient Design for Convolutional Neural Network
title_sort data and hardware efficient design for convolutional neural network
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/53096384893402910541
work_keys_str_mv AT linyuejin dataandhardwareefficientdesignforconvolutionalneuralnetwork
AT línyuèjìn dataandhardwareefficientdesignforconvolutionalneuralnetwork
AT linyuejin shìyòngyújuǎnjīlèishénjīngwǎnglùzhīgāoxiàolǜyìngtǐjiāsùqìshèjì
AT línyuèjìn shìyòngyújuǎnjīlèishénjīngwǎnglùzhīgāoxiàolǜyìngtǐjiāsùqìshèjì
_version_ 1718527850465722368