Sparse Ternary Convolutional Neural Network Model and its Hardware Design
Master's thesis === National Chiao Tung University === Institute of Electronics === 106 === Convolutional neural networks (CNNs) have risen rapidly in the last few years. Their performance is impressive, especially in the field of computer vision. However, the computational complexity of state-of-the-art models is very high; as a result, a powerful GPU is ne...
Main Authors: | Chiu, Kuan-Lin 邱冠霖 |
---|---|
Other Authors: | 張添烜 |
Format: | Others |
Language: | en_US |
Published: | 2017 |
Online Access: | http://ndltd.ncl.edu.tw/handle/vkwg7p |
id |
ndltd-TW-106NCTU5428076 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106NCTU54280762019-05-16T00:08:11Z http://ndltd.ncl.edu.tw/handle/vkwg7p Sparse Ternary Convolutional Neural Network Model and its Hardware Design 稀疏三元卷積類神經網路模型及其硬體設計 Chiu,Kuan-Lin 邱冠霖 Master's thesis, National Chiao Tung University, Institute of Electronics, 106. Convolutional neural networks (CNNs) have risen rapidly in the last few years. Their performance is impressive, especially in the field of computer vision. However, the computational complexity of state-of-the-art models is very high; as a result, powerful GPUs are needed to run CNN models. Several works try to reduce the computation by quantizing weights and activations, but quantizing models directly may have a negative effect on accuracy. Thus, this thesis proposes a systematic method, named progressive quantization, that simplifies models during training. We simplify weights from floating-point to ternary values and quantize activations from floating-point to fixed-point values while training. Besides, we also simplify batch normalization at the proper time. Training models with our method, the accuracy drops for ResNet-56 and DenseNet-40 are 1.61% and 3.9%, respectively, in our experiments. On the hardware side, this thesis also proposes a compatible accelerator. We import only non-zero values into the accelerator through sparse matrix loading and a group-sort-and-merge method. In addition, we exploit the ternary weights to replace multipliers with multiplexers and shift operators. For data reuse, this thesis proposes input-view convolution to reduce the dependency between convolution inputs, and proposes PE cooperation to compute different output feature maps with a high degree of parallelism from few inputs. Finally, an implementation synthesized in a TSMC 40 nm process consumes 3.28M gate counts. With this implementation, ResNet-56 on CIFAR-10 reaches 1684 FPS and ResNet-34 on ImageNet reaches 80 FPS at a 500 MHz clock frequency. 張添烜 2017 學位論文 ; thesis 68 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Chiao Tung University === Institute of Electronics === 106 === Convolutional neural networks (CNNs) have risen rapidly in the last few years. Their
performance is impressive, especially in the field of computer vision. However, the
computational complexity of state-of-the-art models is very high; as a result, powerful
GPUs are needed to run CNN models.
Several works try to reduce the computation by quantizing weights and activations,
but quantizing models directly may have a negative effect on accuracy. Thus, this thesis
proposes a systematic method, named progressive quantization, that simplifies models
during training. We simplify weights from floating-point to ternary values and
quantize activations from floating-point to fixed-point values while training.
Besides, we also simplify batch normalization at the proper time. Training models with
our method, the accuracy drops for ResNet-56 and DenseNet-40 are 1.61% and 3.9%,
respectively, in our experiments.
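The abstract does not spell out the exact ternarization rule. A minimal NumPy sketch of one common threshold-based scheme (the names `threshold_ratio`, `delta`, and `alpha` are illustrative, not taken from the thesis) might look like:

```python
import numpy as np

def ternarize(weights, threshold_ratio=0.7):
    """Map float weights to {-alpha, 0, +alpha} using a magnitude threshold."""
    delta = threshold_ratio * np.mean(np.abs(weights))  # zeroing threshold
    mask = np.abs(weights) > delta                      # weights that survive
    # Scale alpha: mean magnitude of the surviving weights.
    alpha = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(weights) * mask

w = np.float32([0.9, -0.05, 0.4, -0.8, 0.02])
print(ternarize(w))  # small-magnitude weights become exactly 0 (sparsity)
```

The zeros produced here are what the sparse-loading hardware described below exploits: only the non-zero ternary weights need to be fetched and accumulated.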
On the hardware side, this thesis also proposes a compatible accelerator. We import only
non-zero values into the accelerator through sparse matrix loading and a group-sort-and-merge
method. In addition, we exploit the ternary weights to replace multipliers with
multiplexers and shift operators. For data reuse, this thesis proposes input-view
convolution to reduce the dependency between convolution inputs, and proposes PE
cooperation to compute different output feature maps with a high degree of parallelism
from few inputs.
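A behavioral sketch of why ternary weights let multiplexers and shifters replace multipliers (`ternary_mac` and `shift_scale` are illustrative names; the power-of-two scale is an assumption suggested by the mention of shift operators, not a detail given in the abstract):

```python
def ternary_mac(acc, x, w):
    # With w restricted to {-1, 0, +1}, a multiply-accumulate reduces to a
    # three-way select: add x, subtract x, or pass the accumulator through.
    # In hardware that is a multiplexer plus an adder/subtractor, no multiplier.
    if w == 0:
        return acc
    return acc + x if w == 1 else acc - x

def shift_scale(value, log2_alpha):
    # If the per-layer scale is constrained to a power of two, applying it
    # is a left shift rather than a multiplication.
    return value << log2_alpha

acc = 0
for x, w in [(3, 1), (5, -1), (7, 0), (4, 1)]:
    acc = ternary_mac(acc, x, w)
print(shift_scale(acc, 2))  # prints 8
```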
Finally, an implementation synthesized in a TSMC 40 nm process consumes
3.28M gate counts. With this implementation, ResNet-56 on CIFAR-10 reaches
1684 FPS and ResNet-34 on ImageNet reaches 80 FPS at a 500 MHz clock
frequency.
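As a sanity check on the reported numbers, the per-frame cycle budget implied by the 500 MHz clock and the two frame rates can be computed directly:

```python
clock_hz = 500_000_000  # 500 MHz, as reported
fps = {"ResNet-56 / CIFAR-10": 1684, "ResNet-34 / ImageNet": 80}
for name, f in fps.items():
    cycles = clock_hz / f  # cycles available per inference at this frame rate
    print(f"{name}: ~{cycles:,.0f} cycles per frame")
```

So the accelerator has roughly 297k cycles per ResNet-56/CIFAR-10 inference and 6.25M cycles per ResNet-34/ImageNet inference.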
|
author2 |
張添烜 |
author_facet |
張添烜 Chiu,Kuan-Lin 邱冠霖 |
author |
Chiu,Kuan-Lin 邱冠霖 |
spellingShingle |
Chiu,Kuan-Lin 邱冠霖 Sparse Ternary Convolutional Neural Network Model and its Hardware Design |
author_sort |
Chiu,Kuan-Lin |
title |
Sparse Ternary Convolutional Neural Network Model and its Hardware Design |
title_short |
Sparse Ternary Convolutional Neural Network Model and its Hardware Design |
title_full |
Sparse Ternary Convolutional Neural Network Model and its Hardware Design |
title_fullStr |
Sparse Ternary Convolutional Neural Network Model and its Hardware Design |
title_full_unstemmed |
Sparse Ternary Convolutional Neural Network Model and its Hardware Design |
title_sort |
sparse ternary convolutional neural network model and its hardware design |
publishDate |
2017 |
url |
http://ndltd.ncl.edu.tw/handle/vkwg7p |
work_keys_str_mv |
AT chiukuanlin sparseternaryconvolutionalneuralnetworkmodelanditshardwaredesign AT qiūguānlín sparseternaryconvolutionalneuralnetworkmodelanditshardwaredesign AT chiukuanlin xīshūsānyuánjuǎnjīlèishénjīngwǎnglùmóxíngjíqíyìngtǐshèjì AT qiūguānlín xīshūsānyuánjuǎnjīlèishénjīngwǎnglùmóxíngjíqíyìngtǐshèjì |
_version_ |
1719161762513682432 |