Sparse Ternary Convolutional Neural Network Model and its Hardware Design

Bibliographic Details
Main Authors: Chiu, Kuan-Lin (邱冠霖)
Other Authors: 張添烜
Format: Others
Language: en_US
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/vkwg7p
id ndltd-TW-106NCTU5428076
record_format oai_dc
spelling ndltd-TW-106NCTU5428076 2019-05-16T00:08:11Z http://ndltd.ncl.edu.tw/handle/vkwg7p Sparse Ternary Convolutional Neural Network Model and its Hardware Design 稀疏三元卷積類神經網路模型及其硬體設計 Chiu, Kuan-Lin 邱冠霖 Master's thesis, National Chiao Tung University, Institute of Electronics, academic year 106. Convolutional neural networks (CNNs) have advanced rapidly in the last few years, with especially impressive performance in computer vision. However, the computational complexity of state-of-the-art models is very high, so powerful GPUs are needed to run them. Several works reduce the computation by quantizing weights and activations, but quantizing a model directly can hurt accuracy. This thesis therefore proposes a systematic training method, named progressive quantization, that simplifies the model during training: weights are reduced from floating point to ternary values, activations are quantized from floating point to fixed point, and batch normalization is simplified at the proper stage of training. With this method, the accuracy drops for ResNet-56 and DenseNet-40 are 1.61% and 3.9%, respectively, in our experiments. On the hardware side, this thesis also proposes a compatible accelerator. Only non-zero values are imported into the accelerator through sparse matrix loading and a group-sort-and-merge method. In addition, the ternary weights allow multipliers to be replaced with multiplexers and shift operations. For data reuse, this thesis proposes input view convolution to reduce the dependency between convolution inputs, and PE cooperation to compute different output feature maps with a high degree of parallelism from few inputs. Finally, an implementation synthesized in a TSMC 40 nm process requires 3.28M gates. With this implementation running at a 500 MHz clock, ResNet-56 on CIFAR-10 and ResNet-34 on ImageNet reach 1684 FPS and 80 FPS, respectively. 張添烜 2017 Thesis, 68 pages. en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National Chiao Tung University === Institute of Electronics === academic year 106 === Convolutional neural networks (CNNs) have advanced rapidly in the last few years, with especially impressive performance in computer vision. However, the computational complexity of state-of-the-art models is very high, so powerful GPUs are needed to run them. Several works reduce the computation by quantizing weights and activations, but quantizing a model directly can hurt accuracy. This thesis therefore proposes a systematic training method, named progressive quantization, that simplifies the model during training: weights are reduced from floating point to ternary values, activations are quantized from floating point to fixed point, and batch normalization is simplified at the proper stage of training. With this method, the accuracy drops for ResNet-56 and DenseNet-40 are 1.61% and 3.9%, respectively, in our experiments. On the hardware side, this thesis also proposes a compatible accelerator. Only non-zero values are imported into the accelerator through sparse matrix loading and a group-sort-and-merge method. In addition, the ternary weights allow multipliers to be replaced with multiplexers and shift operations. For data reuse, this thesis proposes input view convolution to reduce the dependency between convolution inputs, and PE cooperation to compute different output feature maps with a high degree of parallelism from few inputs. Finally, an implementation synthesized in a TSMC 40 nm process requires 3.28M gates. With this implementation running at a 500 MHz clock, ResNet-56 on CIFAR-10 and ResNet-34 on ImageNet reach 1684 FPS and 80 FPS, respectively.
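
The description above mentions ternary weights, fixed-point activations, and replacing multipliers with multiplexers and shift operations. The following Python sketch only illustrates those ideas under stated assumptions: threshold-based ternarization (in the style of ternary weight networks) and an 8-bit signed fixed-point activation format are illustrative choices, not the thesis's progressive quantization schedule, bit widths, or hardware design, and the names ternarize, quantize_activation, ternary_dot, threshold_ratio, and frac_bits are hypothetical.

# Illustrative sketch (not the thesis's exact algorithm): threshold-based
# ternarization of weights, fixed-point quantization of activations, and a
# multiplier-free dot product that exploits the ternary weights.
import numpy as np

def ternarize(weights, threshold_ratio=0.7):
    """Map float weights to {-1, 0, +1} plus a scale, using a magnitude threshold.

    threshold_ratio is a hypothetical hyper-parameter; the thesis's progressive
    quantization may choose the threshold differently.
    """
    delta = threshold_ratio * np.mean(np.abs(weights))
    ternary = np.zeros_like(weights)
    ternary[weights > delta] = 1.0
    ternary[weights < -delta] = -1.0
    # Per-layer scale so that scale * ternary approximates the original weights.
    mask = ternary != 0
    scale = np.abs(weights[mask]).mean() if mask.any() else 1.0
    return ternary.astype(np.int8), scale

def quantize_activation(x, frac_bits=6, total_bits=8):
    """Quantize activations to signed fixed point (assumed bit widths)."""
    step = 2.0 ** -frac_bits
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x / step), -qmax - 1, qmax)
    return q.astype(np.int32), step

def ternary_dot(acts_q, w_ternary):
    """Dot product with ternary weights: only add, subtract, or skip.

    This mirrors the hardware idea of replacing each multiplier with a
    multiplexer that selects +activation, -activation, or nothing.
    """
    acc = 0
    for a, w in zip(acts_q, w_ternary):
        if w == 1:
            acc += a          # mux selects +activation
        elif w == -1:
            acc -= a          # mux selects -activation
        # w == 0: sparse weight, the term is skipped entirely
    return acc

# Tiny usage example with random data.
rng = np.random.default_rng(0)
w = rng.normal(size=16).astype(np.float32)
x = rng.normal(size=16).astype(np.float32)
w_t, w_scale = ternarize(w)
x_q, x_step = quantize_activation(x)
approx = ternary_dot(x_q, w_t) * w_scale * x_step
print("float dot:", float(np.dot(w, x)), "ternary/fixed-point approx:", approx)

Because every quantized weight is -1, 0, or +1, the inner loop only adds, subtracts, or skips an activation; this is the software analogue of the multiplexer-plus-adder datapath and of skipping zero weights via sparse loading described in the abstract.
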
author2 張添烜
author_facet 張添烜
Chiu,Kuan-Lin
邱冠霖
author Chiu,Kuan-Lin
邱冠霖
spellingShingle Chiu,Kuan-Lin
邱冠霖
Sparse Ternary Convolutional Neural Network Model and its Hardware Design
author_sort Chiu,Kuan-Lin
title Sparse Ternary Convolutional Neural Network Model and its Hardware Design
title_short Sparse Ternary Convolutional Neural Network Model and its Hardware Design
title_full Sparse Ternary Convolutional Neural Network Model and its Hardware Design
title_fullStr Sparse Ternary Convolutional Neural Network Model and its Hardware Design
title_full_unstemmed Sparse Ternary Convolutional Neural Network Model and its Hardware Design
title_sort sparse ternary convolutional neural network model and its hardware design
publishDate 2017
url http://ndltd.ncl.edu.tw/handle/vkwg7p
work_keys_str_mv AT chiukuanlin sparseternaryconvolutionalneuralnetworkmodelanditshardwaredesign
AT qiūguānlín sparseternaryconvolutionalneuralnetworkmodelanditshardwaredesign
AT chiukuanlin xīshūsānyuánjuǎnjīlèishénjīngwǎnglùmóxíngjíqíyìngtǐshèjì
AT qiūguānlín xīshūsānyuánjuǎnjīlèishénjīngwǎnglùmóxíngjíqíyìngtǐshèjì
_version_ 1719161762513682432