Sparse Ternary Convolutional Neural Network Model and its Hardware Design
Master's thesis === National Chiao Tung University === Institute of Electronics === 106 === Convolutional neural networks (CNNs) have risen rapidly in the last few years. Their performance is impressive, especially in the field of computer vision. However, the computational complexity of state-of-the-art models is very high; as a result, a powerful GPU is ne...
Main Authors: | Chiu, Kuan-Lin 邱冠霖 |
---|---|
Other Authors: | 張添烜 |
Format: | Others |
Language: | en_US |
Published: | 2017 |
Online Access: | http://ndltd.ncl.edu.tw/handle/vkwg7p |
id |
ndltd-TW-106NCTU5428076 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106NCTU54280762019-05-16T00:08:11Z http://ndltd.ncl.edu.tw/handle/vkwg7p Sparse Ternary Convolutional Neural Network Model and its Hardware Design 稀疏三元卷積類神經網路模型及其硬體設計 Chiu,Kuan-Lin 邱冠霖 Master's thesis, National Chiao Tung University, Institute of Electronics, 106. Convolutional neural networks (CNNs) have risen rapidly in the last few years. Their performance is impressive, especially in the field of computer vision. However, the computational complexity of state-of-the-art models is very high; as a result, powerful GPUs are needed to run CNN models. Several works try to reduce the computation by quantizing weights and activations, but quantizing models directly may have a negative effect on accuracy. Thus, this thesis proposes a systematic method, named progressive quantization, that simplifies models during training. We simplify weights from floating-point to ternary values and quantize activations from floating-point to fixed-point values while training. Besides, we also simplify batch normalization at the proper time. Training models with our method, the accuracy drops for ResNet-56 and DenseNet-40 are 1.61% and 3.9%, respectively, in our experiments. On the hardware side, this thesis also proposes a compatible accelerator. We import only non-zero values into the accelerator through sparse matrix loading and a group-sort-and-merge method. In addition, we exploit the ternary weights to replace multipliers with multiplexers and shift operators. For data reuse, this thesis proposes input-view convolution to reduce the dependency between convolution inputs, and proposes PE cooperation to compute different output feature maps with a high degree of parallelism from few inputs. Finally, an implementation synthesized in a TSMC 40 nm process consumes 3.28M gate counts. With this implementation, ResNet-56 on CIFAR-10 reaches 1684 FPS and ResNet-34 on ImageNet reaches 80 FPS at a 500 MHz clock frequency. 張添烜 2017 學位論文 ; thesis 68 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Chiao Tung University === Institute of Electronics === 106 === Convolutional neural networks (CNNs) have risen rapidly in the last few years. Their
performance is impressive, especially in the field of computer vision. However, the
computational complexity of state-of-the-art models is very high; as a result, powerful
GPUs are needed to run CNN models.
Several works try to reduce the computation by quantizing weights and activations,
but quantizing models directly may have a negative effect on accuracy. Thus, this thesis
proposes a systematic method, named progressive quantization, that simplifies models
during training. We simplify weights from floating-point to ternary values and
quantize activations from floating-point to fixed-point values while training.
Besides, we also simplify batch normalization at the proper time. Training models with
our method, the accuracy drops for ResNet-56 and DenseNet-40 are 1.61% and 3.9%,
respectively, in our experiments.
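The abstract does not spell out the exact ternarization rule. A minimal NumPy sketch of one common threshold-based scheme (the names `threshold_ratio`, `delta`, and `alpha` are illustrative, not taken from the thesis) might look like:

```python
import numpy as np

def ternarize(weights, threshold_ratio=0.7):
    """Map float weights to {-alpha, 0, +alpha} using a magnitude threshold."""
    delta = threshold_ratio * np.mean(np.abs(weights))  # zeroing threshold
    mask = np.abs(weights) > delta                      # weights that survive
    # Scale alpha: mean magnitude of the surviving weights.
    alpha = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(weights) * mask

w = np.float32([0.9, -0.05, 0.4, -0.8, 0.02])
print(ternarize(w))  # small-magnitude weights become exactly 0 (sparsity)
```

The zeros produced here are what the sparse-loading hardware described below exploits: only the non-zero ternary weights need to be fetched and accumulated.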
On the hardware side, this thesis also proposes a compatible accelerator. We import only
non-zero values into the accelerator through sparse matrix loading and a group-sort-and-merge
method. In addition, we exploit the ternary weights to replace multipliers with
multiplexers and shift operators. For data reuse, this thesis proposes input-view
convolution to reduce the dependency between convolution inputs, and proposes PE
cooperation to compute different output feature maps with a high degree of parallelism
from few inputs.
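A behavioral sketch of why ternary weights let multiplexers and shifters replace multipliers (`ternary_mac` and `shift_scale` are illustrative names; the power-of-two scale is an assumption suggested by the mention of shift operators, not a detail given in the abstract):

```python
def ternary_mac(acc, x, w):
    # With w restricted to {-1, 0, +1}, a multiply-accumulate reduces to a
    # three-way select: add x, subtract x, or pass the accumulator through.
    # In hardware that is a multiplexer plus an adder/subtractor, no multiplier.
    if w == 0:
        return acc
    return acc + x if w == 1 else acc - x

def shift_scale(value, log2_alpha):
    # If the per-layer scale is constrained to a power of two, applying it
    # is a left shift rather than a multiplication.
    return value << log2_alpha

acc = 0
for x, w in [(3, 1), (5, -1), (7, 0), (4, 1)]:
    acc = ternary_mac(acc, x, w)
print(shift_scale(acc, 2))  # prints 8
```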
Finally, an implementation synthesized in a TSMC 40 nm process consumes
3.28M gate counts. With this implementation, ResNet-56 on CIFAR-10 reaches
1684 FPS and ResNet-34 on ImageNet reaches 80 FPS at a 500 MHz clock
frequency.
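As a sanity check on the reported numbers, the per-frame cycle budget implied by the 500 MHz clock and the two frame rates can be computed directly:

```python
clock_hz = 500_000_000  # 500 MHz, as reported
fps = {"ResNet-56 / CIFAR-10": 1684, "ResNet-34 / ImageNet": 80}
for name, f in fps.items():
    cycles = clock_hz / f  # cycles available per inference at this frame rate
    print(f"{name}: ~{cycles:,.0f} cycles per frame")
```

So the accelerator has roughly 297k cycles per ResNet-56/CIFAR-10 inference and 6.25M cycles per ResNet-34/ImageNet inference.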
|
author2 |
張添烜 |
author_facet |
張添烜 Chiu,Kuan-Lin 邱冠霖 |
author |
Chiu,Kuan-Lin 邱冠霖 |
spellingShingle |
Chiu,Kuan-Lin 邱冠霖 Sparse Ternary Convolutional Neural Network Model and its Hardware Design |
author_sort |
Chiu,Kuan-Lin |
title |
Sparse Ternary Convolutional Neural Network Model and its Hardware Design |
title_short |
Sparse Ternary Convolutional Neural Network Model and its Hardware Design |
title_full |
Sparse Ternary Convolutional Neural Network Model and its Hardware Design |
title_fullStr |
Sparse Ternary Convolutional Neural Network Model and its Hardware Design |
title_full_unstemmed |
Sparse Ternary Convolutional Neural Network Model and its Hardware Design |
title_sort |
sparse ternary convolutional neural network model and its hardware design |
publishDate |
2017 |
url |
http://ndltd.ncl.edu.tw/handle/vkwg7p |
work_keys_str_mv |
AT chiukuanlin sparseternaryconvolutionalneuralnetworkmodelanditshardwaredesign AT qiūguānlín sparseternaryconvolutionalneuralnetworkmodelanditshardwaredesign AT chiukuanlin xīshūsānyuánjuǎnjīlèishénjīngwǎnglùmóxíngjíqíyìngtǐshèjì AT qiūguānlín xīshūsānyuánjuǎnjīlèishénjīngwǎnglùmóxíngjíqíyìngtǐshèjì |
_version_ |
1719161762513682432 |