Summary: | Master's === National Chiao Tung University === Institute of Electronics === 106 === Convolutional neural networks (CNNs) have surged in popularity over the last few years. Their
performance is impressive, especially in the computer vision field. However, the
computational complexity of state-of-the-art models is very high. As a result, powerful
GPUs are needed to run CNN models.
Several works try to reduce the computation by quantizing weights and activations,
but quantizing models directly may harm accuracy. Thus, this thesis
proposes a systematic method, named progressive quantization, that simplifies models
during training: weights are simplified from floating-point to ternary values and
activations are quantized from floating-point to fixed-point values as the models are trained.
In addition, batch normalization is simplified at an appropriate point in training. Training models with
our method, the accuracy drops on ResNet-56 and DenseNet-40 are 1.61% and 3.9%,
respectively, in our experiments.
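As a rough illustration of the two simplifications above, the following minimal NumPy sketch shows threshold-based weight ternarization and uniform fixed-point rounding of activations; the threshold, bit widths, and post-ReLU range are illustrative assumptions, not the exact settings or training schedule used in this thesis.

```python
import numpy as np

def ternarize_weights(w, threshold=0.05):
    # Threshold-based ternarization to {-1, 0, +1}; the threshold is illustrative.
    q = np.zeros_like(w)
    q[w > threshold] = 1.0
    q[w < -threshold] = -1.0
    return q

def quantize_activations(x, frac_bits=4, total_bits=8):
    # Round to a uniform fixed-point grid (assumed unsigned, post-ReLU) and clip
    # to the range representable with `total_bits` bits.
    scale = 2.0 ** frac_bits
    max_val = (2 ** total_bits - 1) / scale
    return np.clip(np.round(x * scale) / scale, 0.0, max_val)

w = np.random.randn(3, 3) * 0.1   # toy floating-point weights
x = np.random.rand(4, 4)          # toy floating-point activations
print(ternarize_weights(w))
print(quantize_activations(x))
```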
On the hardware side, this thesis also proposes a compatible accelerator. Only
non-zero values are loaded into the accelerator, using sparse matrix loading and a group-sort-and-merge
method. In addition, the ternary weights are exploited to replace multipliers with
multiplexers and shift operators. For data reuse, this thesis proposes input-view
convolution to reduce the dependency between convolution inputs, and PE
cooperation to compute different output feature maps with a high degree of parallelism
from only a few inputs.
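The following behavioral sketch illustrates why ternary weights allow a multiplexer and a shifter to stand in for a multiplier, and why zero-valued weights can be skipped entirely; it is not the accelerator's RTL, and the function name and the power-of-two output scale are assumptions.

```python
def ternary_mac(activations, ternary_weights, scale_shift=0):
    # Accumulate sum(x * w) for integer (fixed-point) activations and w in {-1, 0, +1}.
    # Since w can only select +x, 0, or -x, the hardware multiplier reduces to a
    # multiplexer; an optional power-of-two scale becomes a left shift.
    acc = 0
    for x, w in zip(activations, ternary_weights):
        if w == 1:
            acc += x      # multiplexer selects +x
        elif w == -1:
            acc -= x      # multiplexer selects -x
        # w == 0: contributes nothing, so the input can be skipped entirely
    return acc << scale_shift

print(ternary_mac([3, 5, 2, 7], [1, 0, -1, 1], scale_shift=1))  # (3 - 2 + 7) << 1 = 16
```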
Finally, an implementation synthesized in a TSMC 40 nm process consumes
3.28M gate counts. With this implementation running at a 500 MHz clock
frequency, ResNet-56 on CIFAR-10 and ResNet-34 on ImageNet reach
1684 FPS and 80 FPS, respectively.
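For context, the reported clock rate and frame rates translate into the following per-frame cycle budgets (simple arithmetic from the figures above):

```python
clock_hz = 500e6  # reported 500 MHz clock frequency
for model, fps in [("ResNet-56 / CIFAR-10", 1684), ("ResNet-34 / ImageNet", 80)]:
    print(f"{model}: ~{clock_hz / fps:,.0f} cycles per frame")
    # ResNet-56 / CIFAR-10: ~296,912 cycles per frame
    # ResNet-34 / ImageNet: ~6,250,000 cycles per frame
```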
|