Efficient Execution via Dynamic Network Slimming

Bibliographic Details
Main Authors: Tseng, Yu-Che, 曾于哲
Other Authors: Chang, Tian-Sheuan
Format: Others
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/m38494
Description
Summary: Master's === National Chiao Tung University === Institute of Electronics === 108 === Convolutional neural networks (CNNs) achieve state-of-the-art results in computer vision. However, their heavy computation and large model size make them difficult to execute on mobile and wearable devices with limited resources, so model compression and acceleration of model execution have become salient research topics. Conventional compression and acceleration methods focus on removing unimportant parts of the network at different granularities, such as weight pruning, filter pruning, and channel pruning. These irreversible pruning methods, however, cause permanent damage to the model structure. Consequently, dynamic pruning, which adapts to the difficulty of each classification task, becomes relatively advantageous. Prior neural network research has shown that different classes are built hierarchically from specific lower-level features, and for a particular category many low-level features are useless; we follow this idea and skip unnecessary features to accelerate inference and reduce the model size. Moreover, a dynamic acceleration method that executes a different substructure for each input image can, in theory, achieve better performance than static pruning methods. In this work, to measure the salience of features, we take the absolute value of the scaling factor gamma in each batch normalization layer as the importance of its channels. A tiny CNN predicts, for each input, a threshold list that provides one threshold per batch normalization layer; in each batch normalization layer, channels whose gamma is smaller than this threshold are skipped. During training, we compute the expected pruning rate from the variance of the predicted outputs, which increases with the number of epochs, and we further increase the expected pruning rate by dividing by the average of the variance. We also introduce a parameter, epoch_ratio, to force the expected pruning rate toward the target pruning rate. The threshold prediction network and the target model are trained with stochastic gradient descent, which makes it possible to find the best substructure that meets the target pruning rate during inference. Simulation results show that this approach accelerates ResNet [1] on CIFAR-10 [2] by 2 to 5.49 times while losing only 0.94% accuracy. On CIFAR-100 [2], which has more categories, ResNet38 achieves a 1.67× acceleration with a 1.81% accuracy drop. On M-CifarNet, compared with the FBS [3] method (3.93× acceleration, 0.87% accuracy drop), our approach performs better, with a 4.29× acceleration and a 0.33% accuracy drop. When 90.50% accuracy is maintained, the conventional static pruning method Network Slimming achieves a 1.429× acceleration, whereas our approach achieves a 2× acceleration. In addition, the overhead that the threshold predictor adds in FLOPs and model size does not exceed 1%.
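
The sketch below is a minimal PyTorch illustration of the per-input channel gating described in the summary. The ThresholdPredictor architecture (one 3x3 convolution, global average pooling, and a sigmoid-bounded linear layer) and all layer sizes are illustrative assumptions; the thesis only states that a tiny CNN predicts one threshold per batch normalization layer and that channels whose |gamma| falls below that threshold are skipped. The training objective (the expected-pruning-rate term driven by the output variance and the epoch_ratio schedule) is not reproduced here, and the hard comparison against the threshold is non-differentiable, so this is an inference-time illustration only.

```python
import torch
import torch.nn as nn


class ThresholdPredictor(nn.Module):
    """Tiny CNN that maps an input image to one gating threshold per
    batch-normalization layer. The layout here is a hypothetical stand-in;
    the thesis only specifies that a small CNN predicts the threshold list."""

    def __init__(self, num_bn_layers: int, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(8, num_bn_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)       # (N, 8)
        # Sigmoid keeps the predicted thresholds in a bounded range.
        return torch.sigmoid(self.fc(h))      # (N, num_bn_layers)


def gamma_gate(bn: nn.BatchNorm2d, feature: torch.Tensor,
               threshold: torch.Tensor) -> torch.Tensor:
    """Zero out channels whose |gamma| is below the per-sample threshold.

    Multiplying by the mask only emulates the skipping; a real speed-up
    requires not computing the convolutions that feed the masked channels.
    """
    gamma = bn.weight.detach().abs()                    # (C,)
    mask = (gamma.unsqueeze(0) >= threshold).float()    # (N, C)
    return feature * mask.unsqueeze(-1).unsqueeze(-1)   # (N, C, H, W)


# Toy usage: gate the output of one conv + BN block on a CIFAR-sized batch.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(16)
predictor = ThresholdPredictor(num_bn_layers=1)

x = torch.randn(4, 3, 32, 32)
thresholds = predictor(x)                               # (4, 1)
y = gamma_gate(bn, bn(conv(x)), thresholds[:, 0:1])     # gated feature map
```

In this sketch the gate is applied after each batch normalization layer, one threshold per layer per input; the reported accelerations assume an execution scheme that actually omits the computation of the masked channels rather than multiplying them by zero.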