Efficient Execution via Dynamic Network Slimming

Master's thesis === National Chiao Tung University === Institute of Electronics === Academic year 108 (2019)

Bibliographic Details
Title (Chinese): 動態網路精簡之高效執行研究
Main Author: Tseng, Yu-Che (曾于哲)
Other Authors: Chang, Tian-Sheuan (張添烜)
Format: Others (thesis, 51 pages)
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/m38494

description Convolutional neural networks (CNNs) achieve state-of-the-art results in computer vision, but their heavy computation and large model size make them hard to execute on resource-limited mobile and wearable devices. Model compression and inference acceleration have therefore become salient research topics. Conventional methods remove unimportant parts of the network at different granularities, such as weight pruning, filter pruning, and channel pruning. These pruning methods are irreversible, however, and permanently damage the model's structure. A dynamic pruning method, which adapts to the difficulty of each classification task, is therefore comparatively advantageous.

Prior research on neural networks shows that different classes are built hierarchically from specific lower-level features, and that for any particular category many low-level features are useless. Following this observation, we skip unnecessary features to accelerate inference and reduce the model size. In principle, a dynamic acceleration method that executes a different substructure for each input image can outperform static pruning.

In this thesis, we measure the importance of each channel by the absolute value of the scaling factor, gamma, in its batch normalization layer. A tiny CNN takes the input image and predicts a list of thresholds, one per batch normalization layer; in each batch normalization layer, channels whose gamma is smaller than that layer's threshold are skipped. During training, we compute the expected pruning rate from the variance of the predicted thresholds, which increases over the epochs, and we raise it further by dividing by the average variance. A parameter, epoch_ratio, forces the expected pruning rate toward the target pruning rate. The threshold prediction network and the target model are trained jointly with stochastic gradient descent, which makes it possible to find, at inference time, the best substructure that meets the target pruning rate.

Simulation results show that this approach accelerates ResNet [1] on CIFAR-10 [2] by 2× to 5.49× while losing only 0.94% accuracy. On CIFAR-100 [2], which has a larger number of categories, ResNet38 achieves a 1.67× speedup with a 1.81% accuracy drop. On M-CifarNet, our approach outperforms the FBS [3] method (3.93× speedup, 0.87% accuracy drop), reaching a 4.29× speedup with only a 0.33% accuracy drop. At a maintained accuracy of 90.50%, the conventional static method, Network Slimming, achieves a 1.429× speedup versus 2× for our approach. The threshold predictor adds at most 1% overhead in FLOPs and model size.
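To make the channel-skipping mechanism concrete, below is a minimal PyTorch-style sketch of the idea as described in the abstract, not the thesis' actual implementation. The names (ThresholdPredictor, gated_bn) and the predictor architecture are hypothetical, the hard threshold comparison is shown for clarity (end-to-end SGD training as described would need a differentiable relaxation), and the expected-pruning-rate schedule (variance term and epoch_ratio) is omitted.

```python
# Minimal sketch, assuming PyTorch. All names are hypothetical illustrations
# of the described idea, not the thesis' code.
import torch
import torch.nn as nn

class ThresholdPredictor(nn.Module):
    """Tiny CNN mapping an input image to one threshold per BN layer."""
    def __init__(self, num_bn_layers: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # -> (N, 8, 1, 1)
        )
        self.head = nn.Linear(8, num_bn_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.features(x).flatten(1)         # (N, 8)
        return torch.sigmoid(self.head(z))      # (N, L), thresholds in (0, 1)

def gated_bn(bn: nn.BatchNorm2d, x: torch.Tensor,
             thresh: torch.Tensor) -> torch.Tensor:
    """Apply BN, then zero channels whose |gamma| is below the threshold.

    bn.weight is the BN scaling factor gamma (one value per channel);
    thresh has shape (N, 1): one threshold per sample for this layer.
    A hard mask is used here for clarity; joint SGD training would require
    a differentiable relaxation of this comparison.
    """
    gamma = bn.weight.abs()                     # (C,)
    mask = (gamma.unsqueeze(0) > thresh).float()  # broadcast to (N, C)
    return bn(x) * mask[:, :, None, None]       # broadcast over H, W

# Usage: predict all per-layer thresholds once per image, index per layer.
predictor = ThresholdPredictor(num_bn_layers=2)
bn1, bn2 = nn.BatchNorm2d(16), nn.BatchNorm2d(16)
images = torch.randn(4, 3, 32, 32)
t = predictor(images)                           # (4, 2)
feat = torch.randn(4, 16, 32, 32)               # stand-in for conv features
out1 = gated_bn(bn1, feat, t[:, 0:1])           # threshold column, layer 1
out2 = gated_bn(bn2, out1, t[:, 1:2])           # threshold column, layer 2
```

Masked channels produce all-zero feature maps, so a suitable runtime could skip the corresponding computation in the following convolution; that skipping step is where the reported FLOP reductions would come from.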