Efficient Execution via Dynamic Network Slimming

Master's thesis === National Chiao Tung University === Institute of Electronics === Academic year 108 (2019)

Bibliographic Details
Title (Chinese): 動態網路精簡之高效執行研究
Main Author: Tseng, Yu-Che (曾于哲)
Other Authors: Chang, Tian-Sheuan (張添烜)
Format: Others (thesis, 51 pages)
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/m38494

description Convolutional neural networks (CNNs) achieve state-of-the-art results in computer vision, but their heavy computation and large model size make them hard to execute on resource-limited mobile and wearable devices. Model compression and inference acceleration have therefore become salient research topics. Conventional methods remove unimportant parts of the network at different granularities, such as weight pruning, filter pruning, and channel pruning. These pruning methods are irreversible, however, and permanently damage the model's structure. A dynamic pruning method, which adapts to the difficulty of each classification task, is therefore comparatively advantageous.

Prior research on neural networks shows that different classes are built hierarchically from specific lower-level features, and that for any particular category many low-level features are useless. Following this observation, we skip unnecessary features to accelerate inference and reduce the model size. In principle, a dynamic acceleration method that executes a different substructure for each input image can outperform static pruning.

In this thesis, we measure the importance of each channel by the absolute value of the scaling factor, gamma, in its batch normalization layer. A tiny CNN takes the input image and predicts a list of thresholds, one per batch normalization layer; in each batch normalization layer, channels whose gamma is smaller than that layer's threshold are skipped. During training, we compute the expected pruning rate from the variance of the predicted thresholds, which increases over the epochs, and we raise it further by dividing by the average variance. A parameter, epoch_ratio, forces the expected pruning rate toward the target pruning rate. The threshold prediction network and the target model are trained jointly with stochastic gradient descent, which makes it possible to find, at inference time, the best substructure that meets the target pruning rate.

Simulation results show that this approach accelerates ResNet [1] on CIFAR-10 [2] by 2× to 5.49× while losing only 0.94% accuracy. On CIFAR-100 [2], which has a larger number of categories, ResNet38 achieves a 1.67× speedup with a 1.81% accuracy drop. On M-CifarNet, our approach outperforms the FBS [3] method (3.93× speedup, 0.87% accuracy drop), reaching a 4.29× speedup with only a 0.33% accuracy drop. At a maintained accuracy of 90.50%, the conventional static method, Network Slimming, achieves a 1.429× speedup versus 2× for our approach. The threshold predictor adds at most 1% overhead in FLOPs and model size.
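To make the channel-skipping mechanism concrete, below is a minimal PyTorch-style sketch of the idea as described in the abstract, not the thesis' actual implementation. The names (ThresholdPredictor, gated_bn) and the predictor architecture are hypothetical, the hard threshold comparison is shown for clarity (end-to-end SGD training as described would need a differentiable relaxation), and the expected-pruning-rate schedule (variance term and epoch_ratio) is omitted.

```python
# Minimal sketch, assuming PyTorch. All names are hypothetical illustrations
# of the described idea, not the thesis' code.
import torch
import torch.nn as nn

class ThresholdPredictor(nn.Module):
    """Tiny CNN mapping an input image to one threshold per BN layer."""
    def __init__(self, num_bn_layers: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # -> (N, 8, 1, 1)
        )
        self.head = nn.Linear(8, num_bn_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.features(x).flatten(1)         # (N, 8)
        return torch.sigmoid(self.head(z))      # (N, L), thresholds in (0, 1)

def gated_bn(bn: nn.BatchNorm2d, x: torch.Tensor,
             thresh: torch.Tensor) -> torch.Tensor:
    """Apply BN, then zero channels whose |gamma| is below the threshold.

    bn.weight is the BN scaling factor gamma (one value per channel);
    thresh has shape (N, 1): one threshold per sample for this layer.
    A hard mask is used here for clarity; joint SGD training would require
    a differentiable relaxation of this comparison.
    """
    gamma = bn.weight.abs()                     # (C,)
    mask = (gamma.unsqueeze(0) > thresh).float()  # broadcast to (N, C)
    return bn(x) * mask[:, :, None, None]       # broadcast over H, W

# Usage: predict all per-layer thresholds once per image, index per layer.
predictor = ThresholdPredictor(num_bn_layers=2)
bn1, bn2 = nn.BatchNorm2d(16), nn.BatchNorm2d(16)
images = torch.randn(4, 3, 32, 32)
t = predictor(images)                           # (4, 2)
feat = torch.randn(4, 16, 32, 32)               # stand-in for conv features
out1 = gated_bn(bn1, feat, t[:, 0:1])           # threshold column, layer 1
out2 = gated_bn(bn2, out1, t[:, 1:2])           # threshold column, layer 2
```

Masked channels produce all-zero feature maps, so a suitable runtime could skip the corresponding computation in the following convolution; that skipping step is where the reported FLOP reductions would come from.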