Optimizing neural network structures: faster speed, smaller size, less tuning

Bibliographic Details
Main Author: Li, Zhe
Other Authors: Yang, Tianbao, 1971-
Format: Others
Language: English
Published: University of Iowa 2018
Subjects: Computational cost; Dropout; Ecologically-Inspired; Genetic approach; Model size; Neural network; Computer Sciences
Online Access:https://ir.uiowa.edu/etd/6460
https://ir.uiowa.edu/cgi/viewcontent.cgi?article=7960&context=etd
id ndltd-uiowa.edu-oai-ir.uiowa.edu-etd-7960
record_format oai_dc
collection NDLTD
language English
format Others
sources NDLTD
topic Computational cost
Dropout
Ecologically-Inspired
Genetic approach
Model size
Neural network
Computer Sciences
description Deep neural networks have achieved tremendous success in many domains (e.g., computer vision~\cite{Alexnet12,vggnet15,fastrcnn15}, speech recognition~\cite{hinton2012deep,dahl2012context}, natural language processing~\cite{dahl2012context,collobert2011natural}, and games~\cite{silver2017mastering,silver2016mastering}). However, many challenges remain in the deep learning community, such as how to speed up the training of large deep neural networks, how to compress large neural networks for mobile/embedded devices without performance loss, how to automatically design the optimal network structure for a given task, and how to further design optimal networks with improved performance at a given model size and reduced computation cost.

To speed up the training of large neural networks, we propose to use multinomial sampling for dropout, i.e., sampling features or neurons according to a multinomial distribution with different probabilities for different features/neurons. To determine the optimal dropout probabilities, we analyze shallow learning with multinomial dropout and establish a risk bound for stochastic optimization. By minimizing a sampling-dependent factor in the risk bound, we obtain a distribution-dependent dropout whose sampling probabilities depend on the second-order statistics of the data distribution. To tackle the evolving distributions of neurons in deep learning, we propose an efficient adaptive dropout (named evolutional dropout) that computes the sampling probabilities on the fly from a mini-batch of examples.

To compress large neural network structures, we propose a simple yet powerful method for reducing the size of deep Convolutional Neural Networks (CNNs) based on parameter binarization. The striking difference from most previous work on parameter binarization/quantization lies in the different treatment of $1\times 1$ convolutions and $k\times k$ convolutions ($k>1$): we binarize only the $k\times k$ convolutions into binary patterns. By doing so, we show that existing deep CNNs such as GoogLeNet and Inception-type networks can be compressed dramatically with only a marginal drop in performance. Furthermore, in light of the different functionalities of $1\times 1$ convolutions (data projection/transformation) and $k\times k$ convolutions (pattern extraction), we propose a new block structure, termed the pattern residual block, that adds the transformed feature maps generated by $1\times 1$ convolutions to the pattern feature maps generated by $k\times k$ convolutions; based on this block we design a small network with $\sim 1$ million parameters. Combined with our parameter binarization, we achieve better performance on ImageNet than similarly sized networks, including the recently released Google MobileNets.

To automatically design neural networks, we study how to design a genetic programming approach for optimizing the structure of a CNN for a given task under limited computational resources, without imposing strong restrictions on the search space. To reduce the computational cost, we propose two general strategies that we observe to be helpful: (i) aggressively selecting the strongest individuals for survival and reproduction, and killing weaker individuals at a very early age; (ii) increasing the mutation frequency to encourage diversity and faster evolution. Combined with additional optimization techniques, these strategies allow us to explore a large search space at affordable computational cost. Minimal code sketches of the evolutional dropout computation, the pattern residual block, and these search strategies are given below.
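To make the evolutional dropout idea concrete, here is a minimal NumPy sketch. As a simplification it assumes the per-feature keep probabilities are taken proportional to the root second moment of the current mini-batch (the second-order statistics mentioned above) and that kept activations are rescaled by the inverse probabilities; the function name and the Bernoulli-style masking are illustrative choices, not the exact procedure from the thesis.

import numpy as np

def evolutional_dropout(X, keep_fraction=0.5, rng=None):
    """Data-dependent dropout for one mini-batch (illustrative sketch).

    X: mini-batch of features/activations, shape (batch_size, num_features).
    keep_fraction: expected fraction of features kept for each example.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Second-order statistic of the current mini-batch, per feature.
    rms = np.sqrt((X ** 2).mean(axis=0)) + 1e-12

    # Keep probabilities proportional to the statistic, averaging keep_fraction.
    p = rms / rms.sum() * keep_fraction * X.shape[1]
    p = np.clip(p, 1e-3, 1.0)

    # Sample a mask per example and rescale kept values (inverted dropout),
    # so the output matches the input in expectation.
    mask = rng.random(X.shape) < p
    return X * mask / p

# Toy usage: 32 examples with 10 features each.
X = np.random.randn(32, 10)
print(evolutional_dropout(X).shape)  # (32, 10)

Features that are consistently large within the mini-batch are kept with higher probability, which is the data-dependent behavior the risk-bound analysis motivates.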
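The selective binarization and the pattern residual block can be sketched in a few lines of PyTorch. This is a hedged illustration: BinaryKxKConv and PatternResidualBlock are hypothetical names, the sign-times-mean-magnitude binarization is a common scheme assumed here for concreteness, and the exact block used in the thesis may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryKxKConv(nn.Module):
    """k x k convolution whose weights are binarized on the forward pass:
    sign(W) scaled by the mean absolute weight per output channel, so that
    response magnitudes are roughly preserved (an assumed binarization scheme)."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)

    def forward(self, x):
        w = self.conv.weight
        scale = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # one scale per filter
        return F.conv2d(x, torch.sign(w) * scale, padding=self.conv.padding)


class PatternResidualBlock(nn.Module):
    """Hypothetical sketch of a pattern residual block: feature maps from a
    full-precision 1x1 convolution (data projection) are added to feature maps
    from a binarized k x k convolution (pattern extraction)."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.project = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.pattern = BinaryKxKConv(in_ch, out_ch, k)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.project(x) + self.pattern(x)))


# Toy usage: one block applied to a small feature map.
block = PatternResidualBlock(16, 32)
out = block(torch.randn(2, 16, 8, 8))
print(out.shape)  # torch.Size([2, 32, 8, 8])

The key design point reflected here is that the 1x1 projection stays full precision while only the k x k pattern extractor is binarized.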
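The two search strategies, aggressive selection with early killing of weak individuals and a high mutation frequency, can be illustrated with a generic evolutionary loop. Everything below (function names, population size, survivor fraction, number of mutations per child) is an illustrative assumption rather than the thesis's actual configuration.

import random

def evolve(random_model, mutate, evaluate, population_size=20,
           survivor_fraction=0.25, mutations_per_child=3, generations=10):
    """Generic evolutionary loop illustrating the two strategies above."""
    population = [random_model() for _ in range(population_size)]
    for _ in range(generations):
        # (i) Aggressive selection: only the strongest fraction survives;
        #     everything else is removed immediately ("at a very early age").
        ranked = sorted(population, key=evaluate, reverse=True)
        survivors = ranked[:max(1, int(survivor_fraction * population_size))]

        # (ii) High mutation frequency: each child receives several mutations,
        #      which keeps the population diverse and evolution fast.
        children = []
        while len(survivors) + len(children) < population_size:
            child = random.choice(survivors)
            for _ in range(mutations_per_child):
                child = mutate(child)
            children.append(child)
        population = survivors + children
    return max(population, key=evaluate)

# Toy usage: a stand-in "architecture" is just a list of layer widths.
best = evolve(
    random_model=lambda: [random.randint(8, 64) for _ in range(4)],
    mutate=lambda m: [w + random.choice([-8, 8]) if random.random() < 0.5 else w
                      for w in m],
    evaluate=lambda m: -abs(sum(m) - 128),  # toy fitness: hit a width budget of 128
)
print(best)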
To further design optimal networks with improved performance at a given model size and reduced computation cost, we propose an ecologically inspired genetic approach to neural network structure search that includes two types of succession, primary and secondary, as well as accelerated extinction. Specifically, we first use primary succession to rapidly evolve a community of poorly initialized neural network structures into a more diverse community, followed by a secondary succession stage of fine-grained search based on the networks from the primary succession. Accelerated extinction is applied in both stages to reduce computational cost. In addition, we introduce gene duplication to further utilize the novel blocks of layers that appear in the discovered network structures.
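As a rough illustration of this ecologically inspired search, the skeleton below separates a primary succession stage (many mutations per generation to diversify a poorly initialized community) from a secondary succession stage (few mutations per generation for fine-grained search), applies accelerated extinction in both stages, and finishes with a gene-duplication step. All names and parameters are assumptions made for illustration only.

import copy
import random

def succession_search(random_model, mutate, evaluate, duplicate_block,
                      population_size=20, primary_gens=5, secondary_gens=10,
                      primary_mutations=5, secondary_mutations=1):
    """Two-stage search skeleton (all names/parameters are assumptions).

    Primary succession: many mutations per generation to rapidly diversify a
    poorly initialized community. Secondary succession: few mutations per
    generation for fine-grained search starting from the primary survivors.
    Accelerated extinction: only the top half survives each generation.
    Gene duplication: reuse a useful block of the best discovered network.
    """
    def mutate_n(model, n):
        for _ in range(n):
            model = mutate(model)
        return model

    def stage(community, generations, mutations):
        for _ in range(generations):
            survivors = sorted(community, key=evaluate, reverse=True)[:population_size // 2]
            children = [mutate_n(copy.deepcopy(random.choice(survivors)), mutations)
                        for _ in range(population_size - len(survivors))]
            community = survivors + children
        return community

    community = [random_model() for _ in range(population_size)]
    community = stage(community, primary_gens, primary_mutations)      # primary succession
    community = stage(community, secondary_gens, secondary_mutations)  # secondary succession
    best = max(community, key=evaluate)
    return duplicate_block(copy.deepcopy(best))                        # gene duplication

# Toy usage: "networks" are lists of layer widths; duplication repeats the widest layer.
result = succession_search(
    random_model=lambda: [random.randint(8, 64) for _ in range(3)],
    mutate=lambda m: [max(8, w + random.choice([-8, 0, 8])) for w in m],
    evaluate=lambda m: -abs(sum(m) - 160),
    duplicate_block=lambda m: m + [max(m)],
)
print(result)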
author2 Yang, Tianbao, 1971-
author Li, Zhe
title Optimizing neural network structures: faster speed, smaller size, less tuning
publisher University of Iowa
publishDate 2018
url https://ir.uiowa.edu/etd/6460
https://ir.uiowa.edu/cgi/viewcontent.cgi?article=7960&context=etd
work_keys_str_mv AT lizhe optimizingneuralnetworkstructuresfasterspeedsmallersizelesstuning
_version_ 1719290806092693504