Optimizing neural network structures: faster speed, smaller size, less tuning

Bibliographic Details
Main Author: Li, Zhe
Other Authors: Yang, Tianbao, 1971-
Format: Others
Language: English
Published: University of Iowa 2018
Subjects: Computational cost; Dropout; Ecologically-Inspired; Genetic approach; Model size; Neural network; Computer Sciences
Online Access:https://ir.uiowa.edu/etd/6460
https://ir.uiowa.edu/cgi/viewcontent.cgi?article=7960&context=etd
id ndltd-uiowa.edu-oai-ir.uiowa.edu-etd-7960
record_format oai_dc
collection NDLTD
language English
format Others
sources NDLTD
topic Computational cost
Dropout
Ecologically-Inspired
Genetic approach
Model size
Neural network
Computer Sciences
description Deep neural networks have achieved tremendous success in many domains (e.g., computer vision~\cite{Alexnet12,vggnet15,fastrcnn15}, speech recognition~\cite{hinton2012deep,dahl2012context}, natural language processing~\cite{dahl2012context,collobert2011natural}, and games~\cite{silver2017mastering,silver2016mastering}). However, many challenges remain in the deep learning community, such as how to speed up the training of large deep neural networks, how to compress large neural networks for mobile/embedded devices without performance loss, how to automatically design the optimal network structure for a given task, and how to further design optimal networks with improved performance at a given model size and reduced computation cost.

To speed up the training of large neural networks, we propose to use multinomial sampling for dropout, i.e., sampling features or neurons according to a multinomial distribution with different probabilities for different features/neurons. To determine the optimal dropout probabilities, we analyze shallow learning with multinomial dropout and establish a risk bound for stochastic optimization. By minimizing a sampling-dependent factor in the risk bound, we obtain a distribution-dependent dropout whose sampling probabilities depend on the second-order statistics of the data distribution. To tackle the evolving distributions of neurons in deep learning, we propose an efficient adaptive dropout (named evolutional dropout) that computes the sampling probabilities on the fly from a mini-batch of examples.

To compress large neural network structures, we propose a simple yet powerful method for reducing the size of deep Convolutional Neural Networks (CNNs) based on parameter binarization. The striking difference from most previous work on parameter binarization/quantization lies in the different treatment of $1\times 1$ convolutions and $k\times k$ convolutions ($k>1$): we binarize only the $k\times k$ convolutions into binary patterns. By doing so, we show that existing deep CNNs such as GoogLeNet and Inception-type networks can be compressed dramatically with only a marginal drop in performance. Furthermore, in light of the different functionalities of $1\times 1$ convolutions (data projection/transformation) and $k\times k$ convolutions (pattern extraction), we propose a new block structure, termed the pattern residual block, that adds the transformed feature maps generated by $1\times 1$ convolutions to the pattern feature maps generated by $k\times k$ convolutions; based on this block we design a small network with $\sim 1$ million parameters. Combined with our parameter binarization, we achieve better performance on ImageNet than similarly sized networks, including the recently released Google MobileNets.

To automatically design neural networks, we study how to design a genetic programming approach for optimizing the structure of a CNN for a given task under limited computational resources, without imposing strong restrictions on the search space. To reduce the computational cost, we propose two general strategies that we observe to be helpful: (i) aggressively selecting the strongest individuals for survival and reproduction, and killing weaker individuals at a very early age; (ii) increasing the mutation frequency to encourage diversity and faster evolution. Combined with additional optimization techniques, these strategies allow us to explore a large search space at affordable computational cost. Minimal code sketches of the evolutional dropout computation, the pattern residual block, and these search strategies are given below.
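To make the evolutional dropout idea concrete, here is a minimal NumPy sketch. As a simplification it assumes the per-feature keep probabilities are taken proportional to the root second moment of the current mini-batch (the second-order statistics mentioned above) and that kept activations are rescaled by the inverse probabilities; the function name and the Bernoulli-style masking are illustrative choices, not the exact procedure from the thesis.

import numpy as np

def evolutional_dropout(X, keep_fraction=0.5, rng=None):
    """Data-dependent dropout for one mini-batch (illustrative sketch).

    X: mini-batch of features/activations, shape (batch_size, num_features).
    keep_fraction: expected fraction of features kept for each example.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Second-order statistic of the current mini-batch, per feature.
    rms = np.sqrt((X ** 2).mean(axis=0)) + 1e-12

    # Keep probabilities proportional to the statistic, averaging keep_fraction.
    p = rms / rms.sum() * keep_fraction * X.shape[1]
    p = np.clip(p, 1e-3, 1.0)

    # Sample a mask per example and rescale kept values (inverted dropout),
    # so the output matches the input in expectation.
    mask = rng.random(X.shape) < p
    return X * mask / p

# Toy usage: 32 examples with 10 features each.
X = np.random.randn(32, 10)
print(evolutional_dropout(X).shape)  # (32, 10)

Features that are consistently large within the mini-batch are kept with higher probability, which is the data-dependent behavior the risk-bound analysis motivates.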
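The selective binarization and the pattern residual block can be sketched in a few lines of PyTorch. This is a hedged illustration: BinaryKxKConv and PatternResidualBlock are hypothetical names, the sign-times-mean-magnitude binarization is a common scheme assumed here for concreteness, and the exact block used in the thesis may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryKxKConv(nn.Module):
    """k x k convolution whose weights are binarized on the forward pass:
    sign(W) scaled by the mean absolute weight per output channel, so that
    response magnitudes are roughly preserved (an assumed binarization scheme)."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)

    def forward(self, x):
        w = self.conv.weight
        scale = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # one scale per filter
        return F.conv2d(x, torch.sign(w) * scale, padding=self.conv.padding)


class PatternResidualBlock(nn.Module):
    """Hypothetical sketch of a pattern residual block: feature maps from a
    full-precision 1x1 convolution (data projection) are added to feature maps
    from a binarized k x k convolution (pattern extraction)."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.project = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.pattern = BinaryKxKConv(in_ch, out_ch, k)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.project(x) + self.pattern(x)))


# Toy usage: one block applied to a small feature map.
block = PatternResidualBlock(16, 32)
out = block(torch.randn(2, 16, 8, 8))
print(out.shape)  # torch.Size([2, 32, 8, 8])

The key design point reflected here is that the 1x1 projection stays full precision while only the k x k pattern extractor is binarized.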
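The two search strategies, aggressive selection with early killing of weak individuals and a high mutation frequency, can be illustrated with a generic evolutionary loop. Everything below (function names, population size, survivor fraction, number of mutations per child) is an illustrative assumption rather than the thesis's actual configuration.

import random

def evolve(random_model, mutate, evaluate, population_size=20,
           survivor_fraction=0.25, mutations_per_child=3, generations=10):
    """Generic evolutionary loop illustrating the two strategies above."""
    population = [random_model() for _ in range(population_size)]
    for _ in range(generations):
        # (i) Aggressive selection: only the strongest fraction survives;
        #     everything else is removed immediately ("at a very early age").
        ranked = sorted(population, key=evaluate, reverse=True)
        survivors = ranked[:max(1, int(survivor_fraction * population_size))]

        # (ii) High mutation frequency: each child receives several mutations,
        #      which keeps the population diverse and evolution fast.
        children = []
        while len(survivors) + len(children) < population_size:
            child = random.choice(survivors)
            for _ in range(mutations_per_child):
                child = mutate(child)
            children.append(child)
        population = survivors + children
    return max(population, key=evaluate)

# Toy usage: a stand-in "architecture" is just a list of layer widths.
best = evolve(
    random_model=lambda: [random.randint(8, 64) for _ in range(4)],
    mutate=lambda m: [w + random.choice([-8, 8]) if random.random() < 0.5 else w
                      for w in m],
    evaluate=lambda m: -abs(sum(m) - 128),  # toy fitness: hit a width budget of 128
)
print(best)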
To further design optimal networks with improved performance at a given model size and reduced computation cost, we propose an ecologically inspired genetic approach to neural network structure search that includes two types of succession, primary and secondary, as well as accelerated extinction. Specifically, we first use primary succession to rapidly evolve a community of poorly initialized neural network structures into a more diverse community, followed by a secondary succession stage of fine-grained search based on the networks from the primary succession. Accelerated extinction is applied in both stages to reduce computational cost. In addition, we introduce gene duplication to further utilize the novel blocks of layers that appear in the discovered network structures.
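As a rough illustration of this ecologically inspired search, the skeleton below separates a primary succession stage (many mutations per generation to diversify a poorly initialized community) from a secondary succession stage (few mutations per generation for fine-grained search), applies accelerated extinction in both stages, and finishes with a gene-duplication step. All names and parameters are assumptions made for illustration only.

import copy
import random

def succession_search(random_model, mutate, evaluate, duplicate_block,
                      population_size=20, primary_gens=5, secondary_gens=10,
                      primary_mutations=5, secondary_mutations=1):
    """Two-stage search skeleton (all names/parameters are assumptions).

    Primary succession: many mutations per generation to rapidly diversify a
    poorly initialized community. Secondary succession: few mutations per
    generation for fine-grained search starting from the primary survivors.
    Accelerated extinction: only the top half survives each generation.
    Gene duplication: reuse a useful block of the best discovered network.
    """
    def mutate_n(model, n):
        for _ in range(n):
            model = mutate(model)
        return model

    def stage(community, generations, mutations):
        for _ in range(generations):
            survivors = sorted(community, key=evaluate, reverse=True)[:population_size // 2]
            children = [mutate_n(copy.deepcopy(random.choice(survivors)), mutations)
                        for _ in range(population_size - len(survivors))]
            community = survivors + children
        return community

    community = [random_model() for _ in range(population_size)]
    community = stage(community, primary_gens, primary_mutations)      # primary succession
    community = stage(community, secondary_gens, secondary_mutations)  # secondary succession
    best = max(community, key=evaluate)
    return duplicate_block(copy.deepcopy(best))                        # gene duplication

# Toy usage: "networks" are lists of layer widths; duplication repeats the widest layer.
result = succession_search(
    random_model=lambda: [random.randint(8, 64) for _ in range(3)],
    mutate=lambda m: [max(8, w + random.choice([-8, 0, 8])) for w in m],
    evaluate=lambda m: -abs(sum(m) - 160),
    duplicate_block=lambda m: m + [max(m)],
)
print(result)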
author2 Yang, Tianbao, 1971-
author Li, Zhe
title Optimizing neural network structures: faster speed, smaller size, less tuning
publisher University of Iowa
publishDate 2018
url https://ir.uiowa.edu/etd/6460
https://ir.uiowa.edu/cgi/viewcontent.cgi?article=7960&context=etd
work_keys_str_mv AT lizhe optimizingneuralnetworkstructuresfasterspeedsmallersizelesstuning
_version_ 1719290806092693504