Joint Optimization of Quantization and Structured Sparsity for Compressed Deep Neural Networks

abstract: Deep neural networks (DNNs) have shown tremendous success in various cognitive tasks, such as image classification and speech recognition. However, their usage on resource-constrained edge devices has been limited by high computation and large memory requirements. To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity, and quantization. While most of these works apply compression techniques in isolation, very few studies have applied quantization and structured sparsity together to a DNN model. This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, it obtains an optimal setting of 2-bit weights and 2-bit activations coupled with 4X structured compression by jointly exploring quantization and structured-compression settings. The optimal DNN model achieves a 50X weight-memory reduction compared to a floating-point uncompressed DNN. This saving is significant, since applying only structured sparsity constraints achieves 2X memory savings and applying only quantization constraints achieves 16X. The algorithm has been validated on both high- and low-capacity DNNs, and on wide-sparse and deep-sparse DNN models. Experiments demonstrated that deep-sparse DNNs outperform shallow-dense DNNs, with the level of memory savings varying with DNN precision and sparsity. This work further proposes a Pareto-optimal approach to systematically extract optimal DNN models from a large set of sparse and dense DNN models. The resulting 11 optimal designs were further evaluated by considering overall DNN memory, which includes both activation memory and weight memory. It was found that the memory footprint of the optimal designs changes only slightly for low-sparsity DNNs; however, activation memory cannot be ignored for high-sparsity DNNs.
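As a back-of-the-envelope illustration of how the quoted savings combine (a sketch under simplifying assumptions, not code from the thesis: the function name and the overhead-free model are hypothetical, and the ideal combined figure of 64X exceeds the reported 50X, presumably due to costs such as sparsity indexing):

```python
def weight_memory_bytes(num_weights, bits_per_weight, kept_fraction=1.0):
    """Weight memory after structured compression and quantization.

    kept_fraction: fraction of weights retained, e.g. 4X structured
    compression keeps 1/4 of the weights (all overheads ignored).
    """
    return num_weights * kept_fraction * bits_per_weight / 8

baseline = weight_memory_bytes(1_000_000, 32)          # dense, 32-bit float
quant_only = weight_memory_bytes(1_000_000, 2)         # 2-bit weights
sparse_only = weight_memory_bytes(1_000_000, 32, 0.5)  # 2X structured sparsity
combined = weight_memory_bytes(1_000_000, 2, 0.25)     # 2-bit + 4X compression

print(baseline / quant_only)    # 16.0, the quantization-only saving
print(baseline / sparse_only)   # 2.0, the sparsity-only saving
print(baseline / combined)      # 64.0 ideal; the thesis reports ~50X
```

The point of the sketch is that the two techniques multiply rather than add, which is why the joint exploration pays off.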


Bibliographic Details
Other Authors: Srivastava, Gaurav (Author)
Format: Dissertation
Language: English
Published: 2018
Online Access:http://hdl.handle.net/2286/R.I.50451
id ndltd-asu.edu-item-50451
record_format oai_dc
advisor Seo, Jae-Sun
committee_member Chakrabarti, Chaitali
committee_member Berisha, Visar
publisher Arizona State University
extent 64 pages
degree Masters Thesis, Computer Engineering, 2018
rights http://rightsstatements.org/vocab/InC/1.0/
collection NDLTD
language English
format Dissertation
sources NDLTD
topic Artificial intelligence
Computer engineering
Computer science
Deep learning
Deep Neural Networks
DNN quantization
DNN structured sparsity
DNN weight memory
Pareto-optimal
description abstract: Deep neural networks (DNNs) have shown tremendous success in various cognitive tasks, such as image classification and speech recognition. However, their usage on resource-constrained edge devices has been limited by high computation and large memory requirements. To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity, and quantization. While most of these works apply compression techniques in isolation, very few studies have applied quantization and structured sparsity together to a DNN model. This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, it obtains an optimal setting of 2-bit weights and 2-bit activations coupled with 4X structured compression by jointly exploring quantization and structured-compression settings. The optimal DNN model achieves a 50X weight-memory reduction compared to a floating-point uncompressed DNN. This saving is significant, since applying only structured sparsity constraints achieves 2X memory savings and applying only quantization constraints achieves 16X. The algorithm has been validated on both high- and low-capacity DNNs, and on wide-sparse and deep-sparse DNN models. Experiments demonstrated that deep-sparse DNNs outperform shallow-dense DNNs, with the level of memory savings varying with DNN precision and sparsity. This work further proposes a Pareto-optimal approach to systematically extract optimal DNN models from a large set of sparse and dense DNN models. The resulting 11 optimal designs were further evaluated by considering overall DNN memory, which includes both activation memory and weight memory. It was found that the memory footprint of the optimal designs changes only slightly for low-sparsity DNNs; however, activation memory cannot be ignored for high-sparsity DNNs.
Dissertation/Thesis. Masters Thesis, Computer Engineering, 2018.
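To make the "2-bit weight" setting concrete, the sketch below shows generic uniform symmetric quantization to a given bit width. This is a standard textbook scheme, not the thesis's training-time algorithm; the function name and the example weights are hypothetical.

```python
def quantize_symmetric(weights, bits=2):
    """Snap each weight to the nearest level of a uniform symmetric
    grid with 2**(bits-1) - 1 positive levels (requires bits >= 2).
    Generic illustration, not the thesis's constrained-training method."""
    levels = 2 ** (bits - 1) - 1                 # bits=2 -> levels in {-1, 0, 1}
    scale = max(abs(w) for w in weights) / levels
    # Map to the integer grid, round, then rescale to the original range.
    return [round(w / scale) * scale for w in weights]

w = [0.9, -0.4, 0.1, -0.9]
print(quantize_symmetric(w))  # every value snaps to one of {-0.9, 0.0, 0.9}
```

With only three representable values per weight, each weight needs 2 bits of storage instead of 32, which is the source of the 16X quantization-only saving quoted in the abstract.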
title Joint Optimization of Quantization and Structured Sparsity for Compressed Deep Neural Networks
publishDate 2018
url http://hdl.handle.net/2286/R.I.50451
_version_ 1718756991630835712