Joint Optimization of Quantization and Structured Sparsity for Compressed Deep Neural Networks

abstract: Deep neural networks (DNNs) have shown tremendous success in various cognitive tasks, such as image classification and speech recognition. However, their usage on resource-constrained edge devices has been limited by high computation and large memory requirements. To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity, and quantization. While most of these works apply compression techniques in isolation, very few studies have applied quantization and structured sparsity together to a DNN model. This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, it obtains an optimal setting of 2-bit weights and 2-bit activations coupled with 4X structured compression by jointly exploring quantization and structured-compression settings. The optimal DNN model achieves a 50X weight-memory reduction compared to a floating-point uncompressed DNN. This saving is significant, since applying only structured sparsity constraints achieves 2X memory savings and applying only quantization constraints achieves 16X. The algorithm has been validated on both high- and low-capacity DNNs, and on wide-sparse and deep-sparse DNN models. Experiments demonstrated that deep-sparse DNNs outperform shallow-dense DNNs, with the level of memory savings varying with DNN precision and sparsity. This work further proposes a Pareto-optimal approach to systematically extract optimal DNN models from a large set of sparse and dense DNN models. The resulting 11 optimal designs were further evaluated by considering overall DNN memory, which includes both activation memory and weight memory. It was found that the memory footprint of the optimal designs changes only slightly for low-sparsity DNNs; however, activation memory cannot be ignored for high-sparsity DNNs.
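As a back-of-the-envelope illustration of how the quoted savings combine (a sketch under simplifying assumptions, not code from the thesis: the function name and the overhead-free model are hypothetical, and the ideal combined figure of 64X exceeds the reported 50X, presumably due to costs such as sparsity indexing):

```python
def weight_memory_bytes(num_weights, bits_per_weight, kept_fraction=1.0):
    """Weight memory after structured compression and quantization.

    kept_fraction: fraction of weights retained, e.g. 4X structured
    compression keeps 1/4 of the weights (all overheads ignored).
    """
    return num_weights * kept_fraction * bits_per_weight / 8

baseline = weight_memory_bytes(1_000_000, 32)          # dense, 32-bit float
quant_only = weight_memory_bytes(1_000_000, 2)         # 2-bit weights
sparse_only = weight_memory_bytes(1_000_000, 32, 0.5)  # 2X structured sparsity
combined = weight_memory_bytes(1_000_000, 2, 0.25)     # 2-bit + 4X compression

print(baseline / quant_only)    # 16.0, the quantization-only saving
print(baseline / sparse_only)   # 2.0, the sparsity-only saving
print(baseline / combined)      # 64.0 ideal; the thesis reports ~50X
```

The point of the sketch is that the two techniques multiply rather than add, which is why the joint exploration pays off.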


Bibliographic Details
Other Authors: Srivastava, Gaurav (Author)
Format: Dissertation
Language: English
Published: 2018
Online Access:http://hdl.handle.net/2286/R.I.50451
id ndltd-asu.edu-item-50451
record_format oai_dc
advisor Seo, Jae-Sun
committee_member Chakrabarti, Chaitali
committee_member Berisha, Visar
publisher Arizona State University
extent 64 pages
degree Masters Thesis, Computer Engineering, 2018
rights http://rightsstatements.org/vocab/InC/1.0/
collection NDLTD
language English
format Dissertation
sources NDLTD
topic Artificial intelligence
Computer engineering
Computer science
Deep learning
Deep Neural Networks
DNN quantization
DNN structured sparsity
DNN weight memory
Pareto-optimal
description abstract: Deep neural networks (DNNs) have shown tremendous success in various cognitive tasks, such as image classification and speech recognition. However, their usage on resource-constrained edge devices has been limited by high computation and large memory requirements. To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity, and quantization. While most of these works apply compression techniques in isolation, very few studies have applied quantization and structured sparsity together to a DNN model. This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, it obtains an optimal setting of 2-bit weights and 2-bit activations coupled with 4X structured compression by jointly exploring quantization and structured-compression settings. The optimal DNN model achieves a 50X weight-memory reduction compared to a floating-point uncompressed DNN. This saving is significant, since applying only structured sparsity constraints achieves 2X memory savings and applying only quantization constraints achieves 16X. The algorithm has been validated on both high- and low-capacity DNNs, and on wide-sparse and deep-sparse DNN models. Experiments demonstrated that deep-sparse DNNs outperform shallow-dense DNNs, with the level of memory savings varying with DNN precision and sparsity. This work further proposes a Pareto-optimal approach to systematically extract optimal DNN models from a large set of sparse and dense DNN models. The resulting 11 optimal designs were further evaluated by considering overall DNN memory, which includes both activation memory and weight memory. It was found that the memory footprint of the optimal designs changes only slightly for low-sparsity DNNs; however, activation memory cannot be ignored for high-sparsity DNNs.
Dissertation/Thesis. Masters Thesis, Computer Engineering, 2018.
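To make the "2-bit weight" setting concrete, the sketch below shows generic uniform symmetric quantization to a given bit width. This is a standard textbook scheme, not the thesis's training-time algorithm; the function name and the example weights are hypothetical.

```python
def quantize_symmetric(weights, bits=2):
    """Snap each weight to the nearest level of a uniform symmetric
    grid with 2**(bits-1) - 1 positive levels (requires bits >= 2).
    Generic illustration, not the thesis's constrained-training method."""
    levels = 2 ** (bits - 1) - 1                 # bits=2 -> levels in {-1, 0, 1}
    scale = max(abs(w) for w in weights) / levels
    # Map to the integer grid, round, then rescale to the original range.
    return [round(w / scale) * scale for w in weights]

w = [0.9, -0.4, 0.1, -0.9]
print(quantize_symmetric(w))  # every value snaps to one of {-0.9, 0.0, 0.9}
```

With only three representable values per weight, each weight needs 2 bits of storage instead of 32, which is the source of the 16X quantization-only saving quoted in the abstract.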
title Joint Optimization of Quantization and Structured Sparsity for Compressed Deep Neural Networks
publishDate 2018
url http://hdl.handle.net/2286/R.I.50451
_version_ 1718756991630835712