DMS: Dynamic Model Scaling for Quality-Aware Deep Learning Inference in Mobile and Embedded Devices

Deep learning has recently brought revolutionary capabilities to many mobile and embedded systems that interact with the physical world through continuous video streams. Although there have been significant efforts to reduce the computational overhead of deep learning inference in such systems, previous approaches have focused on delivering "best-effort" performance, resulting in unpredictable behavior under variable environments. In this paper, we propose a runtime control method, called DMS (Dynamic Model Scaling), that enables dynamic resource-accuracy trade-offs to support the varied QoS requirements of deep learning applications. In DMS, the resource demands of deep learning inference are controlled by adaptive pruning of computation-intensive convolution filters. DMS avoids the irregularity of pruned models by reorganizing filters according to their importance, so that a varying number of filters can be applied efficiently. Since DMS's pruning method incurs no runtime overhead and preserves the full capacity of the original models, DMS can tailor models at runtime for concurrent deep learning applications, each with its own resource-accuracy trade-off. We demonstrate the viability of DMS with a prototype implementation. The evaluation results show that, when properly coordinated with system-level resource managers, DMS sustains robust and efficient inference performance under unpredictable workloads.
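
As a rough illustration of the mechanism the abstract describes, the sketch below reorders convolution filters by descending L1-norm importance and then scales a layer's effective width at runtime by slicing a prefix of the sorted filters. This is a minimal sketch assuming PyTorch; the names `reorder_filters` and `ScalableConv2d` are hypothetical, not the authors' API.

```python
# Minimal sketch (PyTorch assumed) of importance-ordered filter pruning.
# `reorder_filters` and `ScalableConv2d` are illustrative names, not the
# authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def reorder_filters(conv: nn.Conv2d) -> nn.Conv2d:
    """Sort output filters by descending L1 norm so that any prefix of the
    weight tensor holds the most important filters."""
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
    order = torch.argsort(importance, descending=True)
    conv.weight.data = conv.weight.data[order]
    if conv.bias is not None:
        conv.bias.data = conv.bias.data[order]
    return conv

class ScalableConv2d(nn.Module):
    """A convolution whose effective width is set at runtime by slicing the
    pre-sorted filter dimension: no weight copies, no irregular sparsity."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = reorder_filters(conv)
        self.width = 1.0  # fraction of filters to apply (set by a runtime controller)

    def forward(self, x):
        k = max(1, int(self.conv.out_channels * self.width))
        weight = self.conv.weight[:k]                        # top-k filters only
        bias = self.conv.bias[:k] if self.conv.bias is not None else None
        return F.conv2d(x, weight, bias,
                        stride=self.conv.stride, padding=self.conv.padding)
```

Note that this shows a single layer only; in a full network, the next layer's input channels would also have to be sliced to match k.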

Bibliographic Details
Main Authors: Woochul Kang (ORCID: 0000-0002-4757-8999), Daeyeon Kim, Junyoung Park (all with the Department of Embedded Systems Engineering, Incheon National University, Incheon, South Korea)
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access, vol. 7, pp. 168048-168059
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2954546
Subjects: Deep learning, edge devices, embedded systems, energy efficiency, feedback control, filter pruning
Online Access: https://ieeexplore.ieee.org/document/8907822/
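
The abstract also notes that DMS is coordinated with system-level resource managers via feedback control. One plausible shape for that loop, sketched under the assumption of a simple proportional controller with hypothetical `step_inference` and `scalable_layers` hooks (not the authors' code), is:

```python
# Minimal sketch of a feedback-control loop that tracks a latency target by
# scaling model width. The proportional gain and the `step_inference` /
# `scalable_layers` hooks are assumptions for illustration only.
import time

def control_loop(step_inference, scalable_layers, target_ms: float, kp: float = 0.05):
    width = 1.0
    while True:
        t0 = time.perf_counter()
        step_inference()                                   # process one frame
        latency_ms = (time.perf_counter() - t0) * 1e3
        error = (latency_ms - target_ms) / target_ms       # > 0 means too slow
        width = min(1.0, max(0.1, width * (1.0 - kp * error)))
        for layer in scalable_layers:                      # narrower when slow, wider when fast
            layer.width = width
```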