McDRAM v2: In-Dynamic Random Access Memory Systolic Array Accelerator to Address the Large Model Problem in Deep Neural Networks on the Edge

The energy efficiency of accelerating hundreds of MB-large deep neural networks (DNNs) in a mobile environment is less than that of a server-class big chip accelerator because of the limited power budget, silicon area, and smaller buffer size of static random access memory associated with mobile systems. To address this challenge and provide powerful computing capability for processing large DNN models in power/resource-limited mobile systems, we propose McDRAM v2, which is a novel in-dynamic random access memory (DRAM) systolic array accelerator architecture. McDRAM v2 makes the best use of large in-DRAM bandwidths for accelerating various DNN applications. It can handle large DNN models without off-chip memory accesses, in a fast and efficient manner, by exposing the large DRAM capacity and large in-DRAM bandwidth directly to an input systolic array of a processing element matrix. Additionally, it maximizes data reuse using a systolic multiply-accumulate (MAC) structure. The proposed architecture maximizes the utilization of large-scale MAC units by judiciously exploiting the DRAM's internal bus and buffer structure. An evaluation of large DNN models in the fields of image classification, natural language processing, and recommendation systems shows that it achieves 1.7 times tera operations per second (TOPS), 3.7 times TOPS/watt, and 8.6 times TOPS/mm² improvements over a state-of-the-art mobile graphics processing unit accelerator, and 4.1 times better energy efficiency over a state-of-the-art server-class accelerator. Moreover, it incurs a minimal overhead, i.e., a 9.7% increase in area, and uses less than 4.4 W of peak operating power.
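The data-reuse idea the abstract credits to the systolic MAC structure, streaming operands through a grid of multiply-accumulate units so each fetched value feeds many computations, can be illustrated with a short sketch. The following Python simulation of a generic output-stationary systolic matrix multiply is an illustrative assumption only; the function name and skewing scheme are hypothetical and it is not the McDRAM v2 microarchitecture described in the paper.

```python
# Minimal sketch of the data-reuse idea behind a systolic MAC array.
# Generic output-stationary simulation for illustration only; NOT the
# McDRAM v2 design from the paper.
import numpy as np

def systolic_matmul(A, B):
    """Compute C = A @ B by skewing operands through an M x N PE grid.

    Conceptually, each element of A travels right across a row of PEs and
    each element of B travels down a column, so every operand is reused
    across an entire row or column of MACs instead of being re-fetched
    from memory for each multiply-accumulate.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))              # one accumulator per processing element
    for t in range(M + N + K - 2):    # global clock ticks
        for i in range(M):
            for j in range(N):
                k = t - i - j         # skewed wavefront: operand index at PE (i, j)
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]   # one MAC per PE per tick
    return C

A = np.random.rand(4, 6)
B = np.random.rand(6, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

In hardware, this skewed wavefront means each operand is fetched once and then reused by many processing elements, which is the kind of reuse the abstract describes for keeping large-scale MAC units busy on in-DRAM bandwidth.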

Bibliographic Details
Main Authors: Seunghwan Cho, Haerang Choi, Eunhyeok Park, Hyunsung Shin, Sungjoo Yoo
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access, Vol. 8, pp. 135223-135243
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3011265
Subjects: Accelerator; convolutional neural network; deep neural network; dynamic random access memory; edge inference; multi-layer perceptron
Online Access: https://ieeexplore.ieee.org/document/9146167/
Author Details
Seunghwan Cho (ORCID: 0000-0001-5373-329X), Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
Haerang Choi, Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
Eunhyeok Park (ORCID: 0000-0002-7331-9819), Inter-University Semiconductor Research Center (ISRC), Seoul National University, Seoul, South Korea
Hyunsung Shin (ORCID: 0000-0001-8104-3368), Memory Division, Samsung Electronics, Hwaseong, South Korea
Sungjoo Yoo (ORCID: 0000-0002-5853-0675), Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea