High Efficiency Accelerator for Deep Convolutional Neural Network by Using High-Level-Synthesis Design Flow


Bibliographic Details
Main Authors: Gu, Wen-Sheng, 辜玟勝
Other Authors: Chen, Kuan-Hung
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/tqdb4m
Description
Summary: Master's === Feng Chia University === Department of Electronic Engineering === 107 === This thesis applies deep learning to object detection, covering cars, trucks, locomotives, and pedestrians. The study has two parts: model training and hardware implementation. For training, compression techniques reduce the number of parameters, and additional pedestrian samples raise the average precision (AP) for the pedestrian class; the resulting network is called the Agile Model. The dataset contains 19,061 training images and 4,950 test images, 24,011 in total. Compared with Tiny-Yolo [17], the model size is reduced by 97.4%, and the execution speed is 15 FPS. For the hardware design, High-Level Synthesis is used to build a DCNN IP core comprising a convolution layer, a batch normalization layer, Leaky ReLU, and a pooling layer. So that the data fits in block RAM, the original 32-bit floating-point values are first converted to 8-bit fixed-point by truncation, and the block RAM access pattern is improved to maximize block RAM access. A PS/PL interface sends the feature maps and weight values to the IP core for acceleration and transfers the results back to DRAM, and a Python interface controls the data flow. The circuit runs the Agile Model at 100 MHz with a GOPS/Power of 30.1.
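
The conversion from 32-bit floating point to 8-bit fixed point by truncation, mentioned in the summary, can be illustrated with a minimal Python sketch. The summary does not state the fixed-point format, so the number of fractional bits (`FRAC_BITS = 5`), the saturation on overflow, and the function names `to_fixed8` and `from_fixed8` are all assumptions made for illustration, not the thesis's actual implementation. Truncation is modeled here as dropping low-order bits of a two's-complement value, which rounds toward negative infinity.

```python
import math

WORD_BITS = 8   # total width of the fixed-point word (from the summary)
FRAC_BITS = 5   # ASSUMPTION: fractional bits; the summary does not give the format


def to_fixed8(x: float, frac_bits: int = FRAC_BITS) -> int:
    """Quantize a float to a signed 8-bit fixed-point integer by truncation.

    Hypothetical helper: scales by 2**frac_bits, drops the remaining
    fractional part (floor), then saturates to the signed 8-bit range.
    Saturation is an assumption; the summary only mentions truncation.
    """
    scaled = x * (1 << frac_bits)
    q = math.floor(scaled)                 # truncate the low-order bits
    lo = -(1 << (WORD_BITS - 1))           # -128
    hi = (1 << (WORD_BITS - 1)) - 1        # 127
    return max(lo, min(hi, q))


def from_fixed8(q: int, frac_bits: int = FRAC_BITS) -> float:
    """Convert a signed 8-bit fixed-point integer back to a float."""
    return q / (1 << frac_bits)


if __name__ == "__main__":
    for w in (0.5, 0.74, -0.01, 100.0):
        q = to_fixed8(w)
        print(f"{w:>8} -> {q:>4} -> {from_fixed8(q)}")
```

With 5 fractional bits the representable step is 1/32, so a weight such as 0.74 is stored as 23 and read back as 0.71875. Each value occupies one byte instead of four, which is what allows more feature-map and weight data to fit in block RAM.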