A Cloud Server Oriented FPGA Accelerator for LSTM Recurrent Neural Network

Long Short-Term Memory (LSTM) is the most widely used recurrent neural network architecture. It plays an important role in a number of research areas, such as language modeling, machine translation, and image captioning. However, owing to its recurrent nature, general-purpose processors such as CPUs and GPGPUs achieve only limited parallelism on LSTM while consuming considerable power. FPGA accelerators can outperform general-purpose processors through flexibility, energy efficiency, and finer-grained optimization of recurrence-based algorithms. In this paper, we present the design and implementation of a cloud-oriented FPGA accelerator for LSTM. Unlike most previous work, which targets embedded systems, our accelerator transfers data sequences to and from the host server through PCIe and performs multiple time-series predictions in parallel. We optimize both the on-chip computation and the communication between the host server and the FPGA board. We evaluate the overall performance as well as the computation and PCIe communication costs. The results show that our implementation outperforms CPU-based and other hardware-based implementations.
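The abstract's claim that general-purpose processors achieve limited parallelism follows from the LSTM recurrence itself: each time step consumes the hidden and cell states produced by the previous one. Below is a minimal NumPy sketch of a standard LSTM cell step, not the paper's implementation, illustrating that sequential dependency; the parallelism the accelerator exploits comes from batching multiple independent sequences, not from parallelizing within one.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. The gates depend on h_prev and c_prev,
    so steps along a single sequence cannot run in parallel."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # joint pre-activation, length 4H
    i = 1.0 / (1.0 + np.exp(-z[:H]))      # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))   # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H])) # output gate
    g = np.tanh(z[3*H:])                  # candidate cell state
    c = f * c_prev + i * g                # new cell state
    h = o * np.tanh(c)                    # new hidden state
    return h, c

# Toy dimensions (illustrative, not from the paper).
rng = np.random.default_rng(0)
X, H = 8, 4                               # input size, hidden size
W = rng.standard_normal((4 * H, X)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)

h = np.zeros(H)
c = np.zeros(H)
for t in range(10):                       # one 10-step sequence, strictly serial
    h, c = lstm_step(rng.standard_normal(X), h, c, W, U, b)
```

Because the loop carries `h` and `c` from step to step, the only way to keep many compute units busy is to process many sequences at once, which is why the accelerator batches multiple time-series predictions per PCIe transfer.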

Bibliographic Details
Main Authors: Jun Liu, Jiasheng Wang, Yu Zhou, Fang Liu
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
Subjects: Long short-term memory; FPGA; cloud computing; recurrent neural network; OpenCL
Online Access: https://ieeexplore.ieee.org/document/8819996/
Record ID: doaj-3a97a08e64cb4b16b373ac8b6ec9a39e
Collection: DOAJ
DOI: 10.1109/ACCESS.2019.2938234
ISSN: 2169-3536
Citation: IEEE Access, vol. 7, pp. 122408-122418, 2019
IEEE Document: 8819996
ORCID (Jun Liu): https://orcid.org/0000-0003-4007-6109
Affiliation: Center for Data Science, Beijing University of Posts and Telecommunications, Beijing, China (all four authors)