On the Programmability and Performance of OpenCL Designs for FPGA

Field programmable gate arrays (FPGAs) have been emerging as a promising bedrock to provide opportunities for several types of accelerators that span various domains such as finance, web search, and data center networking, among others. Research interest in facilitating the development of acce...


Bibliographic Details
Main Author: Verma, Anshuman
Other Authors: Electrical and Computer Engineering
Format: Others
Published: Virginia Tech 2019
Subjects:
HDL
HLS
GEM
Online Access: http://hdl.handle.net/10919/92699
id ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-92699
record_format oai_dc
spelling ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-92699 2020-09-29T05:40:49Z On the Programmability and Performance of OpenCL Designs for FPGA Verma, Anshuman Electrical and Computer Engineering Feng, Wu-Chun Zhou, Huiyang Athanas, Peter M. FPGA OpenCL HDL HLS AOCL Verilog Accelerator GEM Stencil Field programmable gate arrays (FPGAs) have been emerging as a promising bedrock to provide opportunities for several types of accelerators that span various domains such as finance, web search, and data center networking, among others. Research interest in facilitating the development of accelerators on FPGAs is increasing significantly, in particular because of their effectiveness with a variety of applications, flexibility, and high performance per watt. However, several key challenges remain that hinder their large-scale deployment. Overcoming these challenges would enable them to match the pervasiveness of graphics processing units (GPUs), their principal competitors in this arena. One of the primary reasons for the slow adoption by programmers has been the programming model, which uses a low-level hardware description language (HDL). Using HDLs requires a detailed understanding of logic design and significant effort to implement and verify the behavioral models, with that effort growing as design complexity increases. Recent advancements in high-level language synthesis (HLS) tools have addressed this challenge to a considerable extent by allowing programmers to write their applications in a high-level language named OpenCL. These applications are then compiled and synthesized to create a bitstream that configures the FPGA. This thesis characterizes the efficacy of HLS compiler optimizations that can be employed to improve the performance of these applications.
The synthesized hardware from OpenCL kernels is fundamentally different from traditional hardware such as CPUs and GPUs, which exploit instruction level parallelism (ILP), thread level parallelism (TLP), or data level parallelism (DLP) for performance gains. FPGAs typically use deep pipelining (i.e., ILP) for performance. A stall in this pipeline may severely undermine the performance of applications. Thus, it is imperative to identify and remove any such bottlenecks. To this end, this thesis presents and discusses a software-centric framework to debug and profile the synthesized designs generated using HLS tools. This thesis proposes basic code patterns, including a timestamp and a scalable framework, which can be plugged easily into OpenCL kernels to collect and process run-time information dynamically. This scalable framework has a small overhead for area utilization and frequency but provides fine-grained information about the bottlenecks and latencies in the design. Additionally, although HLS tools have improved programmability, this may come at the cost of performance or area utilization. This thesis addresses this design trade-off via a comparative study of a hand-coded design in HDL and an architecturally similar, tool-generated design using an OpenCL compiler in the application area of 3D-stencil (i.e., structured grid) computation. Experiments in this thesis show that an OpenCL approach can achieve 95% of the peak attainable performance of a microkernel for multiple problem sizes. In comparison to the OpenCL approach, an HDL approach results in approximately 50% less memory usage and only 2% better performance on average. MS 2019-08-04T06:00:33Z 2019-08-04T06:00:33Z 2018-02-09 Thesis vt_gsexam:12265 http://hdl.handle.net/10919/92699 In Copyright http://rightsstatements.org/vocab/InC/1.0/ ETD application/pdf Virginia Tech
collection NDLTD
format Others
sources NDLTD
topic FPGA
OpenCL
HDL
HLS
AOCL
Verilog
Accelerator
GEM
Stencil
spellingShingle FPGA
OpenCL
HDL
HLS
AOCL
Verilog
Accelerator
GEM
Stencil
Verma, Anshuman
On the Programmability and Performance of OpenCL Designs for FPGA
description Field programmable gate arrays (FPGAs) have been emerging as a promising bedrock to provide opportunities for several types of accelerators that span various domains such as finance, web search, and data center networking, among others. Research interest in facilitating the development of accelerators on FPGAs is increasing significantly, in particular because of their effectiveness with a variety of applications, flexibility, and high performance per watt. However, several key challenges remain that hinder their large-scale deployment. Overcoming these challenges would enable them to match the pervasiveness of graphics processing units (GPUs), their principal competitors in this arena. One of the primary reasons for the slow adoption by programmers has been the programming model, which uses a low-level hardware description language (HDL). Using HDLs requires a detailed understanding of logic design and significant effort to implement and verify the behavioral models, with that effort growing as design complexity increases. Recent advancements in high-level language synthesis (HLS) tools have addressed this challenge to a considerable extent by allowing programmers to write their applications in a high-level language named OpenCL. These applications are then compiled and synthesized to create a bitstream that configures the FPGA. This thesis characterizes the efficacy of HLS compiler optimizations that can be employed to improve the performance of these applications. The synthesized hardware from OpenCL kernels is fundamentally different from traditional hardware such as CPUs and GPUs, which exploit instruction level parallelism (ILP), thread level parallelism (TLP), or data level parallelism (DLP) for performance gains. FPGAs typically use deep pipelining (i.e., ILP) for performance. A stall in this pipeline may severely undermine the performance of applications. Thus, it is imperative to identify and remove any such bottlenecks.
To this end, this thesis presents and discusses a software-centric framework to debug and profile the synthesized designs generated using HLS tools. This thesis proposes basic code patterns, including a timestamp and a scalable framework, which can be plugged easily into OpenCL kernels to collect and process run-time information dynamically. This scalable framework has a small overhead for area utilization and frequency but provides fine-grained information about the bottlenecks and latencies in the design. Additionally, although HLS tools have improved programmability, this may come at the cost of performance or area utilization. This thesis addresses this design trade-off via a comparative study of a hand-coded design in HDL and an architecturally similar, tool-generated design using an OpenCL compiler in the application area of 3D-stencil (i.e., structured grid) computation. Experiments in this thesis show that an OpenCL approach can achieve 95% of the peak attainable performance of a microkernel for multiple problem sizes. In comparison to the OpenCL approach, an HDL approach results in approximately 50% less memory usage and only 2% better performance on average. === MS
author2 Electrical and Computer Engineering
author_facet Electrical and Computer Engineering
Verma, Anshuman
author Verma, Anshuman
author_sort Verma, Anshuman
title On the Programmability and Performance of OpenCL Designs for FPGA
title_short On the Programmability and Performance of OpenCL Designs for FPGA
title_full On the Programmability and Performance of OpenCL Designs for FPGA
title_fullStr On the Programmability and Performance of OpenCL Designs for FPGA
title_full_unstemmed On the Programmability and Performance of OpenCL Designs for FPGA
title_sort on the programmability and performance of opencl designs for fpga
publisher Virginia Tech
publishDate 2019
url http://hdl.handle.net/10919/92699
work_keys_str_mv AT vermaanshuman ontheprogrammabilityandperformanceofopencldesignsforfpga
_version_ 1719345120106512384