Overlay Architectures for FPGA-Based Software Packet Processing

Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function custom-made ASIC chips. As communication protocols evolve rapidly, there is increasing interest in adapting features of the processing over time and, since...


Bibliographic Details
Main Author: Martin, Labrecque
Other Authors: Gregory, Steffan
Language: en_ca
Published: 2011
Subjects:
Online Access: http://hdl.handle.net/1807/27612
id ndltd-TORONTO-oai-tspace.library.utoronto.ca-1807-27612
record_format oai_dc
spelling ndltd-TORONTO-oai-tspace.library.utoronto.ca-1807-27612
2013-04-19T19:56:26Z
Overlay Architectures for FPGA-Based Software Packet Processing
Martin, Labrecque
computer architecture
soft processors
FPGA
packet processing
network processor
multithreaded
transactional memory
0544
Gregory, Steffan
2011-11
2011-06-16T15:43:13Z
NO_RESTRICTION
Thesis
http://hdl.handle.net/1807/27612
en_ca
collection NDLTD
language en_ca
sources NDLTD
topic computer architecture
soft processors
FPGA
packet processing
network processor
multithreaded
transactional memory
0544
description Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function custom-made ASIC chips. As communication protocols evolve rapidly, there is increasing interest in adapting features of the processing over time and, since software is the preferred way of expressing complex computation, we are interested in finding a platform to execute packet processing software with the best possible throughput. Because FPGAs are widely used in network equipment and they can implement processors, we are motivated to investigate executing software directly on the FPGAs. Off-the-shelf soft processors on FPGA fabric are currently geared towards performing embedded sequential tasks, whereas network processing is most often inherently parallel between packet flows, if not between individual packets. Our goal is to allow multiple threads of execution in an FPGA to reach a higher aggregate throughput than commercially available shared-memory soft multi-processors via improvements to the underlying soft processor architecture. We study a number of processor pipeline organizations to identify which ones can scale to a larger number of execution threads and find that tuning multithreaded pipelines can provide compact cores with high throughput. We then perform a design space exploration of multicore soft systems, compare single-threaded and multithreaded designs to identify scalability limits, and develop processor architectures that allow threads to execute with as few architectural stalls as possible, in particular through instruction replay and static hazard detection mechanisms. To further reduce the wait times, we allow threads to execute speculatively by leveraging transactional memory.
Our multithreaded multiprocessor, along with our compilation and simulation framework, makes the FPGA easy to use for an average programmer, who can write an application as a single thread of computation with coarse-grained synchronization around shared data structures. Compared with multithreaded processors using lock-based synchronization, we measure up to 57% additional throughput with the use of transactional-memory-based synchronization. Given our applications, gigabit interfaces and 125 MHz system clock rate, our results suggest that soft processors can process packets in software at high throughput and low latency, while capitalizing on the FPGAs already available in network equipment.
author2 Gregory, Steffan
author Martin, Labrecque
title Overlay Architectures for FPGA-Based Software Packet Processing
publishDate 2011
url http://hdl.handle.net/1807/27612