Overlay Architectures for FPGA-Based Software Packet Processing

Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function custom-made ASIC chips. As communication protocols evolve rapidly, there is increasing interest in adapting features of the processing over time and, since...

Bibliographic Details
Main Author:	Martin, Labrecque
Other Authors:	Gregory, Steffan
Language:	en_ca
Published:	2011
Subjects:	computer architecture; soft processors; FPGA; packet processing; network processor; multithreaded; transactional memory; 0544
Online Access:	http://hdl.handle.net/1807/27612

id	ndltd-TORONTO-oai-tspace.library.utoronto.ca-1807-27612
record_format	oai_dc
spelling	ndltd-TORONTO-oai-tspace.library.utoronto.ca-1807-27612; 2013-04-19T19:56:26Z; 2011-11; 2011-06-16T15:43:13Z; NO_RESTRICTION; Thesis
collection	NDLTD
language	en_ca
sources	NDLTD
topic	computer architecture; soft processors; FPGA; packet processing; network processor; multithreaded; transactional memory; 0544
description	Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function, custom-made ASIC chips. As communication protocols evolve rapidly, there is increasing interest in adapting features of the processing over time and, since software is the preferred way of expressing complex computation, we are interested in finding a platform to execute packet processing software with the best possible throughput. Because FPGAs are widely used in network equipment and can implement processors, we are motivated to investigate executing software directly on the FPGAs. Off-the-shelf soft processors on FPGA fabric are currently geared towards performing embedded sequential tasks; in contrast, network processing is most often inherently parallel between packet flows, if not between individual packets. Our goal is to allow multiple threads of execution in an FPGA to reach a higher aggregate throughput than commercially available shared-memory soft multi-processors via improvements to the underlying soft processor architecture. We study a number of processor pipeline organizations to identify which ones can scale to a larger number of execution threads and find that tuning multithreaded pipelines can provide compact cores with high throughput. We then perform a design space exploration of multicore soft systems, compare single-threaded and multithreaded designs to identify scalability limits, and develop processor architectures that allow threads to execute with as few architectural stalls as possible, in particular through instruction replay and static hazard detection mechanisms. To further reduce wait times, we allow threads to execute speculatively by leveraging transactional memory.
Our multithreaded multiprocessor, along with our compilation and simulation framework, makes the FPGA easy to use for an average programmer, who can write an application as a single thread of computation with coarse-grained synchronization around shared data structures. Compared with multithreaded processors using lock-based synchronization, we measure up to 57% additional throughput with the use of transactional-memory-based synchronization. Given our applications, gigabit interfaces and a 125 MHz system clock rate, our results suggest that soft processors can process packets in software at high throughput and low latency, while capitalizing on the FPGAs already available in network equipment.
author2	Gregory, Steffan
author_facet	Gregory, Steffan Martin, Labrecque
author	Martin, Labrecque
author_sort	Martin, Labrecque
title	Overlay Architectures for FPGA-Based Software Packet Processing
publishDate	2011
url	http://hdl.handle.net/1807/27612
