Overlay Architectures for FPGA-Based Software Packet Processing
Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function custom-made ASIC chips. As communication protocols evolve rapidly, there is increasing interest in adapting features of the processing over time and, since...
Main Author: | Martin, Labrecque |
---|---|
Other Authors: | Gregory, Steffan |
Language: | en_ca |
Published: | 2011 |
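Subjects: | computer architecture; soft processors; FPGA; packet processing; network processor; multithreaded; transactional memory |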
Online Access: | http://hdl.handle.net/1807/27612 |
id |
ndltd-TORONTO-oai-tspace.library.utoronto.ca-1807-27612 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TORONTO-oai-tspace.library.utoronto.ca-1807-27612 2013-04-19T19:56:26Z Overlay Architectures for FPGA-Based Software Packet Processing Martin, Labrecque computer architecture soft processors FPGA packet processing network processor multithreaded transactional memory 0544 Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function custom-made ASIC chips. As communication protocols evolve rapidly, there is increasing interest in adapting features of the processing over time and, since software is the preferred way of expressing complex computation, we are interested in finding a platform to execute packet processing software with the best possible throughput. Because FPGAs are widely used in network equipment and they can implement processors, we are motivated to investigate executing software directly on the FPGAs. Off-the-shelf soft processors on FPGA fabric are currently geared towards performing embedded sequential tasks and, in contrast, network processing is most often inherently parallel between packet flows, if not between each individual packet. Our goal is to allow multiple threads of execution in an FPGA to reach a higher aggregate throughput than commercially available shared-memory soft multi-processors via improvements to the underlying soft processor architecture. We study a number of processor pipeline organizations to identify which ones can scale to a larger number of execution threads and find that tuning multithreaded pipelines can provide compact cores with high throughput. We then perform a design space exploration of multicore soft systems, compare single-threaded and multithreaded designs to identify scalability limits and develop processor architectures allowing threads to execute with as few architectural stalls as possible, in particular with instruction replay and static hazard detection mechanisms.
To further reduce the wait times, we allow threads to speculatively execute by leveraging transactional memory. Our multithreaded multiprocessor along with our compilation and simulation framework makes the FPGA easy to use for an average programmer who can write an application as a single thread of computation with coarse-grained synchronization around shared data structures. Comparing with multithreaded processors using lock-based synchronization, we measure up to 57% additional throughput with the use of transactional-memory-based synchronization. Given our applications, gigabit interfaces and 125 MHz system clock rate, our results suggest that soft processors can process packets in software at high throughput and low latency, while capitalizing on the FPGAs already available in network equipment. Gregory, Steffan 2011-11 2011-06-16T15:43:13Z NO_RESTRICTION 2011-06-16T15:43:13Z 2011-06-16T15:43:13Z Thesis http://hdl.handle.net/1807/27612 en_ca |
collection |
NDLTD |
language |
en_ca |
sources |
NDLTD |
topic |
computer architecture soft processors FPGA packet processing network processor multithreaded transactional memory 0544 |
spellingShingle |
computer architecture soft processors FPGA packet processing network processor multithreaded transactional memory 0544 Martin, Labrecque Overlay Architectures for FPGA-Based Software Packet Processing |
description |
Packet processing is the enabling technology of networked information systems
such as the Internet and is usually performed with fixed-function custom-made
ASIC chips. As communication protocols evolve rapidly, there is increasing
interest in adapting features of the processing over time and, since software
is the preferred way of expressing complex computation, we are interested in
finding a platform to execute packet processing software with the best
possible throughput. Because FPGAs are widely used in network equipment and
they can implement processors, we are motivated to investigate executing
software directly on the FPGAs. Off-the-shelf soft processors on FPGA fabric
are currently geared towards performing embedded sequential tasks and, in
contrast, network processing is most often inherently parallel between packet
flows, if not between each individual packet.
Our goal is to allow multiple threads of execution in an FPGA to reach a
higher aggregate throughput than commercially available shared-memory soft
multi-processors via improvements to the underlying soft processor
architecture. We study a number of processor pipeline organizations to
identify which ones can scale to a larger number of execution threads and find
that tuning multithreaded pipelines can provide compact cores with high
throughput. We then perform a design space exploration of multicore soft
systems, compare single-threaded and multithreaded designs to identify
scalability limits and develop processor architectures allowing threads to
execute with as few architectural stalls as possible, in particular with
instruction replay and static hazard detection mechanisms. To further reduce
the wait times, we allow threads to speculatively execute by leveraging
transactional memory. Our multithreaded multiprocessor along with our
compilation and simulation framework makes the FPGA easy to use for an average
programmer who can write an application as a single thread of computation with
coarse-grained synchronization around shared data structures. Comparing with
multithreaded processors using lock-based synchronization, we measure up to
57% additional throughput with the use of transactional-memory-based
synchronization. Given our applications, gigabit interfaces and 125 MHz system
clock rate, our results suggest that soft processors can process packets in
software at high throughput and low latency, while capitalizing on the FPGAs
already available in network equipment. |
author2 |
Gregory, Steffan |
author_facet |
Gregory, Steffan Martin, Labrecque |
author |
Martin, Labrecque |
author_sort |
Martin, Labrecque |
title |
Overlay Architectures for FPGA-Based Software Packet Processing |
title_short |
Overlay Architectures for FPGA-Based Software Packet Processing |
title_full |
Overlay Architectures for FPGA-Based Software Packet Processing |
title_fullStr |
Overlay Architectures for FPGA-Based Software Packet Processing |
title_full_unstemmed |
Overlay Architectures for FPGA-Based Software Packet Processing |
title_sort |
overlay architectures for fpga-based software packet processing |
publishDate |
2011 |
url |
http://hdl.handle.net/1807/27612 |
work_keys_str_mv |
AT martinlabrecque overlayarchitecturesforfpgabasedsoftwarepacketprocessing |
_version_ |
1716582067038322688 |