StreamWorks: An Energy-efficient Embedded Co-processor for Stream Computing

abstract: Stream processing has emerged as an important model of computation, especially in the context of the multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movement. This work presents StreamWorks, a multi-core embedded architecture for energy-efficient stream computing. The basic processing element in the StreamWorks architecture is the StreamEngine (SE), which is responsible for iteratively executing a stream kernel. The SE introduces an instruction locking mechanism that exploits the iterative nature of the kernels and enables fine-grain instruction reuse: each instruction in an SE is locked to a Reservation Station (RS) and revitalizes itself after execution, thus never retiring from the RS. The entire kernel is hosted in RS Banks (RSBs) close to the functional units for energy-efficient instruction delivery. The dataflow semantics of stream kernels are captured by a context-aware dataflow execution mode that efficiently exploits the instruction-level parallelism (ILP) and data-level parallelism (DLP) within stream kernels. Multiple SEs are grouped to form a StreamCluster (SC) and communicate via a local interconnect. A novel software FIFO virtualization technique with split-join functionality is proposed for efficient and scalable stream communication across SEs; this communication mechanism exploits the task-level parallelism (TLP) of the stream application. The performance and scalability of the communication mechanism are evaluated against existing data movement schemes for scratchpad-based multi-core architectures. Further, overlay schemes and architectural support are proposed that allow hosting any number of kernels on the StreamWorks architecture. The proposed overlay schemes for code management support kernel (context) switching for the most common use cases and can be adapted to any multi-core architecture that uses software-managed local memories. The performance and energy efficiency of the StreamWorks architecture are evaluated for stream kernel and application benchmarks by implementing the architecture in a 45 nm TSMC process and comparing it with a low-power RISC core and a contemporary accelerator.
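For orientation, the sketch below is a minimal, single-threaded C model of the split-join stream communication pattern the abstract describes: a source kernel feeds a stream that is split round-robin across two data-parallel worker kernels and then joined back into one ordered output stream, with each channel modeled as a fixed-depth ring-buffer FIFO. All names, the round-robin policy, and the batch-driven scheduling loop are assumptions made for illustration; the dissertation's actual software FIFO virtualization, scratchpad mapping, and StreamEngine microarchitecture are not reproduced here.

/*
 * Illustrative sketch only: a software-level model of the split-join
 * stream pattern, NOT the StreamWorks implementation.
 */
#include <stdio.h>

#define FIFO_DEPTH 8

/* A fixed-capacity ring-buffer FIFO, standing in for one stream channel. */
typedef struct {
    int buf[FIFO_DEPTH];
    int head, tail, count;
} fifo_t;

static int fifo_push(fifo_t *f, int v) {
    if (f->count == FIFO_DEPTH) return 0;          /* channel full  */
    f->buf[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return 1;
}

static int fifo_pop(fifo_t *f, int *v) {
    if (f->count == 0) return 0;                   /* channel empty */
    *v = f->buf[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* Source kernel: produces one batch of samples. */
static void source_kernel(fifo_t *out, int n) {
    for (int i = 0; i < n; i++)
        fifo_push(out, i);                          /* batch sized to fit the FIFO */
}

/* Split: distributes one input stream round-robin onto two worker streams. */
static void split_kernel(fifo_t *in, fifo_t *out0, fifo_t *out1) {
    int v, toggle = 0;
    while (fifo_pop(in, &v)) {
        fifo_push(toggle ? out1 : out0, v);
        toggle ^= 1;
    }
}

/* Worker kernel: a trivially data-parallel computation (here, squaring). */
static void worker_kernel(fifo_t *in, fifo_t *out) {
    int v;
    while (fifo_pop(in, &v))
        fifo_push(out, v * v);
}

/* Join: merges the two worker streams back in the original order. */
static void join_kernel(fifo_t *in0, fifo_t *in1, fifo_t *out) {
    int v, toggle = 0;
    while (fifo_pop(toggle ? in1 : in0, &v)) {
        fifo_push(out, v);
        toggle ^= 1;
    }
}

int main(void) {
    fifo_t src = {0}, a = {0}, b = {0}, a2 = {0}, b2 = {0}, sink = {0};

    /* Run the pipeline in FIFO-depth-sized batches so no channel overflows. */
    for (int batch = 0; batch < 2; batch++) {
        source_kernel(&src, FIFO_DEPTH);
        split_kernel(&src, &a, &b);
        worker_kernel(&a, &a2);
        worker_kernel(&b, &b2);
        join_kernel(&a2, &b2, &sink);

        int v;
        while (fifo_pop(&sink, &v))
            printf("%d ", v);
    }
    printf("\n");
    return 0;
}

Running the model prints the squared stream in its original order (0 1 4 9 16 25 36 49 per batch), showing how the join restores the ordering that the split distributed across the two workers.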


Bibliographic Details
Author: Panda, Amrit Kumar
Format: Doctoral Thesis
Language: English
Published: 2014
Subjects: Computer engineering; Electrical engineering; Computer science; co-processor; dataflow; embedded; energy-efficient; microarchitecture; streaming
Online Access:http://hdl.handle.net/2286/R.I.25936
Thesis Advisors: Chatha, Karam S.; Wu, Carole-Jean
Committee Members: Chakrabarti, Chaitali; Shrivastava, Aviral
Publisher: Arizona State University
Degree: Doctoral Dissertation, Computer Science, 2014
Physical Description: 138 pages
Rights: All Rights Reserved (http://rightsstatements.org/vocab/InC/1.0/)
Collection: NDLTD
Record ID: ndltd-asu.edu-item-25936