Compilation of Stream Programs onto Embedded Multicore Architectures

abstract: In recent years, we have observed the prevalence of stream applications in many embedded domains. Stream programs distinguish themselves from traditional sequential programming languages through well defined independent actors, explicit data communication, and stable code/data access patte...


Bibliographic Details
Other Authors: Che, Weijia (Author)
Format: Doctoral Thesis
Language: English
Published: 2012
Subjects: Computer science; compilation; embedded; multicore; parallel; scratchpad; stream
Online Access: http://hdl.handle.net/2286/R.I.15224
id ndltd-asu.edu-item-15224
record_format oai_dc
spelling ndltd-asu.edu-item-15224 2018-06-22T03:03:21Z Compilation of Stream Programs onto Embedded Multicore Architectures
abstract: In recent years, we have observed the prevalence of stream applications in many embedded domains. Stream programs distinguish themselves from programs written in traditional sequential programming languages through well-defined independent actors, explicit data communication, and stable code/data access patterns. In order to achieve high performance and low power, scratch pad memory (SPM) has been introduced in today's embedded multicore processors. Current design frameworks for developing stream applications on SPM-enhanced embedded architectures typically do not include a compiler that can perform automatic partitioning, mapping and scheduling under limited on-chip SPM capacities and memory access delays. Consequently, many designs are implemented manually, which leads to lengthy tasks and inferior designs. In this work, optimization techniques that automatically compile stream programs onto embedded multi-core architectures are proposed. As an initial case study, we implemented an automatic target recognition (ATR) algorithm on the IBM Cell Broadband Engine (BE). Then integer linear programming (ILP) and heuristic approaches were proposed to schedule stream programs on a single-core embedded processor that has an SPM with code overlay. Later, ILP and heuristic approaches for Compiling Stream programs on SPM enhanced Multicore Processors (CSMP) were studied. The proposed CSMP ILP and heuristic approaches do not optimize for cycles in stream applications. Further, the number of software pipeline stages in the implementation is dependent on the actor-to-processing-engine (PE) mapping and is uncontrollable. We next presented a Retiming technique for Throughput optimization on Embedded Multi-core processors (RTEM). The RTEM approach inherently handles cycles and can accept an upper bound on the number of software pipeline stages to be generated. We further enhanced RTEM by incorporating unrolling (URSTEM), which preserves all the beneficial properties of the RTEM heuristic and also scales with the number of PEs.
Dissertation/Thesis Che, Weijia (Author) Chatha, Karam Singh (Advisor) Vrudhula, Sarma (Committee member) Chakrabarti, Chaitali (Committee member) Shrivastava, Aviral (Committee member) Arizona State University (Publisher) Computer science compilation embedded multicore parallel scratchpad stream eng 250 pages Ph.D. Computer Science 2012 Doctoral Dissertation http://hdl.handle.net/2286/R.I.15224 http://rightsstatements.org/vocab/InC/1.0/ All Rights Reserved 2012
collection NDLTD
language English
format Doctoral Thesis
sources NDLTD
topic Computer science
compilation
embedded
multicore
parallel
scratchpad
stream
author2 Che, Weijia (Author)
author_facet Che, Weijia (Author)
title Compilation of Stream Programs onto Embedded Multicore Architectures
publishDate 2012
url http://hdl.handle.net/2286/R.I.15224