Allocation of SISAL program graphs to processors using BLAS

There are a number of well known techniques for extracting parallelism from a given program. They range from hardware implementations, building restructuring compilers or reorganizing of programs so as to specify all the available parallelism. The success rate of any of the known techniques is rathe...

Full description

Bibliographic Details
Main Author: Raisinghani, Manoj H.
Other Authors: Lee, Ben
Language:en_US
Published: 2012
Subjects:
Online Access:http://hdl.handle.net/1957/35764
id ndltd-ORGSU-oai-ir.library.oregonstate.edu-1957-35764
record_format oai_dc
spelling ndltd-ORGSU-oai-ir.library.oregonstate.edu-1957-357642012-12-14T03:14:22ZAllocation of SISAL program graphs to processors using BLASRaisinghani, Manoj H.Computer architectureProgramming languages (Electronic computers)Parallel processing (Electronic computers)There are a number of well known techniques for extracting parallelism from a given program. They range from hardware implementations, building restructuring compilers or reorganizing of programs so as to specify all the available parallelism. The success rate of any of the known techniques is rather poor over all types of programs. This has pushed the research community to explore new languages and design different architectures to exploit program parallelism. The principles of dataflow architectures have addressed the problem of exploiting parallelism in systems by executing dataflow graphs. These graphs or programs represent data dependencies among instructions and execution of the graph proceeds in a data-driven manner. That is, an instruction is executed as soon as all its operands are available, without waiting for any program counter to sequence its execution, as is the case in conventional von Neumann architectures. In this thesis, data flow graphs are generated during the intermediate compilation of a functional language called SISAL (Streams and Iterations in a Single Assignment Language). The Intermediate Form (IFl) is a graphical language consisting of multiple acyclic function graphs that represent a given program. Each graph consists of a sequence of nodes and edges. The nodes specify the operation and the edges indicate the dependencies between the nodes. The graphs are further connected to each other by means of implicit dependencies. The Automator package developed in this project, preprocesses these multiple IF1 graphs and translates them into a single connected graph. It converts all implicit dependencies into actual ones. Additionally, complex language constructs like For All, loops and if-then-else are treated in special ways together with their nested levels by the Automator. There is virtually no limit to the number of nested levels that can be translated by this package. The Automator's prime contribution is in translating real programs written in SISAL into a specified format required by an allocation algorithm called the Balanced Layered Allocation Scheme (BLAS). BLAS partitions a connected graph into independent tasks and assigns them to processors in a multicomputer system. The problem of program allocation lies in maximizing parallelism while minimizing interprocessor communication costs. Hence, allocation is based on the best choice of communication to execution ratio for each task. BLAS utilizes heuristic rules to find a balance between computation and communication costs in the target system. Here the target architecture is a simulated nCUBE 3E computer, having a hypercube topology. Simulations show that, BLAS is effective in reducing the overall execution time of a program by considering the communication costs on the execution times. The results will help in understanding the effects in packing nodes (grain-packing), routing issues in the network and in general, the allocation problem to any processor in a network. In addition, tasks have also been assigned to adjacent processors only, instead of any processor on the hypercube network. The adjacent allocation to processors helps to determine trade-offs required between achieved speed-ups and the time it takes to completely allocate large graphs on compilation.Graduation date: 1994Lee, Ben2012-12-13T20:24:53Z2012-12-13T20:24:53Z1994-04-071994-04-07Thesis/Dissertationhttp://hdl.handle.net/1957/35764en_US
collection NDLTD
language en_US
sources NDLTD
topic Computer architecture
Programming languages (Electronic computers)
Parallel processing (Electronic computers)
spellingShingle Computer architecture
Programming languages (Electronic computers)
Parallel processing (Electronic computers)
Raisinghani, Manoj H.
Allocation of SISAL program graphs to processors using BLAS
description There are a number of well known techniques for extracting parallelism from a given program. They range from hardware implementations, building restructuring compilers or reorganizing of programs so as to specify all the available parallelism. The success rate of any of the known techniques is rather poor over all types of programs. This has pushed the research community to explore new languages and design different architectures to exploit program parallelism. The principles of dataflow architectures have addressed the problem of exploiting parallelism in systems by executing dataflow graphs. These graphs or programs represent data dependencies among instructions and execution of the graph proceeds in a data-driven manner. That is, an instruction is executed as soon as all its operands are available, without waiting for any program counter to sequence its execution, as is the case in conventional von Neumann architectures. In this thesis, data flow graphs are generated during the intermediate compilation of a functional language called SISAL (Streams and Iterations in a Single Assignment Language). The Intermediate Form (IFl) is a graphical language consisting of multiple acyclic function graphs that represent a given program. Each graph consists of a sequence of nodes and edges. The nodes specify the operation and the edges indicate the dependencies between the nodes. The graphs are further connected to each other by means of implicit dependencies. The Automator package developed in this project, preprocesses these multiple IF1 graphs and translates them into a single connected graph. It converts all implicit dependencies into actual ones. Additionally, complex language constructs like For All, loops and if-then-else are treated in special ways together with their nested levels by the Automator. There is virtually no limit to the number of nested levels that can be translated by this package. The Automator's prime contribution is in translating real programs written in SISAL into a specified format required by an allocation algorithm called the Balanced Layered Allocation Scheme (BLAS). BLAS partitions a connected graph into independent tasks and assigns them to processors in a multicomputer system. The problem of program allocation lies in maximizing parallelism while minimizing interprocessor communication costs. Hence, allocation is based on the best choice of communication to execution ratio for each task. BLAS utilizes heuristic rules to find a balance between computation and communication costs in the target system. Here the target architecture is a simulated nCUBE 3E computer, having a hypercube topology. Simulations show that, BLAS is effective in reducing the overall execution time of a program by considering the communication costs on the execution times. The results will help in understanding the effects in packing nodes (grain-packing), routing issues in the network and in general, the allocation problem to any processor in a network. In addition, tasks have also been assigned to adjacent processors only, instead of any processor on the hypercube network. The adjacent allocation to processors helps to determine trade-offs required between achieved speed-ups and the time it takes to completely allocate large graphs on compilation. === Graduation date: 1994
author2 Lee, Ben
author_facet Lee, Ben
Raisinghani, Manoj H.
author Raisinghani, Manoj H.
author_sort Raisinghani, Manoj H.
title Allocation of SISAL program graphs to processors using BLAS
title_short Allocation of SISAL program graphs to processors using BLAS
title_full Allocation of SISAL program graphs to processors using BLAS
title_fullStr Allocation of SISAL program graphs to processors using BLAS
title_full_unstemmed Allocation of SISAL program graphs to processors using BLAS
title_sort allocation of sisal program graphs to processors using blas
publishDate 2012
url http://hdl.handle.net/1957/35764
work_keys_str_mv AT raisinghanimanojh allocationofsisalprogramgraphstoprocessorsusingblas
_version_ 1716393819300167680