Portable and productive high-performance computing

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 115-120). === Performance portability of computer programs, and programmer productivity in...

Bibliographic Details
Main Author: Palamadai Natarajan, Ekanathan
Other Authors: Alan Edelman.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2017
Subjects:
Online Access: http://hdl.handle.net/1721.1/108988
id ndltd-MIT-oai-dspace.mit.edu-1721.1-108988
record_format oai_dc
spelling ndltd-MIT-oai-dspace.mit.edu-1721.1-1089882019-05-02T15:51:05Z Portable and productive high-performance computing Palamadai Natarajan, Ekanathan Alan Edelman. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. Cataloged from PDF version of thesis. Includes bibliographical references (pages 115-120). Performance portability of computer programs, and programmer productivity in writing them are key expectations in software engineering. These expectations lead to the following questions: Can programmers write code once, and execute it at optimal speed on any machine configuration? Can programmers write parallel code to simple models that hide the complex details of parallel programming? This thesis addresses these questions for certain "classes" of computer programs. It describes "autotuning" techniques that achieve performance portability for serial divide-and-conquer programs, and an abstraction that improves programmer productivity in writing parallel code for a class of programs called "Star". We present a "pruned-exhaustive" autotuner called Ztune that optimizes the performance of serial divide-and-conquer programs for a given machine configuration. Whereas the traditional way of autotuning divide-and-conquer programs involves simply coarsening the base case of recursion optimally, Ztune searches for optimal divide-and-conquer trees. Although Ztune, in principle, exhaustively enumerates the search domain, it uses pruning properties that greatly reduce the size of the search domain without significantly sacrificing the quality of the autotuned code. 
We illustrate how to autotune divide-and-conquer stencil computations using Ztune, and present performance comparisons with state-of-the-art "heuristic" autotuning. Not only does Ztune autotune significantly faster than a heuristic autotuner, but the Ztuned programs also run faster on average than their heuristically tuned counterparts. Surprisingly, for some stencil benchmarks, Ztune actually autotuned faster than the time it takes to execute the stencil computation once. We introduce the Star class that includes many seemingly different programs such as solving symmetric, diagonally-dominant tridiagonal systems, executing "watershed" cuts on graphs, sample sort, fast multipole computations, and all-prefix-sums and its various applications. We present a programming model, which is also called Star, to generate and execute parallel code for the Star class of programs. The Star model abstracts the pattern of computation and interprocessor communication in the Star class of programs, hides low-level parallel programming details, and offers ease of expression, thereby improving programmer productivity in writing parallel code. In addition, we present parallel algorithms that offer asymptotic improvements over prior art for two programs in the Star class: a Trip algorithm for solving symmetric, diagonally-dominant tridiagonal systems, and a Wasp algorithm for executing watershed cuts on graphs. The Star model is implemented in the Julia programming language and leverages Julia's capabilities in expressing parallelism concisely and in supporting both shared-memory and distributed-memory parallel programming. by Ekanathan Palamadai Natarajan. Ph. D. 2017-05-11T19:59:21Z 2017-05-11T19:59:21Z 2017 2017 Thesis http://hdl.handle.net/1721.1/108988 986521692 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. 
http://dspace.mit.edu/handle/1721.1/7582 120 pages application/pdf Massachusetts Institute of Technology
collection NDLTD
language English
format Others
sources NDLTD
topic Electrical Engineering and Computer Science.
spellingShingle Electrical Engineering and Computer Science.
Palamadai Natarajan, Ekanathan
Portable and productive high-performance computing
description Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 115-120). === Performance portability of computer programs, and programmer productivity in writing them are key expectations in software engineering. These expectations lead to the following questions: Can programmers write code once, and execute it at optimal speed on any machine configuration? Can programmers write parallel code to simple models that hide the complex details of parallel programming? This thesis addresses these questions for certain "classes" of computer programs. It describes "autotuning" techniques that achieve performance portability for serial divide-and-conquer programs, and an abstraction that improves programmer productivity in writing parallel code for a class of programs called "Star". We present a "pruned-exhaustive" autotuner called Ztune that optimizes the performance of serial divide-and-conquer programs for a given machine configuration. Whereas the traditional way of autotuning divide-and-conquer programs involves simply coarsening the base case of recursion optimally, Ztune searches for optimal divide-and-conquer trees. Although Ztune, in principle, exhaustively enumerates the search domain, it uses pruning properties that greatly reduce the size of the search domain without significantly sacrificing the quality of the autotuned code. We illustrate how to autotune divide-and-conquer stencil computations using Ztune, and present performance comparisons with state-of-the-art "heuristic" autotuning. Not only does Ztune autotune significantly faster than a heuristic autotuner, but the Ztuned programs also run faster on average than their heuristically tuned counterparts. Surprisingly, for some stencil benchmarks, Ztune actually autotuned faster than the time it takes to execute the stencil computation once. 
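The contrast the abstract draws between base-case coarsening and Ztune's search over divide-and-conquer trees can be made concrete with a toy autotuner. The sketch below is illustrative only: it is not Ztune, and all names in it are hypothetical. It times a divide-and-conquer summation at several candidate base-case sizes and keeps the fastest, which is the "traditional" coarsening approach the abstract says Ztune improves on.

```python
import time

def dac_sum(xs, lo, hi, base):
    # Divide-and-conquer sum: recurse until the subproblem is no
    # larger than `base`, then switch to a serial loop.
    if hi - lo <= base:
        total = 0
        for i in range(lo, hi):
            total += xs[i]
        return total
    mid = (lo + hi) // 2
    return dac_sum(xs, lo, mid, base) + dac_sum(xs, mid, hi, base)

def tune_base_case(xs, candidates):
    # Traditional autotuning: exhaustively time each candidate
    # base-case size on this machine and keep the fastest one.
    best_base, best_time = None, float("inf")
    for base in candidates:
        start = time.perf_counter()
        dac_sum(xs, 0, len(xs), base)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_base, best_time = base, elapsed
    return best_base

data = list(range(100_000))
base = tune_base_case(data, [16, 64, 256, 1024])
assert dac_sum(data, 0, len(data), base) == sum(data)
```

Ztune's pruned-exhaustive approach, by contrast, searches over entire divide-and-conquer trees (how to divide at every node, not just where to stop), relying on pruning properties to keep that much larger search domain tractable.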
We introduce the Star class that includes many seemingly different programs such as solving symmetric, diagonally-dominant tridiagonal systems, executing "watershed" cuts on graphs, sample sort, fast multipole computations, and all-prefix-sums and its various applications. We present a programming model, which is also called Star, to generate and execute parallel code for the Star class of programs. The Star model abstracts the pattern of computation and interprocessor communication in the Star class of programs, hides low-level parallel programming details, and offers ease of expression, thereby improving programmer productivity in writing parallel code. In addition, we present parallel algorithms that offer asymptotic improvements over prior art for two programs in the Star class: a Trip algorithm for solving symmetric, diagonally-dominant tridiagonal systems, and a Wasp algorithm for executing watershed cuts on graphs. The Star model is implemented in the Julia programming language and leverages Julia's capabilities in expressing parallelism concisely and in supporting both shared-memory and distributed-memory parallel programming. === by Ekanathan Palamadai Natarajan. === Ph. D.
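All-prefix-sums, one of the programs the abstract places in the Star class, illustrates the kind of regular computation and communication pattern such a model abstracts. Below is a sketch of the classic two-phase (up-sweep/down-sweep) exclusive scan, written serially in Python for clarity; it is not the thesis's Star model or its Julia implementation. Each phase touches independent index pairs, which is what makes the pattern easy to distribute across processors.

```python
def exclusive_scan(xs, op=lambda a, b: a + b, identity=0):
    # Two-phase exclusive prefix sum (Blelchlike up-sweep/down-sweep
    # is commonly attributed to Blelloch's work-efficient scan).
    n = len(xs)
    assert n and n & (n - 1) == 0, "power-of-two length for simplicity"
    tree = list(xs)
    # Up-sweep: build partial sums bottom-up over an implicit tree.
    step = 1
    while step < n:
        for i in range(2 * step - 1, n, 2 * step):
            tree[i] = op(tree[i - step], tree[i])
        step *= 2
    # Down-sweep: replace the root with the identity, then push
    # prefixes back down the tree.
    tree[n - 1] = identity
    step = n // 2
    while step >= 1:
        for i in range(2 * step - 1, n, 2 * step):
            left = tree[i - step]
            tree[i - step] = tree[i]
            tree[i] = op(left, tree[i])
        step //= 2
    return tree

print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# → [0, 3, 4, 11, 11, 15, 16, 22]
```

In a parallel setting, every iteration of each inner `for` loop is independent and can run on a separate processor, so the whole scan takes O(log n) parallel steps.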
author2 Alan Edelman.
author_facet Alan Edelman.
Palamadai Natarajan, Ekanathan
author Palamadai Natarajan, Ekanathan
author_sort Palamadai Natarajan, Ekanathan
title Portable and productive high-performance computing
title_short Portable and productive high-performance computing
title_full Portable and productive high-performance computing
title_fullStr Portable and productive high-performance computing
title_full_unstemmed Portable and productive high-performance computing
title_sort portable and productive high-performance computing
publisher Massachusetts Institute of Technology
publishDate 2017
url http://hdl.handle.net/1721.1/108988
work_keys_str_mv AT palamadainatarajanekanathan portableandproductivehighperformancecomputing
_version_ 1719029252885577728