SuperTaco: Taco Tensor Algebra kernels on distributed systems using Legion

Bibliographic Details
Main Author: Shinde, Sachin Dilip.
Other Authors: Saman Amarasinghe.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2019
Online Access: https://hdl.handle.net/1721.1/121683
Description
Summary: This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Thesis: M.Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from the student-submitted PDF version of the thesis. Includes bibliographical references (pages 89-91).

Tensor algebra is a powerful language for expressing computation on multidimensional data. While many tensor datasets are sparse, most tensor algebra libraries have limited support for handling sparsity. The Tensor Algebra Compiler (Taco) introduced a taxonomy of sparse tensor formats that allows it to compile sparse tensor algebra expressions into performant C code, but it does not take advantage of distributed systems. This work provides a code generation technique for creating Legion programs that distribute the computation of Taco tensor algebra kernels across distributed systems, and a scheduling language for controlling how this distributed computation is structured. The technique is implemented as a command-line tool called SuperTaco. We perform a strong scaling analysis of the SpMV and TTM kernels under a row blocking distribution schedule and find speedups of 9-10x when using 20 cores on a single node. For multi-node systems using 20 cores per node, SpMV achieves a 33.3x speedup at 160 cores and TTM achieves a 42.0x speedup at 140 cores.

by Sachin Dilip Shinde.
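For context, the SpMV and TTM kernels measured above are ordinary Taco index expressions. The sketch below uses Taco's published single-node C++ API to express SpMV over a CSR matrix; it is shown only to illustrate the kind of kernel SuperTaco distributes, and the dimensions and inserted values are placeholder assumptions. TTM is analogous, with a third-order operand (e.g., A(i,j,l) = B(i,j,k) * C(k,l)).

```cpp
// Sketch: expressing SpMV (y = A*x) with Taco's C++ API.
// Plain single-node Taco, not SuperTaco's distributed code path.
#include "taco.h"
using namespace taco;

int main() {
  // A is sparse in CSR (dense rows, sparse columns); x and y are dense.
  Format csr({Dense, Sparse});
  Format dense1d({Dense});

  Tensor<double> A("A", {1000, 1000}, csr);
  Tensor<double> x("x", {1000}, dense1d);
  Tensor<double> y("y", {1000}, dense1d);

  // Placeholder data: insert nonzeros, then pack into the chosen formats.
  A.insert({0, 0}, 1.0);
  x.insert({0}, 2.0);
  A.pack();
  x.pack();

  // SpMV as an index expression: y(i) = A(i,j) * x(j), summing over j.
  IndexVar i("i"), j("j");
  y(i) = A(i, j) * x(j);

  // Taco generates, compiles, and runs C code for this expression.
  y.compile();
  y.assemble();
  y.compute();
  return 0;
}
```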
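The row blocking distribution schedule mentioned in the abstract partitions the kernel's output rows into contiguous blocks, one per distributed task. A minimal sketch of that partitioning arithmetic follows; the block layout and the one-task-per-block assumption are illustrative, not SuperTaco's actual Legion partitioning code.

```cpp
// Sketch: computing contiguous row blocks for a row blocking distribution.
// Assumption for illustration: one block per task, with the remainder
// spread over the leading blocks so sizes differ by at most one row.
#include <cstdio>

int main() {
  const int numRows = 10;   // rows of the sparse matrix / output tensor
  const int numTasks = 4;   // e.g., one Legion task per core or node

  const int base = numRows / numTasks;   // minimum rows per block
  const int extra = numRows % numTasks;  // blocks that get one extra row

  for (int t = 0; t < numTasks; ++t) {
    int begin = t * base + (t < extra ? t : extra);
    int end = begin + base + (t < extra ? 1 : 0);
    std::printf("task %d: rows [%d, %d)\n", t, begin, end);
  }
  return 0;
}
```

With 10 rows and 4 tasks this prints the half-open ranges [0,3), [3,6), [6,8), [8,10); each task would then compute its block of output rows independently, which is what makes the schedule amenable to strong scaling.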