Hardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architecture

Application domains such as Bio-informatics, DSP, Structural Biology, Fluid Dynamics, high resolution direction finding, state estimation, adaptive noise cancellation etc. demand high performance computing solutions for their simulation environments. The core computations of these applications are i...


Bibliographic Details
Main Author: Biswas, Prasenjit
Other Authors: Nandy, S K
Language: en_US
Published: 2013
Subjects:
Online Access: http://etd.iisc.ernet.in/handle/2005/2108
http://etd.ncsi.iisc.ernet.in/abstracts/2705/G24895-Abs.pdf
id ndltd-IISc-oai-etd.ncsi.iisc.ernet.in-2005-2108
record_format oai_dc
spelling ndltd-IISc-oai-etd.ncsi.iisc.ernet.in-2005-2108 2018-01-10T03:36:25Z Hardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architecture Biswas, Prasenjit Computer Architecture Systolic Algorithms REDEFINE Numerical Linear Algebra Kernels NLA Kernels Custom Functional Units (CFU) Computer Science Nandy, S K 2013-07-10T07:56:16Z 2013-07-10T07:56:16Z 2013-07-10 2011-07 Thesis http://etd.iisc.ernet.in/handle/2005/2108 http://etd.ncsi.iisc.ernet.in/abstracts/2705/G24895-Abs.pdf en_US G24895
collection NDLTD
language en_US
sources NDLTD
topic Computer Architecture
Systolic Algorithms
REDEFINE
Numerical Linear Algebra Kernels
NLA Kernels
Custom Functional Units (CFU)
Computer Science
description Application domains such as bioinformatics, DSP, structural biology, fluid dynamics, high-resolution direction finding, state estimation, and adaptive noise cancellation demand high-performance computing solutions for their simulation environments. The core computations of these applications are Numerical Linear Algebra (NLA) kernels. Direct solvers are predominantly required in domains such as DSP and in estimation algorithms such as the Kalman filter, where the matrices to be operated on are small or medium sized but dense. Faddeev's Algorithm is often used for solving dense linear systems of equations. The Modified Faddeev's Algorithm (MFA) is a general algorithm on which LU decomposition, QR factorization, or SVD of matrices can be realized. MFA realizes a host of matrix operations by computing Schur complements on a matrix partitioned into four blocks, thereby reducing the overall computation requirements. We use MFA as a representative direct solver in this work. We further discuss the Givens rotation based QR algorithm for the decomposition of any matrix, which is often used to solve the linear least squares problem. Systolic array architectures are widely accepted ASIC solutions for NLA algorithms, but the "can of worms" associated with this traditional solution spawns the need for alternatives. While custom hardware in the form of systolic arrays can deliver high performance, their rigid structure makes them neither scalable nor reconfigurable, and hence not commercially viable. We show how a reconfigurable computing platform can serve to contain the "can of worms". REDEFINE, a coarse-grained runtime reconfigurable architecture, has been used for systolic actualization of NLA kernels. We elaborate upon streaming NLA-specific enhancements to REDEFINE in order to meet expected performance goals. We explore the need for an algorithm-aware custom compilation framework.
We propose a realization of Faddeev's Algorithm on REDEFINE and show that REDEFINE performs several times faster than traditional general-purpose processors (GPPs). We then turn to QR decomposition as the next NLA kernel, since it ensures better stability than LU and other decompositions. We use QR decomposition as a case study to explore the design space of the proposed solution on REDEFINE. We also investigate the architectural details of the Custom Functional Units (CFUs) for these NLA kernels. We determine the right size of the sub-array in accordance with the optimal pipeline depth of the core execution units and the number of such units to be used per sub-array. The framework used to realize QR decomposition can be generalized to other decomposition algorithms such as LU, Faddeev's Algorithm, and Gauss-Jordan by using different CFU definitions.
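The abstract's remark that Faddeev's Algorithm obtains many matrix operations from a single Schur-complement computation can be illustrated in software. The sketch below is not the thesis's REDEFINE or systolic implementation; it is a plain sequential illustration, assuming the textbook formulation in which Gaussian elimination on the compound matrix [[A, B], [-C, D]] leaves D + C A^-1 B in the lower-right block. The function name `faddeev` and the use of exact `Fraction` arithmetic are choices made here for illustration only.

```python
from fractions import Fraction


def faddeev(A, B, C, D):
    """Return D + C * A^-1 * B via elimination on [[A, B], [-C, D]].

    A is n x n and nonsingular, B is n x m, C is p x n, D is p x m.
    Illustrative sequential sketch; not the hardware mapping from the thesis.
    """
    n = len(A)
    p = len(C)
    # Build the compound matrix with exact rational entries.
    M = [[Fraction(x) for x in list(A[i]) + list(B[i])] for i in range(n)]
    M += [[Fraction(-x) for x in C[i]] + [Fraction(x) for x in D[i]]
          for i in range(p)]
    for k in range(n):
        # Pick a nonzero pivot among the top n rows only; swapping rows
        # inside [A B] does not change the Schur complement.
        pivot = next((r for r in range(k, n) if M[r][k] != 0), None)
        if pivot is None:
            raise ValueError("A is singular")
        M[k], M[pivot] = M[pivot], M[k]
        # Eliminate column k from every row below, including the -C rows.
        for r in range(k + 1, n + p):
            f = M[r][k] / M[k][k]
            if f != 0:
                M[r] = [a - f * b for a, b in zip(M[r], M[k])]
    # The lower-left block is now zero; the lower-right block is the result.
    return [row[n:] for row in M[n:]]


# Solving A x = b corresponds to B = b, C = I, D = 0, which yields A^-1 b.
x = faddeev([[2, 1], [1, 3]], [[3], [5]], [[1, 0], [0, 1]], [[0], [0]])
```

Other choices of the four blocks give other operations under the same routine: for example, A = I and D = 0 yields the product C B, and B = C = I with D = 0 yields the inverse of A.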
author2 Nandy, S K
author_facet Nandy, S K
Biswas, Prasenjit
author Biswas, Prasenjit
author_sort Biswas, Prasenjit
title Hardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architecture
publishDate 2013
url http://etd.iisc.ernet.in/handle/2005/2108
http://etd.ncsi.iisc.ernet.in/abstracts/2705/G24895-Abs.pdf
work_keys_str_mv AT biswasprasenjit hardwareconsolidationofsystolicalgorithmsonacoarsegrainedruntimereconfigurablearchitecture
_version_ 1718603658558439424