Hardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architecture

Application domains such as Bio-informatics, DSP, Structural Biology, Fluid Dynamics, high resolution direction finding, state estimation, adaptive noise cancellation etc. demand high performance computing solutions for their simulation environments. The core computations of these applications are i...

Bibliographic Details
Main Author:	Biswas, Prasenjit
Other Authors:	Nandy, S K
Language:	en_US
Published:	2013
Subjects:	Computer Architecture; Systolic Algorithms; REDEFINE; Numerical Linear Algebra Kernels; NLA Kernels; Custom Functional Units (CFU); Computer Science
Online Access:	http://etd.iisc.ernet.in/handle/2005/2108 http://etd.ncsi.iisc.ernet.in/abstracts/2705/G24895-Abs.pdf

id	ndltd-IISc-oai-etd.ncsi.iisc.ernet.in-2005-2108
record_format	oai_dc
spelling	ndltd-IISc-oai-etd.ncsi.iisc.ernet.in-2005-2108; 2018-01-10T03:36:25Z; Hardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architecture; Biswas, Prasenjit; Computer Architecture; Systolic Algorithms; REDEFINE; Numerical Linear Algebra Kernels; NLA Kernels; Custom Functional Units (CFU); Computer Science; Nandy, S K; 2013-07-10T07:56:16Z; 2013-07-10T07:56:16Z; 2013-07-10; 2011-07; Thesis; http://etd.iisc.ernet.in/handle/2005/2108; http://etd.ncsi.iisc.ernet.in/abstracts/2705/G24895-Abs.pdf; en_US; G24895
collection	NDLTD
language	en_US
sources	NDLTD
topic	Computer Architecture; Systolic Algorithms; REDEFINE; Numerical Linear Algebra Kernels; NLA Kernels; Custom Functional Units (CFU); Computer Science
description	Application domains such as Bio-informatics, DSP, Structural Biology, Fluid Dynamics, high resolution direction finding, state estimation and adaptive noise cancellation demand high performance computing solutions for their simulation environments. The core computations of these applications are Numerical Linear Algebra (NLA) kernels. Direct solvers are predominantly required in domains such as DSP and in estimation algorithms such as the Kalman filter, where the matrices to be operated on are small or medium sized but dense. Faddeev's Algorithm is often used for solving dense linear systems of equations. The Modified Faddeev's Algorithm (MFA) is a general algorithm on which LU decomposition, QR factorization or SVD of matrices can be realized. MFA has the useful property of realizing a host of matrix operations by computing Schur complements on a compound matrix of four blocks, thereby reducing the overall computation requirements. We use MFA as a representative direct solver in this work. We further discuss the Givens rotation based QR algorithm for the decomposition of any matrix, often used to solve the linear least squares problem. Systolic array architectures are widely accepted ASIC solutions for NLA algorithms, but the "can of worms" associated with this traditional solution spawns the need for alternatives. Although custom hardware in the form of systolic arrays can deliver high performance, their rigid structure makes them neither scalable nor reconfigurable, and hence not commercially viable. We show how a reconfigurable computing platform can serve to contain the "can of worms".
REDEFINE, a coarse grained runtime reconfigurable architecture, has been used for the systolic realization of NLA kernels. We elaborate upon streaming NLA-specific enhancements to REDEFINE needed to meet the expected performance goals. We explore the need for an algorithm-aware custom compilation framework. We propose a realization of Faddeev's Algorithm on REDEFINE and show that REDEFINE performs several times faster than traditional general-purpose processors (GPPs). We then turn to QR Decomposition as the next NLA kernel, since it offers better numerical stability than LU and other decompositions. We use QR Decomposition as a case study to explore the design space of the proposed solution on REDEFINE. We also investigate the architectural details of the Custom Functional Units (CFUs) for these NLA kernels. We determine the right size of the sub-array in accordance with the optimal pipeline depth of the core execution units and the number of such units to be used per sub-array. The framework used to realize QR Decomposition can be generalized to other decomposition algorithms such as LU, Faddeev's Algorithm and Gauss-Jordan, using different CFU definitions.
author2	Nandy, S K
author_facet	Nandy, S K; Biswas, Prasenjit
author	Biswas, Prasenjit
author_sort	Biswas, Prasenjit
title	Hardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architecture
publishDate	2013
url	http://etd.iisc.ernet.in/handle/2005/2108 http://etd.ncsi.iisc.ernet.in/abstracts/2705/G24895-Abs.pdf
work_keys_str_mv	AT biswasprasenjit hardwareconsolidationofsystolicalgorithmsonacoarsegrainedruntimereconfigurablearchitecture
_version_	1718603658558439424
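
The record's description notes that Faddeev's Algorithm realizes a range of matrix operations by computing a Schur complement. As a rough illustration only, the sketch below shows the standard idea in sequential Python: Gaussian elimination on the compound matrix [[A, B], [-C, D]] annihilates the -C block and leaves D + C A^-1 B in the lower-right block. The function name `faddeev`, its argument layout and the use of exact fractions are assumptions made for this example; it is not the thesis's systolic or REDEFINE implementation.

```python
from fractions import Fraction


def faddeev(A, B, C, D):
    """Return D + C * A^-1 * B by eliminating the -C block of [[A, B], [-C, D]].

    A is n x n and nonsingular, B is n x m, C is p x n, D is p x m.
    Illustrative sequential sketch; exact rational arithmetic avoids rounding.
    """
    n = len(A)
    # Build the compound matrix: top rows [A | B], bottom rows [-C | D].
    M = [[Fraction(x) for x in ra + rb] for ra, rb in zip(A, B)]
    M += [[Fraction(-x) for x in rc] + [Fraction(x) for x in rd]
          for rc, rd in zip(C, D)]
    for k in range(n):
        # Pick a nonzero pivot among the remaining top rows only.
        pivot = next((r for r in range(k, n) if M[r][k] != 0), None)
        if pivot is None:
            raise ValueError("A is singular")
        M[k], M[pivot] = M[pivot], M[k]
        # Eliminate column k from every row below the pivot,
        # including the bottom [-C | D] rows.
        for r in range(k + 1, len(M)):
            f = M[r][k] / M[k][k]
            M[r] = [a - f * b for a, b in zip(M[r], M[k])]
    # The lower-right block now holds the Schur complement result.
    return [row[n:] for row in M[n:]]


# Solve A x = b by choosing C = I and D = 0, so the result is A^-1 b.
x = faddeev([[2, 1], [1, 3]], [[1], [2]], [[1, 0], [0, 1]], [[0], [0]])
```

Other choices of the four blocks give other operations from the same routine: with A = I and D = 0 the result is the product C B, and with B = I, C = I and D = 0 it is the inverse of A.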

Similar Items