Sparse Cholesky Factorization on FPGA Using Parameterized Model

Cholesky factorization is a fundamental problem in most engineering and science computation applications. When dealing with a large sparse matrix, numerical decomposition consumes the most time. We present a vector architecture to parallelize numerical decomposition of Cholesky factorization. We con...

Bibliographic Details
Main Authors: Yichun Sun, Hengzhu Liu, Tong Zhou
Format: Article
Language: English
Published: Hindawi Limited 2017-01-01
Series: Mathematical Problems in Engineering
Online Access: http://dx.doi.org/10.1155/2017/3021591
id doaj-db6053044f65462ba24e5d3db1976ff1
record_format Article
spelling doaj-db6053044f65462ba24e5d3db1976ff1; 2020-11-25T00:52:53Z; eng; Hindawi Limited; Mathematical Problems in Engineering; 1024-123X; 1563-5147; 2017-01-01; 2017; 10.1155/2017/3021591; 3021591; Sparse Cholesky Factorization on FPGA Using Parameterized Model; Yichun Sun, Hengzhu Liu, Tong Zhou (School of Computer, National University of Defense Technology, Deya Road No. 109, Kaifu District, Changsha, Hunan 410073, China); http://dx.doi.org/10.1155/2017/3021591
collection DOAJ
language English
format Article
sources DOAJ
author Yichun Sun
Hengzhu Liu
Tong Zhou
title Sparse Cholesky Factorization on FPGA Using Parameterized Model
publisher Hindawi Limited
series Mathematical Problems in Engineering
issn 1024-123X
1563-5147
publishDate 2017-01-01
description Cholesky factorization is a fundamental problem in most engineering and scientific computing applications. When dealing with a large sparse matrix, numerical decomposition consumes the most time. We present a vector architecture to parallelize the numerical decomposition step of Cholesky factorization. We construct an integrated analytical parameterized performance model to accurately predict the execution times of typical matrices under varying parameters. Our proposed approach is general for accelerators and is limited to neither field-programmable gate arrays (FPGAs) nor application-specific integrated circuits (ASICs). We implement a simplified module in FPGAs to verify the accuracy of the model. The experiments show that, in most cases, the predicted and measured execution times differ by less than 10%. Based on the performance model, we optimize the parameters and obtain a balance of resources and performance after analyzing the performance of varied parameter settings. Compared with state-of-the-art implementations on CPU and GPU, the performance with the optimal parameters is 2x that of the CPU. Our model offers several advantages, particularly in power consumption. It provides guidance for the design of future acceleration components.
url http://dx.doi.org/10.1155/2017/3021591