Sparse Cholesky Factorization on FPGA Using Parameterized Model
Cholesky factorization is a fundamental problem in most engineering and science computation applications. When dealing with a large sparse matrix, numerical decomposition consumes the most time. We present a vector architecture to parallelize numerical decomposition of Cholesky factorization. We con...
Main Authors: | Yichun Sun, Hengzhu Liu, Tong Zhou |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited 2017-01-01 |
Series: | Mathematical Problems in Engineering |
Online Access: | http://dx.doi.org/10.1155/2017/3021591 |
id |
doaj-db6053044f65462ba24e5d3db1976ff1 |
---|---|
record_format |
Article |
spelling |
doaj-db6053044f65462ba24e5d3db1976ff1; 2020-11-25T00:52:53Z; eng; Hindawi Limited; Mathematical Problems in Engineering; 1024-123X; 1563-5147; 2017-01-01; 2017; 10.1155/2017/3021591; 3021591; Sparse Cholesky Factorization on FPGA Using Parameterized Model; Yichun Sun, Hengzhu Liu, Tong Zhou (School of Computer, National University of Defense Technology, Deya Road No. 109, Kaifu District, Changsha, Hunan 410073, China); http://dx.doi.org/10.1155/2017/3021591 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yichun Sun, Hengzhu Liu, Tong Zhou |
title |
Sparse Cholesky Factorization on FPGA Using Parameterized Model |
publisher |
Hindawi Limited |
series |
Mathematical Problems in Engineering |
issn |
1024-123X 1563-5147 |
publishDate |
2017-01-01 |
description |
Cholesky factorization is a fundamental problem in most engineering and scientific computing applications. When dealing with a large sparse matrix, numerical decomposition consumes the most time. We present a vector architecture to parallelize the numerical decomposition step of Cholesky factorization. We construct an integrated analytical parameterized performance model to accurately predict the execution times of typical matrices under varying parameters. Our proposed approach applies to accelerators in general and is limited to neither field-programmable gate arrays (FPGAs) nor application-specific integrated circuits. We implement a simplified module on FPGAs to verify the accuracy of the model. The experiments show that, in most cases, the difference between the predicted and measured execution times is less than 10%. Based on the performance model, we optimize the parameters and obtain a balance of resources and performance after analyzing the performance of varied parameter settings. Compared with state-of-the-art implementations on CPU and GPU, the performance with the optimal parameters is 2x that of the CPU. Our model offers several advantages, particularly in power consumption. It provides guidance for the design of future acceleration components. |
url |
http://dx.doi.org/10.1155/2017/3021591 |