SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs

Sparse matrix-vector (SpMV) multiplication is a vital building block for numerous scientific and engineering applications. This paper proposes SURAA (which translates to "speed" in Arabic), a novel method for SpMV computations on graphics processing units (GPUs). The novelty lies in the way we group matrix...

Bibliographic Details
Main Authors:	Thaha Muhammed, Rashid Mehmood, Aiiad Albeshri, Iyad Katib
Format:	Article
Language:	English
Published:	MDPI AG 2019-03-01
Series:	Applied Sciences
Subjects:	sparse matrix-vector multiplication (SpMV); high performance computing (HPC); graphics processing units; general-purpose computing on graphics processing units (GPGPUs); iterative methods; data analysis; sparse matrix storage; load balancing; coalesced memory access; thread divergence; Freedman–Diaconis rule
Online Access:	http://www.mdpi.com/2076-3417/9/5/947

id	doaj-20e979a3eef34c76825122011a76ac67
record_format	Article
spelling	doaj-20e979a3eef34c76825122011a76ac67 | 2020-11-24T21:23:09Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2019-03-01 | 9 | 5 | 947 | 10.3390/app9050947 | app9050947 | SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs | Thaha Muhammed [0] | Rashid Mehmood [1] | Aiiad Albeshri [2] | Iyad Katib [3] | [0] Department of Computer Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia | [1] High Performance Computing Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia | [2] Department of Computer Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia | [3] Department of Computer Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia | Sparse matrix-vector (SpMV) multiplication is a vital building block for numerous scientific and engineering applications. This paper proposes SURAA (which translates to "speed" in Arabic), a novel method for SpMV computations on graphics processing units (GPUs). The novelty lies in the way we group matrix rows into different segments, and adaptively schedule various segments to different types of kernels. The sparse matrix data structure is created by sorting the rows of the matrix on the basis of the nonzero elements per row (npr) and forming segments of equal size (containing approximately an equal number of nonzero elements per row) using the Freedman–Diaconis rule. The segments are assembled into three groups based on the mean npr of the segments. For each group, we use multiple kernels to execute the group segments on different streams. Hence, the number of threads to execute each segment is adaptively chosen. Dynamic Parallelism available in NVIDIA GPUs is utilized to execute the group containing segments with the largest mean npr, providing improved load balancing and coalesced memory access, and hence more efficient SpMV computations on GPUs. Therefore, SURAA minimizes the adverse effects of the npr variance by uniformly distributing the load using equal-sized segments. We implement the SURAA method as a tool and compare its performance with the de facto best commercial (cuSPARSE) and open-source (CUSP, MAGMA) tools using widely used benchmarks comprising 26 high npr variance matrices from 13 diverse domains. SURAA outperforms the other tools by delivering 13.99x speedup on average. We believe that our approach provides a fundamental shift in addressing SpMV-related challenges on GPUs including coalesced memory access, thread divergence, and load balancing, and is set to open new avenues for further improving SpMV performance in the future. | http://www.mdpi.com/2076-3417/9/5/947 | sparse matrix-vector multiplication (SpMV); high performance computing (HPC); graphics processing units; general-purpose computing on graphics processing units (GPGPUs); iterative methods; data analysis; sparse matrix storage; load balancing; coalesced memory access; thread divergence; Freedman–Diaconis rule
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Thaha Muhammed, Rashid Mehmood, Aiiad Albeshri, Iyad Katib
spellingShingle	Thaha Muhammed, Rashid Mehmood, Aiiad Albeshri, Iyad Katib | SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs | Applied Sciences | sparse matrix-vector multiplication (SpMV); high performance computing (HPC); graphics processing units; general-purpose computing on graphics processing units (GPGPUs); iterative methods; data analysis; sparse matrix storage; load balancing; coalesced memory access; thread divergence; Freedman–Diaconis rule
author_facet	Thaha Muhammed, Rashid Mehmood, Aiiad Albeshri, Iyad Katib
author_sort	Thaha Muhammed
title	SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs
title_short	SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs
title_full	SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs
title_fullStr	SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs
title_full_unstemmed	SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs
title_sort	suraa: a novel method and tool for loadbalanced and coalesced spmv computations on gpus
publisher	MDPI AG
series	Applied Sciences
issn	2076-3417
publishDate	2019-03-01
description	Sparse matrix-vector (SpMV) multiplication is a vital building block for numerous scientific and engineering applications. This paper proposes SURAA (which translates to "speed" in Arabic), a novel method for SpMV computations on graphics processing units (GPUs). The novelty lies in the way we group matrix rows into different segments, and adaptively schedule various segments to different types of kernels. The sparse matrix data structure is created by sorting the rows of the matrix on the basis of the nonzero elements per row (npr) and forming segments of equal size (containing approximately an equal number of nonzero elements per row) using the Freedman–Diaconis rule. The segments are assembled into three groups based on the mean npr of the segments. For each group, we use multiple kernels to execute the group segments on different streams. Hence, the number of threads to execute each segment is adaptively chosen. Dynamic Parallelism available in NVIDIA GPUs is utilized to execute the group containing segments with the largest mean npr, providing improved load balancing and coalesced memory access, and hence more efficient SpMV computations on GPUs. Therefore, SURAA minimizes the adverse effects of the npr variance by uniformly distributing the load using equal-sized segments. We implement the SURAA method as a tool and compare its performance with the de facto best commercial (cuSPARSE) and open-source (CUSP, MAGMA) tools using widely used benchmarks comprising 26 high npr variance matrices from 13 diverse domains. SURAA outperforms the other tools by delivering 13.99x speedup on average. We believe that our approach provides a fundamental shift in addressing SpMV-related challenges on GPUs including coalesced memory access, thread divergence, and load balancing, and is set to open new avenues for further improving SpMV performance in the future.
topic	sparse matrix-vector multiplication (SpMV); high performance computing (HPC); graphics processing units; general-purpose computing on graphics processing units (GPGPUs); iterative methods; data analysis; sparse matrix storage; load balancing; coalesced memory access; thread divergence; Freedman–Diaconis rule
url	http://www.mdpi.com/2076-3417/9/5/947
work_keys_str_mv	AT thahamuhammed suraaanovelmethodandtoolforloadbalancedandcoalescedspmvcomputationsongpus AT rashidmehmood suraaanovelmethodandtoolforloadbalancedandcoalescedspmvcomputationsongpus AT aiiadalbeshri suraaanovelmethodandtoolforloadbalancedandcoalescedspmvcomputationsongpus AT iyadkatib suraaanovelmethodandtoolforloadbalancedandcoalescedspmvcomputationsongpus
_version_	1725993388850806784
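
Illustrative sketch (not part of the record): the description field outlines a preprocessing pipeline of sorting rows by nonzeros per row (npr), forming segments with the Freedman–Diaconis rule, and assembling the segments into three groups by mean npr. The Python below is a minimal host-side sketch of that idea, not the authors' implementation. It assumes a CSR row-pointer array as input, interprets "segments" as Freedman–Diaconis bins over the npr values, and uses hypothetical group thresholds (`small=4.0`, `large=32.0`) that do not come from the abstract. All function names are invented for illustration, and the GPU side (streams, kernels, Dynamic Parallelism) is not shown.

```python
import statistics


def nonzeros_per_row(row_ptr):
    """Return npr for each row, given a CSR row-pointer array."""
    return [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]


def freedman_diaconis_width(values):
    """Bin width h = 2 * IQR / n^(1/3); falls back to 1.0 if the IQR is 0."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    width = 2 * (q3 - q1) / len(values) ** (1 / 3)
    return width if width > 0 else 1.0


def build_segments(row_ptr):
    """Sort rows by npr and bin them so each segment holds rows of similar npr."""
    npr = nonzeros_per_row(row_ptr)
    order = sorted(range(len(npr)), key=lambda r: npr[r])
    width = freedman_diaconis_width(npr)
    lowest = npr[order[0]]
    bins = {}
    for r in order:
        bins.setdefault(int((npr[r] - lowest) // width), []).append(r)
    return [bins[b] for b in sorted(bins)], npr


def group_segments(segments, npr, small=4.0, large=32.0):
    """Assign each segment to one of three groups by its mean npr.

    The thresholds are hypothetical placeholders, not values from the paper.
    """
    groups = {"small": [], "medium": [], "large": []}
    for seg in segments:
        mean = sum(npr[r] for r in seg) / len(seg)
        key = "small" if mean <= small else "medium" if mean <= large else "large"
        groups[key].append(seg)
    return groups


# Toy matrix with 8 rows whose npr values are 1, 1, 2, 2, 3, 3, 40, 50.
row_ptr = [0, 1, 2, 4, 6, 9, 12, 52, 102]
segments, npr = build_segments(row_ptr)
groups = group_segments(segments, npr)
```

In this toy input the two long rows end up in a separate segment from the six short ones, so a scheduler could assign more threads per row to that segment, which is the kind of adaptive choice the abstract describes.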

Similar Items